Select Database Rows in Pseudo Random Order

November 28, 2010

Displaying a random selection of items from a large set is common in many web applications. For example, a site may show a random selection of 10 tweets from a database of 10 million or Wikipedia may display a list of 5 random articles from their collection of 20 million. There are multiple ways to accomplish this, but I will walk through one relatively efficient way to solve this problem in a Ruby on Rails.

Consider a web application which displays tweets collected over some period of time. A user should be able to browse these tweets, 10 at a time, in some random order.

With MySQL, placing the Rand() in the order by statement returns rows in a random order, however that is extremely inefficient in applications with large data sets and lots of users.

A more time efficient solution would be to add a field to each row that contains a random number. When selecting tweets from the database, first generate a random number to use as an index into the random column. Then, select tweets where the random index greater is than the chosen random number and ordering by this random index. This select statement efficiently returns tweets in a random order.

Below, I show the steps to creating a very simple Ruby on Rails application that exhibits this technique. A working demonstration of this can be found at in my GitHub repository: drenz/RandomRow

First create a new RoR project:

$ rails new RandomRow

Configure the project to use MySQL. (This is outside the scope of this article)

Generate a Tweet model and the associated controller and views:

$ rails g scaffold Tweet tweet_id:decimal body:string user_id:integer user_screen_name:string profile_image_url:string randomize:integer

Modify the migration file to add an index for randomize :

class CreateTweets < ActiveRecord::Migration   def self.up     create_table :tweets, :id => false do |t|
      t.decimal :tweet_id, :scale => 0, :null => false, :precision => 20, :primary_key => true
      t.string :body
      t.integer :user_id
      t.string :user_screen_name
      t.string :profile_image_url
      t.integer :randomize, :default => 0

      t.timestamps
    end
    add_index :tweets, :randomize
  end

  def self.down
    remove_index :tweets, :randomize
    drop_table :tweets
  end
end

Modify the Tweet model to use tweet_id as the primary key:

class Tweet < ActiveRecord::Base
    self.primary_key = :tweet_id
end

Create the database:

$ rake db:setup RAILS_ENV="development"

At this point, you could populate this skeleton application with tweets by inserting them directly into the database. (See the dev.twitter.com for more info).

I’ve included an sql file (db/tweets.sql) containing sample tweets to test with. Load them with the following command:

$ mysql -u root -p -D RandomRow_development < db/tweets.sql

Modify the tweets_controller index action:

num_to_display = 10

# do this loop until you get random number that will display
# num_to_display tweets
begin
  rand_number = rand(1000000) + 1

  @tweets = Tweet.order('randomize').limit(10).where('randomize >= '+rand_number.to_s)

end while @tweets.size < num_to_display

Modify the view to add a pagination link above the “New Tweet” link:

<%= link_to 'More >>', tweets_path %>

Now when a user clicks “More,” 10 randomly selected tweets are displayed.

When operationalizing this technique, there are a few things must also be done:

  • On insert, generate a random number within the correct range and insert it with the row.
  • Periodically, update the random field on each row with a new random number. This will make sure that users don’t start to see patterns in your randomness.

Leave a Comment

Previous post:

Next post: