Using larger and larger datasets, ActiveRecord has performance problems when trying to iterate over the data. Consider this simple example to iterate over a table of genes, and print out the name of each. This is an common use case for accessing database rows.
Gene.all.each {|gene| puts gene.name}
Unfortunately, if the gene table contains millions of rows, ActiveRecord will (try) to pull all of the rows into memory first, and then iterate over them as array. Pulling so much data into memory will make the process take a very long time. The solution to this problem is to pull smaller chunks of the data in at a time, and then iterate over each of these chunks in sequence.
This is the type of solution provided by paginating_find, which uses the SQL commands of LIMIT and OFFSET to pull smaller chunks of the table into memory. The advantage of using LIMIT and OFFSET is that they are (I believe) database agnostic and so will work across any database.
As discussed in the comments of this post on Jamis Buck’s blog, using the OFFSET command requires the DB engine to linearly search through the records to find the correct point at which the chunk of returned data begins. Therefore using OFFSET you may expect that the time taken to return the data increases proportionally with the size of the dataset. A (possibly MySQL specific) solution described by both Jamis Buck and Michael Schuerig relies can split the dataset into smaller chunks based on the primary key. Since the primary key is indexed, the time taken by the DB to find the correct place to start the next chunk of rows should be much faster.
Update
Michael Schuerig has also pointed out that his plugin accepts ActiveRecord syntax, so that table joins and conditions can be given, and then the returned data iterated over in smaller chunks.
SQL queries
These two above solutions describe iterating over single tables over data using ActiveRecord syntax, but what if you want to feed ActiveRecord a complex SQL query then iterate over the results in chunks? This was the question I asked the UK North West Ruby Users Group and was given a neat solution that relies on using the ActiveRecord connect method. If you use ActiveRecord::Base.connection.execute(statement) the data created is not returned in bulk, but can instead can be pulled from the database one row at a time. The only drawback is that the method to find the headers of the returned data is database adaptor specific, in the instance of MySQL, the example method is fetch_fields, called on the returned data object.
Native Drivers
I think it’s worth pointing out in a discussion about Ruby and database access, that if you install the native driver gem for a given database this can result in performance increase. For example if you’re using MySQL, this would be
sudo gem install mysql
I believe there are similar implementations for other database adaptors, which a Google search should find.