
An analysis of memory bloat in Active Record 5.2

Current patterns in Active Record lead to enormous amounts of resource usage. Here is an analysis of Rails 5.2.


One of the noble goals of the Ruby community, spearheaded by Matz, is the Ruby 3x3 plan. The idea is that, by applying a large number of modern optimizations, we can make the Ruby interpreter 3 times faster. It is an ambitious goal, which is both notable and inspiring. This “movement” has triggered quite a lot of interesting experiments in Ruby core, including a just-in-time compiler and work on reducing memory bloat out-of-the-box. If Ruby gets faster and uses less memory, then everyone gets free performance, which is exactly what we all want.

A big problem though is that there is only so much magic a faster Ruby can achieve. A faster Ruby is not going to magically fix a “bubble sort” hiding deep in your code. Active Record has tons of internal waste that ought to be addressed; fixing it could make the vast majority of Ruby applications in the wild a lot faster. Rails is the largest consumer of Ruby after all, and Rails is underpinned by Active Record.

Sadly, Active Record performance has not gotten much better since the days of Rails 2; in fact, in quite a few cases it got slower, or a lot slower.

Active Record is very wasteful

I would like to start off with a tiny example:

Say I have a typical 30-column table containing topics.

If I run the following, how much will Active Record allocate?

a = []
Topic.limit(1000).each do |u|
   a << u.id
end
Total allocated: 3835288 bytes (26259 objects)
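
The allocation figures throughout this post are in the format printed by the memory_profiler gem; a minimal sketch of taking such a measurement for the snippet above looks like this:

require "memory_profiler"

report = MemoryProfiler.report do
  a = []
  Topic.limit(1000).each do |u|
    a << u.id
  end
end

# Prints "Total allocated: ... bytes (... objects)" plus a breakdown of
# allocations by gem, file and class.
report.pretty_print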

Compare this to an equally inefficient “raw version”.

sql = -"select * from topics limit 1000"
ActiveRecord::Base.connection.raw_connection.async_exec(sql).column_values(0)
Total allocated: 8200 bytes (4 objects)

This amount of waste is staggering; it translates to a deadly combo:

  • Extreme levels of memory usage

and

  • Slower performance

But … that is really bad Active Record!

An immediate gut reaction here is that I am “cheating” and writing “slow” Active Record code, and comparing it to mega-optimized raw code.

One could argue that I should write:

a = []
Topic.select(:id).limit(1000).each do |u|
  a << u.id
end

With which you would get:

Total allocated: 1109357 bytes (11097 objects)

Or better still:

Topic.limit(1000).pluck(:id) 

With which I would get:

Total allocated: 221493 bytes (5098 objects)

Time for a quick recap.

  • The “raw” version allocated 4 objects; it was able to return 1000 Integers directly, which are not allocated individually in the Ruby heap and do not occupy garbage collector slots (a quick demonstration follows this recap).

  • The “naive” Active Record version allocates 26259 objects

  • The “slightly optimised” Active Record version allocates 11097 objects

  • The “very optimised” Active Record version allocates 5098 objects

All of those numbers are orders of magnitude larger than 4.
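
To see why the raw version gets away with so few allocations, note that small Integers in Ruby are immediate values; building an array of 1000 of them allocates essentially just the array. A quick sketch, again using memory_profiler:

require "memory_profiler"

report = MemoryProfiler.report do
  a = []
  1000.times { |i| a << i } # small Integers are immediate values, no heap objects
end

report.total_allocated # => a handful of objects (essentially just the array), not 1000+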

How many objects does a “naive/lazy” implementation need to allocate?

One feature that Active Record touts as a huge advantage over Sequel is the “built-in” laziness.

ActiveRecord will not bother “casting” a column to a date till you try to use it, so if for any reason you over-select, ActiveRecord has your back. This deficiency in Sequel is acknowledged and deliberate.

This particular niggle makes it incredibly hard to move from ActiveRecord to Sequel without extremely careful review, despite Sequel being so fast and efficient.

There is no “fastest possible” example out there of an efficient lazy selector. In our case we are consuming 1000 ids, so we would expect a maximally efficient implementation to allocate 1020 or so objects, because we cannot get away without allocating a Topic object per row. We do not expect 26 thousand.

Here is a quick attempt at such an implementation (note this is just a proof of concept of the idea, not a production-level system):

$conn = ActiveRecord::Base.connection.raw_connection

class FastBase

  class Relation
    include Enumerable

    def initialize(table)
      @table = table
    end

    def limit(limit)
      @limit = limit
      self
    end

    def to_sql
      sql = +"SELECT #{@table.columns.join(',')} from #{@table.get_table_name}"
      if @limit
        sql << -" LIMIT #{@limit}"
      end
      sql
    end

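    # Run the query once and hand each row a reference to the shared PG
    # result plus its row number, instead of copying values out per row.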
    def each
      @results = $conn.async_exec(to_sql)
      i = 0
      while i < @results.cmd_tuples
        row = @table.new
        row.attach(@results, i)
        yield row
        i += 1
      end
    end

  end

  def self.columns
    @columns
  end

  def attach(recordset, row_number)
    @recordset = recordset
    @row_number = row_number
  end

  def self.get_table_name
    @table_name
  end

  def self.table_name(val)
    @table_name = val
    load_columns
  end

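  # Look up the column names from information_schema once per class and
  # define a lazy reader/writer pair for each column that pulls the value
  # out of the attached result set on first access.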
  def self.load_columns
    @columns = $conn.async_exec(<<~SQL).column_values(0)
      SELECT COLUMN_NAME FROM information_schema.columns
      WHERE table_schema = 'public' AND
        table_name = '#{@table_name}'
    SQL

    @columns.each_with_index do |name, idx|
      class_eval <<~RUBY
        def #{name}
          if @recordset && !@loaded_#{name}
            @loaded_#{name} = true
            @#{name} = @recordset.getvalue(@row_number, #{idx})
          end
          @#{name}
        end

        def #{name}=(val)
          @loaded_#{name} = true
          @#{name} = val
        end
      RUBY
    end
  end

  def self.limit(number)
    Relation.new(self).limit(number)
  end
end

class Topic2 < FastBase
  table_name :topics
end

Then we can measure:

a = []
Topic2.limit(1000).each do |t|
   a << t.id
end
a
Total allocated: 84320 bytes (1012 objects)

So … we can manage a similar API with 1012 object allocations as opposed to 26 thousand objects.

Does this matter?

A quick benchmark shows us:

Calculating -------------------------------------
               magic    256.149  (± 2.3%) i/s -      1.300k in   5.078356s
                  ar     75.219  (± 2.7%) i/s -    378.000  in   5.030557s
           ar_select    196.601  (± 3.1%) i/s -    988.000  in   5.030515s
            ar_pluck      1.407k (± 4.5%) i/s -      7.050k in   5.020227s
                 raw      3.275k (± 6.2%) i/s -     16.450k in   5.043383s
             raw_all    284.419  (± 3.5%) i/s -      1.421k in   5.002106s

Our new implementation (that I call magic) does 256 iterations a second compared to Rails’ 75. It is a considerable improvement over the Rails implementation on multiple counts: it is both much faster and allocates significantly less memory, leading to reduced process memory usage. This is despite following the non-ideal practice of over-selection. In fact, our implementation is so fast that it even beats Rails when Rails is careful to select only 1 column!
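
For reference, the output above is in the format produced by benchmark-ips. A rough sketch of such a harness follows; the report bodies are reconstructed from the snippets earlier in the post rather than copied from the benchmark script itself, and raw_all is omitted since its body is not shown here:

require "benchmark/ips"

Benchmark.ips do |x|
  x.report("magic") do
    a = []
    Topic2.limit(1000).each { |t| a << t.id }
  end

  x.report("ar") do
    a = []
    Topic.limit(1000).each { |u| a << u.id }
  end

  x.report("ar_select") do
    a = []
    Topic.select(:id).limit(1000).each { |u| a << u.id }
  end

  x.report("ar_pluck") { Topic.limit(1000).pluck(:id) }

  x.report("raw") do
    sql = "select * from topics limit 1000"
    ActiveRecord::Base.connection.raw_connection.async_exec(sql).column_values(0)
  end
end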

This is the Rails 3x3 we could have today with no changes to Ruby! :confetti_ball:

Another interesting data point is how much slower pluck, the turbo-boosted option Rails has to offer, is than raw SQL. In fact, at Discourse, we monkey patch pluck exactly for this reason. (I also have a Rails 5.2 version.)
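
To give a flavour of the idea, here is a hypothetical sketch built on the raw snippet from earlier; it is not Discourse’s actual patch, fast_pluck is an invented name, and it only handles the single-column case:

class ActiveRecord::Relation
  # Hypothetical single-column pluck that reads values straight off the PG
  # result, bypassing ActiveModel attribute objects. Note the values come
  # back exactly as the pg gem returns them (strings, unless a type map is
  # installed on the connection).
  def fast_pluck(column)
    klass.connection.raw_connection.async_exec(select(column).to_sql).column_values(0)
  end
end

# Topic.limit(1000).fast_pluck(:id)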

Why is this bloat happening?

Looking at memory profiles, I can see multiple reasons why all this bloat happens:

  1. Rails is only sort-of-lazy… I can see 1000s of string allocations for columns we never look at. It is not “lazy-allocating”, it is partial “lazy-casting”.

  2. Every row allocates 3 additional objects for bookkeeping and magic: ActiveModel::Attribute::FromDatabase, ActiveModel::AttributeSet and ActiveModel::LazyAttributeHash. None of this is required; instead, a single array could be passed around that holds indexes to columns in the result set.

  3. Rails insists on dispatching casts to helper objects even if the data retrieved is already in “the right format” (e.g. a number); this work generates extra bookkeeping.

  4. Every column name we have is allocated twice per query; this could easily be cached and reused (if the query builder is aware of the column names it selected, it does not need to ask the result set for them).

What should be done?

I feel that we need to carefully review Active Record internals and consider an implementation that allocates significantly fewer objects per row. We should also start leveraging the PG gem’s native type casting to avoid pulling strings out of the database only to convert them back to numbers.
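
As a taste of what that native casting looks like, here is a minimal standalone sketch (the connection settings are assumptions, and the topics table is borrowed from the examples above):

require "pg"

conn = PG.connect(dbname: "topics_demo") # hypothetical connection settings

# Install the pg gem's built-in type map so integers, floats, booleans and
# timestamps come back as native Ruby objects instead of strings.
conn.type_map_for_results = PG::BasicTypeMapForResults.new(conn)

ids = conn.async_exec("select id from topics limit 1000").column_values(0)
ids.first.class # => Integer, no Ruby-side casting needed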

You can see the script I used for this evaluation over here:

