Ruby optimization example and explanation


Recently I wrote a small DSL that allows the user to define some code that then gets executed later on, in different contexts. Imagine something like Sinatra, where each route action is defined in a block and then executed in the context of an incoming request.

The challenge is that a block carries the context in which it was defined, and you can’t simply execute it in the context of another object.

Here is a reduction of the challenge I was trying to solve:

class SolutionZero
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end
 
  def dispatch
    @block.call
  end
end
 
SolutionZero.new(42){ @origin + 1 }.dispatch
# undefined method `+' for nil:NilClass (NoMethodError)

The problem is that the block refers to the @origin instance variable which is not available in its context.
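To make that concrete, here is a minimal sketch (names are illustrative) showing that a block reads instance variables against the `self` of the place where it was written, not of the object that later calls it:

```ruby
# A block captures the `self` of its definition site, so its ivars
# resolve against that object, not the caller.
@origin = 40                  # ivar on the top-level "main" object
blk = proc { @origin }

class Other
  def call_it(blk)
    @origin = 99              # ivar on this Other instance
    blk.call                  # still reads main's @origin
  end
end

Other.new.call_it(blk)        # => 40, not 99
```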
My first workaround was to use instance_eval:

class SolutionOne
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end
 
  def dispatch
    instance_eval(&@block)
  end
end
 
SolutionOne.new(40){ @origin + 2}.dispatch
# 42

My workaround worked fine: the block is evaluated in the context of the instance, so the @origin ivar is available to the block. Technically, I was good to go, but I wasn’t really pleased with this solution. First, using instance_eval is often an indication that you are trying to take a shortcut. Then, having to convert my block, stored as a proc, back into a block on every single dispatch makes me sad. Finally, I think that this code is probably not performing as well as it could, mainly due to unnecessary object allocations and code evaluation.
I did some benchmarks replacing instance_eval with instance_exec since, looking at the C code, instance_exec should be slightly faster. Turns out it is not, so I probably missed something when reading the implementation code.
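The comparison can be reproduced with a sketch along these lines (the Dispatcher class and iteration count are illustrative; absolute numbers depend heavily on the Ruby version):

```ruby
require 'benchmark'

class Dispatcher
  def initialize(origin, &block)
    @origin = origin
    @block  = block
  end

  def dispatch_eval
    instance_eval(&@block)   # re-binds self for the block on each call
  end

  def dispatch_exec
    instance_exec(&@block)   # same idea, but also supports passing arguments
  end
end

d = Dispatcher.new(40) { @origin + 2 }
n = 200_000

Benchmark.bm(14) do |x|
  x.report('instance_eval') { n.times { d.dispatch_eval } }
  x.report('instance_exec') { n.times { d.dispatch_exec } }
end
```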

I wrote some more benchmarks and profiled a loop of 2 million dispatches (only the #dispatch method call on the same object). The GC profiler report showed that the GC was invoked 287 times and each invocation blocked execution for about 0.15ms.
Using Ruby’s ObjectSpace and disabling the GC during the benchmark, I could see that each loop allocates an object of type T_NODE, which is more than likely our @block ivar converted back into a block. This is quite a waste. Furthermore, having to evaluate our block in a different context on every single call surely isn’t good for performance.
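The allocation check looked roughly like this (a reconstruction, not the exact script; on modern Rubies blocks no longer allocate a node per call, so the deltas show up under different type keys):

```ruby
# Count per-type object allocations across a dispatch loop with GC off.
class EvalSolution                 # illustrative name for the instance_eval version
  def initialize(origin, &block)
    @origin = origin
    @block  = block
  end

  def dispatch
    instance_eval(&@block)
  end
end

obj = EvalSolution.new(40) { @origin + 2 }

GC.disable
before = ObjectSpace.count_objects
100_000.times { obj.dispatch }
after = ObjectSpace.count_objects
GC.enable

# Print the per-type allocation deltas; on Ruby 1.9 the interesting
# delta showed up under :T_NODE.
(before.keys | after.keys).each do |type|
  delta = after[type].to_i - before[type].to_i
  puts "#{type}: +#{delta}" if delta > 0
end
```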

So instead of doing the work at run time, why not do it at load time? By that I mean we can optimize the #dispatch method by “precompiling” the method body instead of “proxying” the dispatch to an instance_eval call. Here is the code:

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    implementation(block)
  end
 
  private
 
  def implementation(block)
    mod = Module.new
    mod.send(:define_method, :dispatch, block)
    self.extend mod
  end
end
 
SolutionTwo.new(40){ @origin + 2}.dispatch
# 42

This optimization is based on the fact that the benchmark (and the real-life usage) creates the instance once and then calls #dispatch many times. So by making the initialization of our instance a bit slower, we can drastically improve the performance of the method call. We also still need to execute our block in the right context. And finally, each instance might have a different way to dispatch, since it is defined dynamically at initialization. To work around all these constraints, we create a new module on which we define a new method called dispatch whose body is the passed block. Then we simply extend our instance with our new module.

Now every time we call #dispatch, a real method is dispatched, which is much faster than doing an eval, and no objects are allocated. Running the profiler and the benchmark script used earlier, we can confirm that the GC doesn’t run a single time and that the optimized code runs 2x faster!
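A minimal head-to-head benchmark looks like this (both classes are restated so the snippet runs standalone, and the iteration count is reduced for brevity; the exact speedup varies by Ruby version):

```ruby
require 'benchmark'

class SolutionOne
  def initialize(origin, &block)
    @origin = origin
    @block  = block
  end

  def dispatch
    instance_eval(&@block)           # re-evaluated in context on every call
  end
end

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    mod = Module.new
    mod.send(:define_method, :dispatch, block)
    extend mod                       # "precompiled" once at initialization
  end
end

one = SolutionOne.new(40) { @origin + 2 }
two = SolutionTwo.new(40) { @origin + 2 }
n   = 200_000

Benchmark.bm(16) do |x|
  x.report('instance_eval')  { n.times { one.dispatch } }
  x.report('defined method') { n.times { two.dispatch } }
end
```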

 

Once again, it’s yet another example showing that you should care about object allocation when dealing with code in the critical path. It also shows how to work around block bindings. That doesn’t mean you have to obsess about object allocation and performance: even if my last implementation is 2x faster than the previous one, we are only talking about a few microseconds per dispatch. That said, microseconds do add up, and creating too many objects will slow down even your fastest code, since the GC stops the world while it’s cleaning up your memory.

In real life, you probably don’t have to worry too much about low-level details like these, unless you are working on a framework or sharing your code with others. But at least you can learn and understand why one approach is faster than another. It might not be useful to you right away, but if you take programming as a craft, it’s good to understand how things work under the hood so you can make educated decisions.
 

Update:

@apeiros in the comments suggested a solution that works & performs the same as my solution, but is much cleaner:

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    define_singleton_method(:dispatch, block) if block_given?
  end
end
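One nice property of the singleton-method approach (sketched here under an illustrative class name) is that each instance keeps its own dispatch, since the method lives on the instance’s singleton class rather than on the shared class:

```ruby
class SingletonSolution          # illustrative name for the same pattern
  def initialize(origin, &block)
    @origin = origin
    define_singleton_method(:dispatch, block) if block_given?
  end
end

a = SingletonSolution.new(2)  { @origin + 40 }
b = SingletonSolution.new(40) { @origin + 2 }

p a.dispatch # => 42
p b.dispatch # => 42
p a.dispatch # => 42, b's definition did not clobber a's
```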


  1. #1 by James Harton - September 5th, 2011 at 15:06

    Thanks Matt, I was doing something similar just the other day, and it’s nice to have someone quantify my gut feelings with actual data. This stuff is especially important if you’re doing a lot of work with libraries like eventmachine where blocks are passed all over the place as callbacks.

  2. #2 by apeiros - September 5th, 2011 at 22:26

    Thanks for the post, it was an interesting read.
    I prefer to just do `@block = block`, since the `if block_given?` is superfluous. Or is there a reason to have the `if` around I’m unaware of? (block will be nil if no block is being passed).

    Also, what about:

    def implementation(block)
    define_singleton_method :dispatch, block
    end

    (Personally I’d even use &block there for #implementation, even if that means that the block is converted back and forth – after all, it’s a +1, not a +N).

    • #3 by Matt Aimonetti - September 6th, 2011 at 08:15

      I updated my blog post with a modified version of your suggestion. I agree that define_singleton_method is a cleaner & more expressive solution. The block_given? guard is there to prevent passing nil to the meta definer method, which would then crap out. I don’t really see the point of storing the block in an ivar tho.

  3. #4 by Peter Cooper - September 6th, 2011 at 04:41

    Not changing your implementation, but you could also do this:

    def implementation(block)
      extend Module.new { define_method :dispatch, block }
    end

    However, I do prefer apeiros’ approach here ;-)

  4. #5 by Gabriel Sobrinho - September 6th, 2011 at 04:56

    Also:

    def implementation(block)
    define_method :dispatch, block
    end

    I don’t know the performance hit here but it is less verbose.

    What do you think?

    • #6 by Matt Aimonetti - September 6th, 2011 at 08:22

      You can’t call define_method on an instance so that wouldn’t work.

  5. #7 by r4ito - September 6th, 2011 at 12:05

    What about:

    self.class.send(:define_method, :dispatch, block)

    • #8 by Matt Aimonetti - September 6th, 2011 at 12:41

      that would be a bad idea, as each new instance would overwrite the previously defined method. Here is an example:

      class RaitoSolution
        def initialize(origin, &block)
          @origin = origin
          self.class.send(:define_method, :dispatch, block)
        end
      end
       
      a = RaitoSolution.new(2){ @origin + 40}
      p a.dispatch # => 42
      b = RaitoSolution.new(40){ @origin + 2}
      p b.dispatch # => 42
      p a.dispatch # => 4

      • #9 by r4ito - September 6th, 2011 at 13:02

        Thanks! That example helped me a lot.

  6. #10 by Roger Leite - September 7th, 2011 at 04:46

    Great post! Thanks for sharing this!

  7. #11 by Jared - September 8th, 2011 at 15:53

    Nice post! Thanks!

    To me, it looks like the reason instance_eval is faster is because instance_exec allocates an array for potential arguments (regardless of whether they’re provided), likely triggering the GC a few times in your profiling.

    https://github.com/ruby/ruby/blob/trunk/vm_eval.c#L1365 vs https://github.com/ruby/ruby/blob/trunk/vm_eval.c#L1267

    • #12 by Matt Aimonetti - September 9th, 2011 at 07:11

      Looks like you found the reason why one is slower than the other. Thx
