Ruby concurrency explained


Concurrency is certainly not a new problem, but it’s getting more and more attention as machines gain multiple cores, web traffic increases drastically, and new technologies show up claiming to be better because they handle concurrency better.
If it helps, think of concurrency as multitasking. When people say they want concurrency, they mean they want their code to do multiple different things at the same time. When you are on your computer, you don’t expect to have to choose between browsing the web and listening to some music; you more than likely want to run both concurrently. It’s the same thing with your code: if you are running a webserver, you probably don’t want it to process only one request at a time.
The aim of this article is to explain as simply as possible the concept of concurrency in Ruby, the reason why it’s a complicated topic, and finally the different solutions available to achieve concurrency.

First off, if you are not really familiar with concurrency, take a minute to read the Wikipedia article on the topic, which is a great recap on the subject. By now, you should have noticed that my example above was more about parallelism than concurrency, but we’ll come back to that in a minute.

The real question at the heart of the quest for concurrency is: “how do we increase code throughput?”

We want our code to perform better, and we want it to do more in less time. Let’s take two simple and concrete examples to illustrate concurrency. First, let’s pretend you are writing a Twitter client; you probably want to let the user scroll through his/her tweets while the latest updates are being fetched. In other words, you don’t want to block the main loop and interrupt the user interaction while your code is waiting for a response from the Twitter API. To do that, a common solution is to use multiple threads. Threads are basically lightweight processes that run in the same memory context. We would use one thread for the main event loop and another thread to process the remote API request. Both threads share the same memory context, so once the Twitter API thread is done fetching the data it can update the display. Thankfully, this is usually transparently handled by asynchronous APIs (provided by the OS or the programming language’s standard library) which avoid blocking the main thread.
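
For instance, here is a minimal sketch of that idea in plain Ruby. The endpoint URL and the fetch method are hypothetical; any slow remote call behaves the same way:

    require 'net/http'
    require 'json'
    require 'uri'

    # Hypothetical API call: only this thread blocks on the HTTP request,
    # the main thread keeps running and handling user interaction.
    def fetch_tweets_async(username)
      Thread.new do
        uri = URI("https://api.example.com/tweets/#{username}")
        JSON.parse(Net::HTTP.get(uri))
      end
    end

    worker = fetch_tweets_async("matt")
    # ... the main event loop keeps running here ...
    tweets = worker.value # Thread#value joins the thread and returns its result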

The second example is a webserver. Let’s say you want to run a Rails application. Because you are awesome, you expect to see a lot of traffic. Probably more than 1 QPS (query/request per second). You benchmarked your application and you know that the average response time is approximately 100ms. Your Rails app can therefore handle 10 QPS using a single process (you can do 10 queries at 100ms each in one second).
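
A quick sanity check of that math, assuming requests are handled serially by a single process:

    avg_response_time = 0.100 # seconds per request
    throughput = 1 / avg_response_time
    puts throughput # => 10.0 requests per second, per process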

But what happens if your application gets more than 10 requests per second? Well, it’s simple: the requests will back up and take longer, until some start timing out. This is why you want to improve your concurrency. There are different ways to do that; a lot of people feel really strongly about these different solutions, but they often forget to explain why they dislike one solution or prefer one over another. You might have heard people’s conclusions, which are often one of these: Rails can’t scale, you only get concurrency with JRuby, threads suck, the only way to concurrency is via threads, we should switch to Erlang/Node.js/Scala, use fibers and you will be fine, add more machines, forking > threading. Depending on who said what and how often you heard it on Twitter, at conferences, or in blog posts, you might start believing what others are saying. But do you really understand why people are saying that, and are you sure they are right?

The truth is that this is a complicated matter. The good news is that it’s not THAT complicated!

The thing to keep in mind is that concurrency models are often defined by the programming language you use. In the case of Java, threading is the usual solution: if you want your Java app to be more concurrent, just run every single request in its own thread and you will be fine (kinda). In PHP, you simply don’t have threads; instead, you start a new process per request. Both have pros and cons. The advantage of the Java threaded approach is that the memory is shared between the threads, so you save memory (and startup time), and the threads can easily talk to each other via the shared memory. The advantage of PHP is that you don’t have to worry about locks, deadlocks, thread-safe code and all that mess hidden behind threads.

Described like that it looks pretty simple, but you might wonder why PHP doesn’t have threads and why Java developers don’t prefer starting multiple processes. The answer is probably related to language design decisions. PHP is a language designed for the web and for short-lived processes; PHP code should be fast to load and not use too much memory. Java code is slower to boot and to warm up, and it usually uses quite a lot of memory. Finally, Java is a general-purpose programming language not designed primarily for the internet.

Other programming languages like Erlang and Scala use a third approach: the actor model. The actor model is a bit of a mix of both solutions; the difference is that actors are like threads which don’t share the same memory context. Communication between actors is done via exchanged messages, ensuring that each actor handles its own state and therefore avoiding corrupted data (two threads can modify the same data at the same time, but an actor can’t receive two messages at the exact same time). We’ll talk about that design pattern later on, so don’t worry if you are confused.
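
Ruby’s standard library doesn’t ship an actor implementation, but the message-passing idea can be approximated with a thread that owns its state and a thread-safe Queue as its mailbox. This is a toy sketch of the concept, not a real actor library:

    require 'thread'

    mailbox = Queue.new # thread-safe FIFO, our actor's "mailbox"

    actor = Thread.new do
      state = 0 # only this thread ever touches the state
      loop do
        message = mailbox.pop # blocks until a message arrives
        break if message == :stop
        state += message # messages are processed one at a time
      end
      puts "final state: #{state}"
    end

    mailbox << 1
    mailbox << 2
    mailbox << :stop
    actor.join # prints "final state: 3"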

What about Ruby? Should Ruby developers use threads, multiple processes, actors, something else? The answer is: yes!

Threads

Since version 1.9, Ruby has native threads (before that, green threads were used). So in theory, if we wanted to, we should be able to use threads everywhere like most Java developers do. Well, that’s almost true: the problem is that Ruby, like Python, uses a Global Interpreter Lock (aka GIL). This GIL is a locking mechanism that is meant to protect your data integrity. The GIL only allows data to be modified by one thread at a time, and therefore doesn’t let threads corrupt data, but it also doesn’t allow them to truly run concurrently. That is why some people say that Ruby and Python are not capable of (true) concurrency.
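
You can see the GIL’s effect by timing a CPU-bound task. On MRI the threaded version is not meaningfully faster than the serial one, while on a GIL-free implementation like JRuby it can be. A rough sketch; exact timings will vary:

    require 'benchmark'

    def cpu_work
      200_000.times { |i| Math.sqrt(i) }
    end

    Benchmark.bm(9) do |x|
      x.report("serial:") { 4.times { cpu_work } }
      x.report("threaded:") do
        4.times.map { Thread.new { cpu_work } }.each(&:join)
      end
    end
    # On MRI both reports show roughly the same wall-clock time because
    # the GIL lets only one thread execute Ruby code at once.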

[Image: Global Interpreter Lock, by Matt Aimonetti]

However, these people often don’t mention that the GIL makes single-threaded programs faster, that multi-threaded programs are much easier to develop since the core data structures are safe, and finally that a lot of C extensions are not thread safe and, without the GIL, these C extensions don’t behave properly. These arguments don’t convince everyone, and that’s why you will hear some people say you should look at another Ruby implementation without a GIL, such as JRuby, Rubinius (hydra branch) or MacRuby (Rubinius & MacRuby also offer other concurrency approaches). If you are using an implementation without a GIL, then using threads in Ruby has exactly the same pros/cons as doing so in Java. However, it means that you now have to deal with the nightmare of threads: making sure your data access is safe and doesn’t deadlock, and checking that your code, your libs, plugins and gems are thread safe. Also, running too many threads might hurt performance because your OS doesn’t have enough resources to allocate and ends up spending its time context switching. It’s up to you to see if it’s worth it for your project.
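
“Making sure your data is safe” boils down to discipline like the following: every piece of shared state needs a lock around its read-modify-write sequences. A minimal sketch using a Mutex:

    counter = 0
    lock = Mutex.new

    threads = 10.times.map do
      Thread.new do
        1_000.times do
          # Without the lock, the read-increment-write sequence could
          # interleave across threads and updates would be lost.
          lock.synchronize { counter += 1 }
        end
      end
    end

    threads.each(&:join)
    puts counter # => 10000, deterministically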

Multiple processes & forking

That’s the most commonly used solution to gain concurrency when using Ruby and Python. Because the default language implementation isn’t capable of true concurrency, or because you want to avoid the challenges of thread programming, you might want to just start more processes. That’s really easy as long as you don’t need to share state between running processes. If you did, you would need to use DRb, a message bus like RabbitMQ, or a shared data store like memcached or a DB. The caveat is that you now need a LOT more memory. If you want to run 5 Rails processes and your app uses 100MB, you will now need 500MB; ouch, that’s a lot of memory! That is exactly what happens when you use a Rails webserver like Mongrel.

Some other servers, like Passenger and Unicorn, found a workaround: they rely on Unix forking. The advantage of forking in a Unix environment implementing copy-on-write semantics is that we create a new copy of the main process, but both “share” the same physical memory; each process can still modify its own memory without affecting the other processes. So now Passenger can load your 100MB Rails app in a process, then fork this process 5 times, and the total footprint will be just a bit more than 100MB, and you can now handle 5x more concurrent requests. Note that if you are allocating memory in your request-processing code (read: controllers/views), your overall memory use will grow, but you can still run many more processes before running out of memory. This approach is appealing because it’s really easy and pretty safe. If a forked process acts up or leaks memory, just destroy it and create a new fork from the master process. Note that this approach is also used in Resque, the async job processing solution by GitHub.

This solution works well if you want to duplicate a full process like a webserver; however, it gets less interesting when you just want to execute some code “in the background”. Resque took this approach because, by nature, async jobs can yield weird results, leak memory or hang. Dealing with forks allows for external control of the processes, and the cost of the fork isn’t a big deal since we are already in an async processing approach.
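
The underlying mechanism is plain Kernel#fork. Here is a stripped-down sketch of the master/worker pattern described above; handle_request is a hypothetical stand-in for the real work:

    # Master process: load the expensive app once...
    # app = load_my_big_rails_app

    child_pids = 5.times.map do
      fork do
        # Each child starts as a copy-on-write copy of the master, so the
        # app's memory stays shared until a process writes to it.
        loop { handle_request } # hypothetical worker loop
      end
    end

    # The master just supervises: if a child acts up or leaks memory,
    # kill it and fork a fresh one from the clean master image.
    child_pids.each { |pid| Process.wait(pid) }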

[Image: screenshot of GitHub's repository forking]

Actors/Fibers

Earlier we talked a bit about the actor model. Since Ruby 1.9, developers have access to a new type of “lightweight” thread called a Fiber. Fibers are not actors, and Ruby doesn’t have a native actor model implementation, but some people have written actor libs on top of fibers. A fiber is like a simplified thread which isn’t scheduled by the VM but by the programmer. Fibers are like blocks which can be paused and resumed from the outside or from within themselves. Fibers are faster and use less memory than threads, as demonstrated in this blog post. However, because of the GIL, you still cannot truly run more than one concurrent fiber per thread, and if you want to use multiple CPU cores, you will need to run fibers within more than one thread.

So how do fibers help with concurrency? The answer is that they are part of a bigger solution. Fibers allow developers to manually control the scheduling of “concurrent” code, but also to have the code within the fiber schedule its own resumption. That’s pretty big, because now you can wrap an incoming web request in its own fiber and tell it to send a response back when it’s done doing its thing. In the meantime, you can move on to the next incoming request. Whenever a request within a fiber is done, the fiber automatically resumes and the response is returned. Sounds great, right? Well, the only problem is that if you are doing any type of blocking IO in a fiber, the entire thread is blocked and the other fibers aren’t running. Blocking operations are operations like database/memcached queries, HTTP requests… basically things you are probably triggering from your controllers. The good news is that the “only” problem to fix now is to avoid blocking IOs. Let’s see how to do that.
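
The pause/resume mechanics look like this, a trivial demonstration of the core Fiber API:

    fiber = Fiber.new do
      puts "step 1"
      Fiber.yield # pause ourselves and hand control back to the caller
      puts "step 2"
    end

    fiber.resume # runs the block until the yield, printing "step 1"
    puts "doing something else in between"
    fiber.resume # resumes where it left off, printing "step 2"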


Non-blocking IOs / Reactor pattern

The reactor pattern is quite simple to understand, really. The heavy work of making blocking IO calls is delegated to an external service (the reactor) which can receive concurrent requests. The service handler (reactor) is given callback methods to trigger asynchronously based on the type of response received. Let me use a limited analogy to hopefully explain the design better. It’s a bit like asking someone a hard question: the person will take a while to reply, and their reply will make you decide whether or not to raise a flag. You have two options: either you wait for the response and then decide whether to raise the flag, or your flag logic is already defined, you tell the person what to do based on their answer, and you move on without having to worry about waiting for it. The second approach is exactly what the reactor pattern is. It’s obviously slightly more complicated, but the key concept is that it allows your code to define methods/blocks to be called based on a response which will come later on.
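
Translated to code, the “define the flag logic upfront and move on” option looks roughly like this. The sketch assumes EventMachine plus the third-party em-http-request gem, so the exact API is those libraries’, not the standard library’s:

    require 'eventmachine'
    require 'em-http-request' # third-party async HTTP client gem

    EM.run do
      http = EventMachine::HttpRequest.new('http://example.com/').get

      # Register what to do with the answer instead of waiting for it.
      http.callback { puts "answer is in, raise the flag? #{http.response_header.status == 200}" }
      http.errback  { puts "no answer, don't raise the flag" }

      puts "question asked, moving on without waiting..."
    end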

[Image: Reactor Pattern illustrated]

In the case of a single-threaded webserver, that’s quite important. When a request comes in and your code makes a DB query, you are blocking any other request from being processed. To avoid that, we could wrap our request in a fiber, trigger an async DB call, and pause the fiber so another request can get processed while we are waiting for the DB. Once the DB query comes back, it wakes up the fiber it was triggered from, which then sends the response back to the client. Technically, the server can still only send one response at a time, but now fibers can be interleaved and don’t block the main thread with blocking IOs (since those are handled by the reactor).
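
A sketch of that request/fiber dance. Here async_db_query stands in for any EventMachine-style non-blocking driver returning a deferrable, and send_response is hypothetical too:

    def handle(request)
      Fiber.new do
        fiber = Fiber.current
        query = async_db_query(request) # hypothetical non-blocking driver call
        query.callback { |rows| fiber.resume(rows) }
        query.errback  { |err|  fiber.resume(err)  }

        # Pause here; the reactor is free to process other requests.
        # We wake up with whatever value was passed to resume.
        result = Fiber.yield
        send_response(request, result) # hypothetical
      end.resume
    end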

This is the approach used by Twisted, EventMachine and Node.js. Ruby developers can use EventMachine or an EventMachine-based webserver like Thin, as well as EM clients/drivers, to make non-blocking async calls. Mix that with some Fiber love and you get Ruby concurrency. Be careful though: using Thin, non-blocking drivers and Rails in threadsafe mode doesn’t mean you are processing requests concurrently. Thin/EM only use one thread, and you need to let them know that it’s OK to handle the next request while we are waiting. This is done by deferring the response and letting the reactor know about it.
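
With Thin, deferring is done through its async extension to Rack: the app returns a special status to keep the connection open, then later hands the real response to env['async.callback']. A bare-bones sketch, where the one-second timer stands in for a real non-blocking call:

    # A minimal Rack app for Thin using its async response support.
    class AsyncApp
      ASYNC_RESPONSE = [-1, {}, []].freeze # tells Thin the response comes later

      def call(env)
        EM.add_timer(1) do
          # Later, hand the real response to the reactor via the callback.
          env['async.callback'].call([200, { 'Content-Type' => 'text/plain' }, ['done']])
        end
        ASYNC_RESPONSE # equivalent to: throw :async
      end
    end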

The obvious problem with this approach is that it forces you to change the way you write code. You now need to set a bunch of callbacks, understand the Fiber syntax, and use deferrable responses; I have to admit that this is kind of a pain. If you look at some Node.js code, you will see that it’s not always an elegant approach. The good news, though, is that this process can be wrapped, and your code can be written as if it were processed synchronously while being handled asynchronously under the covers. This is a bit more complex to explain without showing code, so it will be the topic of a future post. But I do believe that things will get much easier soon enough.

Conclusion

High concurrency with Ruby is doable and done by many. However, it could be made easier. Ruby 1.9 gave us fibers, which allow for more granular control over concurrency scheduling; combined with non-blocking IO, high concurrency can be achieved. There is also the easy solution of forking a running process to multiply the processing power. However, the real question behind this heated debate is: what is the future of the Global Interpreter Lock in Ruby? Should we remove it to improve concurrency at the cost of dealing with some major new threading issues, unsafe C extensions, etc.? Alternative Ruby implementers seem to believe so, but at the same time Rails still ships with a default mutex lock allowing requests to be processed only one at a time, the reason given being that a lot of people using Rails don’t write thread-safe code and a lot of plugins are not threadsafe. Is the future of concurrency something more like libdispatch/GCD, where the threads are handled by the kernel and the developer only deals with a simpler/safer API?


  1. #1 by Fonsan - February 23rd, 2011 at 00:19

    Hey

    Great article!

    I would love it if you took a look at https://github.com/Fonsan/dunder
    a gem I finished a week back that tries to solve this exact problem of concurrent IO within a single request (in e.g. Rails) with threads. The former default mysql gem has been a great offender to Rails concurrency, but it has now been replaced with mysql2, which everyone should start using since it also solves some previous encoding issues

    Regards

  2. #2 by Seb - February 23rd, 2011 at 00:29

    Yes, definitely a great post, as usual.

  3. #3 by raggi - February 23rd, 2011 at 00:41

    Your article is well authored, however I disagree with several things here:

    – Actors are not a different concurrency model. They’re an abstraction on messaging and stack based paradigms.
    – Using Fibers in ruby does not allow you to achieve “higher concurrency than threads”. Write yourself some benchmarks.
    – rb_thread_select *is* an instance of the reactor pattern. People seem to miss this all the time. 1.8 with threads, using ruby level read methods, is a fibered reactor pattern.
    – As I have explained elsewhere (http://news.ycombinator.com/item?id=2102558) the only thing that really saves you from complex runtime semantics (different chunks of code running at different times) is good engineering. Proper encapsulation, explicit and debuggable state management, and so on. Fibers (stacks in general) are not ideal for this at high levels of concurrency, because they’re a total bitch to debug in a production environment. Do you really want to have to walk up 20 stacks to find the one that’s holding onto an object in a mid-stack frame?

    You still need to use locks in Fibered code; it’s just even less clear to people when, and how, to lock. For a start, ruby has no transactional mutation primitives that complement threads (mostly because you need to tie into the scheduler, which is left up to the user with Fibers). Some infrastructure for this could be provided, but no one’s written one yet (open source at least), as it’s expensive, and hard (brain) work.

    A request/response paradigm is so easy to thread it’s not even funny. Create an instance of the request and response context as you start servicing a request, and only use objects that are linked as a subtree of that. Pass into that object an allocation from a pool (or provide a simple transactional request method to fetch from the pool lazily) and you’re away. You can spawn as many of these encapsulated representations as you like without the risk of nasty interactions.

    Yes, there are a lot of badly written plugins out there, but if you’re talking about scaling, then you’ll want to be at the very least reading (and grok’ing) the code that you use anyway. A runtime extend here or a singleton method definition there, and you’ll be completely toasted for many other reasons, on most interpreters, and that’s far more costly than your IO pattern.

  4. #4 by Hongli Lai - February 23rd, 2011 at 03:43

    Raggi is right with all his points. Ruby 1.9 fibers are essentially the same as Ruby 1.8 threads except you have to manually schedule fibers, and with fibers you can still run into concurrent modification issues.

  5. #5 by siva - February 23rd, 2011 at 06:06

    Hi,

    I am a tyro in Ruby and I feel that all of the people here are fantastic in their knowledge and abilities. Hats off guys!!!

  6. #6 by Daniel Ribeiro - February 23rd, 2011 at 06:45

    It is quite possible to use high-level concurrency primitives, like actors, in Ruby. Actually JRuby. I’ve written a small library that uses the currently most performant actor library for Scala: https://github.com/danielribeiro/RubyOnAkka/

  7. #7 by Hiram Chirino - February 23rd, 2011 at 06:58

    Great post! I totally agree that the future of high performance concurrency will use something like libdispatch/GCD. For the Java/Scala folks, I encourage you to look into HawtDispatch. It’s a pure Java libdispatch-style API for Java and Scala.

    • #8 by Charles Oliver Nutter - February 23rd, 2011 at 11:50

      I’ve started hacking a clone of the MacRuby GCD API that wraps HawtDispatch here: https://github.com/headius/jcd

      I’d love to have folks help out with that so that JRuby users can have a nice GCD-like API too.

  8. #9 by Mike Perham - February 23rd, 2011 at 09:24

    I’ve moved away from recommending Fibers for use in production systems. As raggi points out, there are simply too many issues for it to be worth it in 99% of the cases. Ruby 1.8 is the closest thing to “good” Fibers, since green threads + ruby I/O methods are effectively a fibered reactor system.

    If I were starting a large concurrent system today, I’d use JRuby and experiment with actors in Rubinius hydra. Alas, they are undocumented at the moment.

    • #10 by Charles Oliver Nutter - February 23rd, 2011 at 11:25

      You know what I’d like to see? A standard actor API across impls. There are already actor API gems out there…and maybe Rubinius’s is nice…but there’s no reason all impls shouldn’t have equivalent actor support out of the box or via a gem.

      • #11 by Matt Aimonetti - February 23rd, 2011 at 23:01

        I agree, that’d be a really nice thing to have.

  9. #12 by Tony Arcieri - February 23rd, 2011 at 10:36

    If you’re looking for Ruby event frameworks, also be sure to check out my framework cool.io:

    http://coolio.github.com

    It has a simple, easy-to-use DSL that should get you up and running quickly.

  10. #13 by Charles Oliver Nutter - February 23rd, 2011 at 11:52

    To be honest, I think the Ruby community needs to push for a standard actor/executor API in Ruby, perhaps via a gem that can be custom-fit for various impls (like, use existing Java libs when on JRuby). We will *never* get to the point of having solid concurrency-friendly Ruby libraries until there are standard non-thread mechanisms to safely do concurrency across all impls. Ruby impl-specific implementations of actors, etc, won’t help.

    We need a “rack” for actors (something standard that everyone can rely on).

    • #14 by Mike Perham - February 23rd, 2011 at 13:12

      I can’t thumb up this comment enough. I was literally thinking the exact same thing this past weekend.

  11. #15 by Charles Oliver Nutter - February 23rd, 2011 at 11:54

    Ahh, another project of mine that’s very JRuby-specific, but perhaps interesting: “Cloby” https://github.com/headius/cloby

    Cloby basically gives you a new class Clojure::Object you can extend in Ruby code. The resulting class will use Clojure’s STM for all instance variable updates, and will require you to call Kernel#dosync (added by Cloby) to make such updates.

    Again, I’d love to see a standard STM API that each impl could provide in their own special way. Meanwhile, Cloby is a nice toy for trying that out.

  12. #16 by Zeno Davatz - February 23rd, 2011 at 13:17

    Thank you for this great article!

    Why do you not mention mod_ruby? That solves a lot of problems at many requests per second and is very stable on Linux.

    Best
    Zeno

    • #17 by Matt Aimonetti - February 23rd, 2011 at 23:03

      mod_ruby? I haven’t heard of that since 2004 or so :p Do you mean mod_rails, aka Phusion Passenger? I mentioned it; Passenger uses multiple processes to provide concurrency/parallelism.

      • #18 by Zeno Davatz - February 24th, 2011 at 13:07

        I mean mod_ruby, but I do not know if mod_rails is based on mod_ruby code and thought. I have to check that (a short check suggests it really is similar, but I am not a Rails user). I know from our daily business that mod_ruby can handle hundreds of concurrent users with decent memory usage at very good response times. Setup is very easy with Gentoo Linux.

        Question is, if Passenger is easy to configure for non-Rails-Apps.

        Best
        Zeno

        • #19 by Matt Aimonetti - February 24th, 2011 at 17:30

          Zeno, yes Passenger is really easy to install and configure if you are using a Rack app.

  13. #20 by Tapajós - February 23rd, 2011 at 16:00

    Great article Matt.

  14. #21 by Max Horbul - February 24th, 2011 at 12:08

    This is a great article. It reminded me of a very interesting talk given by @igrigorik at RailsConf 2010 in Baltimore, MD. It seems he spoke about the same thing at OSCON. Here is the awesomely clear and detailed explanation, with a concrete solution which could help solve concurrency issues, in his blog post with video: http://goo.gl/B4ZN

  15. #22 by Eleanor McHugh - February 24th, 2011 at 17:10

    Hi Matt,

    Nice overview. Just a couple of updates to your further reading links:

    Elise Huard’s slides from RubyConf are at:

    http://www.slideshare.net/ehuard/concurrency-rubies-plural

    and confreaks have the video up at:

    http://confreaks.net/videos/447-rubyconf2010-concurrency-rubies-plural

    The session’s basically just a gloss due to the restrictions of the format (as it was, we overran our time slot), but there’s lots of code in the slides that we hope illustrates the points we were trying to get across.

    • #23 by Matt Aimonetti - February 24th, 2011 at 17:30

      Thanks Eleanor, I had already added Elise’s slides to the extra resources, but thanks for the link to the video.

  16. #24 by deepak kannan - February 24th, 2011 at 21:05

    How does a fiber allow itself to be manually scheduled while the “code within the fiber”
    auto-schedules itself? Are both possible at the same time?
    From the blog and the Fiber docs it seems like a fiber has to be manually scheduled.
    When we pause the fiber, is the code running inside the fiber not paused?
    What is the separation between the fiber and the code running inside it?

    > However, because of the GIL, you still cannot truly run more than
    > one concurrent fiber by thread and if you want to use multiple CPU
    > cores, you will need to run fibers within more than one thread. So
    > how do fibers help with concurrency? The answer is that they are
    > part of a bigger solution. Fiber allow developers to manually control
    > the scheduling of “concurrent” code but also to have the code within
    > the fiber to auto schedule itself.

    • #25 by brainopia - March 2nd, 2011 at 08:08

      You can spawn fibers inside the event loop, and every fiber, before pausing, fires off some async operation with a callback which has code to resume the current fiber. When the event loop gets control and finishes an async operation, it fires off the callback, thereby resuming the specific fiber.

  17. #26 by Zeno Davatz - February 25th, 2011 at 00:05

    Matt Aimonetti :
    Zeno, yes Passenger is really easy to install and configure if you are using a Rack app.

    If you are not running Rails and/or Rack, it still seems easier to set up mod_ruby on Linux than Phusion Passenger. mod_ruby has excellent scaling.
