Posts Tagged fibers

About concurrency and the GIL

During RubyConf 2011, concurrency was a really hot topic. This is not a new issue, and the JRuby team has been talking about true concurrency for quite a while . The Global Interpreter Lock has also been in a subject a lot of discussions in the Python community and it’s not surprising that the Ruby community experiences the same debates since the evolution of their implementations are somewhat similar. (There might also be some tension between EngineYard hiring the JRuby and Rubinius teams and Heroku which recently hired Matz (Ruby’s creator) and Nobu, the #1 C Ruby contributor)

The GIL was probably even more of a hot topic now that Rubinius is about the join JRuby and MacRuby in the realm of GIL-less Ruby implementations.

During my RubyConf talk (slides here), I tried to explain how C Ruby works and why some decisions like having a GIL were made and why the Ruby core team isn’t planning on removing this GIL anytime soon. The GIL is something a lot of Rubyists love to hate, but a lot of people don’t seem to question why it’s here and why Matz doesn’t want to remove it. Defending the C Ruby decision isn’t quite easy for me since I spend my free time working on an alternative Ruby implementation which doesn’t use a GIL (MacRuby). However, I think it’s important that people understand why the MRI team (C Ruby team) and some Pythonistas feels so strongly about the GIL.

What is the GIL?

Here is a quote from the Python wiki:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) [...] The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

The same basically applies to C Ruby. To illustrate the quote above, here is a diagram representing two threads being executed by C Ruby:

Fair thread scheduling in Ruby by Matt Aimonetti

Such a scheduling isn’t a problem at all when you only have 1 cpu, since a cpu can only execute a piece of code at a time and context switching happens all the time to allow the machine to run multiple processes/threads in parallel. The problem is when you have more than 1 CPU because in that case, if you were to only run 1 Ruby process, then you would most of the time only use 1 cpu at a time. If you are running on a 8 cpu box, that’s not cool at all! A lot of people stop at this explanation and imagine that their server can only handle one request at a time and they they rush to sign Greenpeace petitions asking Matz to make Ruby greener by optimizing Ruby and saving CPU cycles. Well, the reality is slightly different, I’ll get back to that in a minute. Before I explain “ways to achieve true concurrency with CRuby, let me explain why C Ruby uses a GIL and why each implementation has to make an important choice and in this case both CPython and C Ruby chose to keep their GIL.


Why a GIL in the first place?

  • It makes developer’s lives easier (it’s harder to corrupt data)
  • It avoids race conditions within C extensions
  • It makes C extensions development easier (no write barriers..)
  • Most of the C libraries which are wrapped are not thread safe
  • Parts of Ruby’s implementation aren’t threadsafe (Hash for instance)
As you can see the arguments can be organized in two main categories: data safety and C extensions/implementation. An implementation which doesn’t rely too much on C extensions (because they run a bit slow, or because code written in a different language is preferred) is only faced with one argument: data safety.


Should C Ruby remove its GIL?

  • No: it potentially makes Ruby code unsafe(r)
  • No: it would break existing C extensions
  • No: it would make writing C extensions harder
  • No: it’s a lot of work to change make C Ruby threadsafe
  • No: Ruby is fast enough in most cases
  • No: Memory optimization and GC is more important to tackle first
  • No: C Ruby code would run slower
  • Yes: we really need better/real concurrency
  • Yes: Rubber boots analogy (Gustavo Niemeyer)
Don’t count the amount of pros/cons to jump to the conclusion that removing the GIL is a bad idea. A lot of the arguments for removing the GIL are related. At the end of the day it boils down to data safety. During the Q&A section of my RubyConf talk, Matz came up on stage and said data safety was the main reason why C Ruby still has a GIL. Again, this is a topic which was discussed at length in the Python community and I’d encourage you to read arguments from the Jython (the equivalent of JRuby for Python) developers, the PyPy (the equivalent of Rubinius in the Python community) and CPython developers. (a good collection of arguments are actually available in the comments related to the rubber boots post mentioned earlier)


How can true concurrency be achieved using CRuby?

  • Run multiple processes (which you probably do if you use Thin, Unicorn or Passenger)
  • Use event-driven programming with a process per CPU
  • MultiVMs in a process. Koichi presented his plan to run multiple VMs within a process.  Each VM would have its own GIL and inter VM communication would be faster than inter process. This approach would solve most of the concurrency issues but at the cost of memory.
Note:  forking a process only saves memory when using REE since it implements a GC patch that makes the forking process Copy on Write friendly. The Ruby core team worked on a patch for Ruby 1.9 to achieve the same result. Nari & Matz are currently working on improving the implementation to make sure overall performance isn’t affected.

Finally, when developing web applications, each thread spend quite a lot of time in IOs which, as mentioned above won’t block the thread scheduler. So if you receive two quasi-concurrent requests you might not even be affected by the GIL as illustrated in this diagram from Yehuda Katz:

This is a simplified diagram but you can see that a good chunk of the request life cycle in a Ruby app doesn’t require the Ruby thread to be active (CPU Idle blocks) and therefore these 2 requests would be processed almost concurrently.

To boil it down to something simplified, when it comes to the GIL, an implementor has to chose between data safety and memory usage. But it is important to note that context switching between threads is faster than context switching between processes and data safety can and is often achieved in environments without a GIL, but it requires more knowledge and work on the developer side.



The decision to keep or remove the GIL is a bit less simple that it is often described. I respect Matz’ decision to keep the GIL even though, I would personally prefer to push the data safety responsibility to the developers. However, I do know that many Ruby developers would end up shooting themselves in the foot and I understand that Matz prefers to avoid that and work on other ways to achieve true concurrency without removing the GIL. What is great with our ecosystem is that we have some diversity, and if you think that a GIL less model is what you need, we have some great alternative implementations that will let you make this choice. I hope that this article will help some Ruby developers understand and appreciate C Ruby’s decision and what this decision means to them on a daily basis.

, , , , , , , , ,


Ruby concurrency explained

Concurrency is certainly not a new problem but it’s getting more and more attention as machines start having more than 1 core, that web traffic increases drastically and that some new technologies show up saying that they are better because they handle concurrency better.
If that helps, think of concurrency as multitasking. When people say that they want concurrency, they say that they want their code to do multiple different things at the same time. When you are on your computer, you don’t expect to have to choose between browsing the web and listening to some music. You more than likely want to run both concurrently. It’s the same thing with your code, if you are running a webserver, you probably don’t want it to only process one request at a time.
The aim of this article is to explain as simply as possible the concept of concurrency in Ruby, the reason why it’s a complicated topic and finally the different solutions to achieve concurrency.

First off, if you are not really familiar with concurrency, take a minute to read the wikipedia article on the topic which is a great recap on the subject. But now, you should have noticed that my above example was more about parallel programming than concurrency, but we’ll come back to that in a minute.

The real question at the heart of the quest for concurrency is: “how to increase code throughput”.

We want our code to perform better, and we want it to do more in less time. Let’s take two simple and concrete examples to illustrate concurrency. First, let’s pretend you are writing a twitter client, you probably want to let the user scroll his/her tweets while the latest updates are  being fetched. In other words, you don’t want to block the main loop and interrupt the user interaction while your code is waiting for a response from the Twitter API. To do that, a common solution is to use multiple threads. Threads are basically processes that run in the same memory context. We would be using one thread for the main event loop and another thread to process the remote API request. Both threads share the same memory context so once the Twitter API thread is done fetching the data it can update the display. Thankfully, this is usually transparently handled by asynchronous APIs (provided by the OS or the programming language std lib) which avoid blocking the main thread.

The second example is a webserver. Let’s say you want to run a Rails application. Because you are awesome, you expect to see a lot of traffic. Probably more than 1 QPS (query/request per second). You benchmarked your application and you know that the average response time is approximately 100ms. Your Rails app can therefore handle 10QPS using a single process (you can do 10 queries at 100ms in a second).

But what happens if your application gets more than 10 requests per second? Well, it’s simple, the requests will be backed up and will take longer until some start timing out. This is why you want to improve your concurrency. There are different ways to do that, a lot of people feel really strong about these different solutions but they often forget to explain why they dislike one solution or prefer one over the other. You might have heard people conclusions which are often one of these: Rails can’t scale, you only get concurrency with JRuby, threads suck, the only way to concurrency is via threads, we should switch to Erlang/Node.js/Scala, use fibers and you will be fine, add more machines, forking > threading.  Depending on who said what and how often you heard it on twitter, conferences, blog posts, you might start believing what others are saying. But do you really understand why people are saying that and are you sure they are right?

The truth is that this is a complicated matter. The good news is that it’s not THAT complicated!

The thing to keep in mind is that the concurrency models are often defined by the programming language you use. In the case of Java, threading is the usual solution, if you want your Java app to be more concurrent, just run every single request in its own thread and you will be fine (kinda). In PHP, you simply don’t have threads, instead you will start a new process per request. Both have pros and cons, the advantage of the Java threaded approach is that the memory is shared between the threads so you are saving in memory (and startup time), each thread can easily talk to each other via the shared memory. The advantage of PHP is that you don’t have to worry about locks, deadlocks, threadsafe code and all that mess hidden behind threads. Described like that it looks pretty simple, but you might wonder why PHP doesn’t have threads and why Java developers don’t prefer starting multiple processes. The answer is probably related to the language design decisions. PHP is a language designed for the web and for short lived processes. PHP code should be fast to load and not use too much memory. Java code is slower to boot and to warm up, it usually uses quite a lot of memory. Finally, Java is a general purpose programming language not designed primarily for the internet. Others programming languages like Erlang and Scala use a third approach: the actor model. The actor model is somewhat a bit of a mix of both solutions, the difference is that actors are a like threads which don’t share the same memory context. Communication between actors is done via exchanged messages ensuring that each actor handles its own state and therefore avoiding corrupt data (two threads can modify the same data at the same time, but an actor can’t receive two messages at the exact same time). We’ll talk about that design pattern later on, so don’t worry if you are confused.

What about Ruby? Should Ruby developers use threads, multiple processes, actors, something else? The answer is: yes!


Since version 1.9, Ruby has native threads (before that green threads were used). So in theory, if we would like to, we should be able to use threads everywhere like most Java developers do. Well, that’s almost true, the problem is that Ruby, like Python uses a Global Interpreter Lock (aka GIL). This GIL is a locking mechanism that is meant to protect your data integrity. The GIL only allows data to be modified by one thread at time and therefore doesn’t let threads corrupt data but also it doesn’t allow them to truly run concurrently. That is why some people say that Ruby and Python are not capable of (true) concurrency.

Global Interpreter Lock by Matt Aimonetti

However these people often don’t mention that the GIL makes single threaded programs faster, that multi-threaded programs are much easier to develop since the data structures are safe and finally that a lot of C extensions are not thread safe and without the GIL, these C extensions don’t behave properly. These arguments don’t convince everyone and that’s why you will hear some people say you should look at another Ruby implementation without a GIL, such as JRuby, Rubinius (hydra branch) or MacRuby (Rubinius & MacRuby also offer other concurrency approaches). If you are using an implementation without a GIL, then using threads in Ruby has exactly the same pros/cons than doing so in Java. However, it means that now you have to deal with the nightmare of threads: making sure your data is safe, doesn’t deadlock, check that your code, your libs, plugins and gems are thread safe. Also, running too many threads might affect the performance because your OS doesn’t have enough resources to allocate and it ends up spending its time context switching. It’s up to you to see if it’s worth it for your project.

Multiple processes & forking

That’s the most commonly used solution to gain concurrency when using Ruby and Python. Because the default language implementation isn’t capable of true concurrency or because you want to avoid the challenges of thread programming, you might want to just start more processes. That’s really easy as long as you don’t want to share states between running processes. If you wanted to do so, you would need to use DRb, a message bus like RabbitMQ, or a shared data store like memcached or a DB. The caveat is that you now need to use a LOT more memory. If want to run 5 Rails processes and your app uses 100Mb you will now need 500Mb, ouch that’s a lot of memory! That is exactly what happens when you use a Rails webserver like Mongrel. Now some other servers like Passenger and Unicorn found a workaround, they rely on unix forking. The advantage of forking in an unix environment implementing the copy-on-write semantics is that we create a new copy of the main process but they both “share” the same physical memory. However, each process can modify its own memory without affecting the other processes. So now, Passenger can load your 100Mb Rails app in a process, then fork this process 5 times and the total footprint will be just a bit more than 100Mb and you can now handle 5X more concurrent requests. Note that if you are allocating memory in your request processing code (read controller/view) your overall memory will grow but you can still run many more processes before running out of memory. This approach is appealing because really easy and pretty safe. If a forked process acts up or leaks memory, just destroy it and create a new fork from the master process. Note that this approach is also used in Resque, the async job processing solution by GitHub.

This solution works well if you want to duplicate a full process like a webserver, however it gets less interesting when you just want to execute some code “in the background”. Resque took this approach because by nature async jobs can yield weird results, leak memory or hang. Dealing with forks allows for an external control of the processes and the cost of the fork isn’t a big deal since we are already in an async processing approach.

Screenshot of GitHub's repository forking


Earlier we talked a bit about the actor model. Since Ruby 1.9, developers now have access to a new type of “lightweight” threads called Fibers. Fibers are not actors and Ruby doesn’t have a native Actor model implementation but some people wrote some actor libs on top of fibers. A fiber is like a simplified thread which isn’t scheduled by the VM but by the programmer. Fibers are like blocks which can be paused and resumed from the outside of from within themselves. Fibers are faster and use less memory than threads as demonstrated in this blog post. However, because of the GIL, you still cannot truly run more than one concurrent fiber by thread and if you want to use multiple CPU cores, you will need to run fibers within more than one thread. So how do fibers help with concurrency? The answer is that they are part of a bigger solution. Fiber allow developers to manually control the scheduling of “concurrent” code but also to have the code within the fiber to auto schedule itself. That’s pretty big because now you can wrap an incoming web request in its own fiber and tell it to send a response back when it’s done doing its things. In the meantime, you can move on the to next incoming request. Whenever a request within a fiber is done, it will automatically resume itself and be returned. Sounds great right? Well, the only problem is that if you are doing any type of blocking IO in a fiber, the entire thread is blocked and the other fibers aren’t running. Blocking operations are operations like database/memcached queries, http requests… basically things you are probably triggering from your controllers. The good news is that the “only” problem to fix now is to avoid blocking IOs. Let’s see how to do that.


Non blocking IOs/Reactor pattern.

The reactor pattern is quite simple to understand really. The heavy work of making blocking IO calls is delegated to an external service (reactor) which can receive concurrent requests. The service handler (reactor) is given callback methods to trigger asynchronously based on the type of response received. Let me take a limited analogy to hopefully explain the design better. It’s a bit like if you were asking someone a hard question, the person will take a while to reply but his/her reply will make you decide if you raise a flag or not. You have two options, or you choose to wait for the response and decide to raise the flag based on the response, or your flag logic is already defined and you tell the person what to do based on their answer and move on without having to worry about waiting for the answer. The second approach is exactly what the reactor pattern is. It’s obviously slightly more complicated but the key concept is that it allows your code to define methods/blocks to be called based on the response which will come later on.

Reactor Pattern illustrated in Matt Aimonetti's blog

In the case of a single threaded webserver that’s quite important. When a request comes in and your code makes a DB query, you are blocking any other requests from being processed. To avoid that, we could wrap our request in a fiber, trigger an async DB call and pause the fiber so another request can get processed as we are waiting for the DB. Once the DB query comes back, it wakes up the fiber it was trigger from, which then sends the response back to the client. Technically, the server can still only send one response at a time, but now fibers can run in parallel and don’t block the main tread by doing blocking IOs (since it’s done by the reactor).

This is the approach used by Twisted, EventMachine and Node.js. Ruby developers can use EventMachine or an EventMachine based webserver like Thin as well as EM clients/drivers to make non blocking async calls. Mix that with some Fiber love and you get Ruby concurrency. Be careful though, using Thin, non blocking drivers and Rails in threadsafe mode doesn’t mean you are doing concurrent requests. Thin/EM only use one thread and you need to let it know that it’s ok to handle the next request as we are waiting. This is done by deferring the response and let the reactor know about it.

The obvious problem with this approach is that it forces you to change the way you write code. You now need to set a bunch of callbacks, understand the Fiber syntax, and use deferrable responses, I have to admit that this is kind of a pain. If you look at some Node.js code, you will see that it’s not always an elegant approach. The good news tho, is that this process can be wrapped and your code can be written as it if was processed synchronously while being handled asynchronously under the covers. This is a bit more complex to explain without showing code, so this will be the topic of a future post. But I do believe that things will get much easier soon enough.


High concurrency with Ruby is doable and done by many. However, it could made easier. Ruby 1.9 gave us fibers which allow for a more granular control over the concurrency scheduling, combined with non-blocking IO, high concurrency can be achieved. There is also the easy solution of forking a running process to multiply the processing power. However the real question behind this heated debate is what is the future of the Global Interpreter Lock in Ruby, should we remove it to improve concurrency at the cost of dealing with some new major threading issues, unsafe C extensions, etc..? Alternative Ruby implementers seem to believe so, but at the same time Rails still ships with a default mutex lock only allowing requests to be processed one at a time, the reason given being that a lot of people using Rails don’t write thread safe code and a lot of plugins are not threadsafe. Is the future of concurrency something more like libdispatch/GCD where the threads are handled by the kernel and the developer only deals with a simpler/safer API?

Further reading:

, , ,