About concurrency and the GIL


During RubyConf 2011, concurrency was a really hot topic. This is not a new issue, and the JRuby team has been talking about true concurrency for quite a while. The Global Interpreter Lock has also been the subject of a lot of discussions in the Python community, and it’s not surprising that the Ruby community is having the same debates, since the evolution of the two languages’ implementations is somewhat similar. (There might also be some tension between EngineYard, which hired the JRuby and Rubinius teams, and Heroku, which recently hired Matz (Ruby’s creator) and Nobu, the #1 C Ruby contributor.)

The GIL is probably even more of a hot topic now that Rubinius is about to join JRuby and MacRuby in the realm of GIL-less Ruby implementations.

During my RubyConf talk (slides here), I tried to explain how C Ruby works, why some decisions like having a GIL were made, and why the Ruby core team isn’t planning on removing this GIL anytime soon. The GIL is something a lot of Rubyists love to hate, but a lot of people don’t seem to question why it’s here and why Matz doesn’t want to remove it. Defending the C Ruby decision isn’t exactly easy for me since I spend my free time working on an alternative Ruby implementation which doesn’t use a GIL (MacRuby). However, I think it’s important that people understand why the MRI team (C Ruby team) and some Pythonistas feel so strongly about the GIL.

What is the GIL?

Here is a quote from the Python wiki:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) [...] The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

The same basically applies to C Ruby. To illustrate the quote above, here is a diagram representing two threads being executed by C Ruby:

Fair thread scheduling in Ruby by Matt Aimonetti

Such scheduling isn’t a problem at all when you only have 1 CPU, since a CPU can only execute one piece of code at a time and context switching happens all the time to let the machine appear to run multiple processes/threads in parallel. The problem is when you have more than 1 CPU because in that case, if you were to only run 1 Ruby process, then you would most of the time only use 1 CPU at a time. If you are running on an 8 CPU box, that’s not cool at all! A lot of people stop at this explanation and imagine that their server can only handle one request at a time, and then they rush to sign Greenpeace petitions asking Matz to make Ruby greener by optimizing Ruby and saving CPU cycles. Well, the reality is slightly different; I’ll get back to that in a minute. Before I explain ways to achieve true concurrency with C Ruby, let me explain why C Ruby uses a GIL, why each implementation has to make an important choice, and why in this case both CPython and C Ruby chose to keep their GIL.
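To make the scheduling described above more concrete, here is a minimal sketch that runs the same CPU-bound work first sequentially and then in one thread per task. The `fib`, `run_sequential`, `run_threaded` and `elapsed_seconds` names are made up for this illustration. On a C Ruby with a GIL, I would expect the two timings to be roughly the same no matter how many CPUs the box has, while a GIL-less implementation may finish the threaded version faster; treat the numbers as indicative only.

```ruby
# Hypothetical CPU-bound task: a deliberately naive Fibonacci.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Runs every task one after the other in the current thread.
def run_sequential(inputs)
  inputs.map { |n| fib(n) }
end

# Runs every task in its own Ruby thread and collects the results.
# Thread#value joins the thread and returns the block's result.
def run_threaded(inputs)
  inputs.map { |n| Thread.new { fib(n) } }.map(&:value)
end

# Small timing helper based on the monotonic clock.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

if __FILE__ == $PROGRAM_NAME
  inputs = [24, 24, 24, 24]
  puts format('sequential: %.3fs', elapsed_seconds { run_sequential(inputs) })
  puts format('threaded:   %.3fs', elapsed_seconds { run_threaded(inputs) })
end
```

Both versions return the same results; only the wall-clock time can differ between implementations.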

 

Why a GIL in the first place?

  • It makes developers’ lives easier (it’s harder to corrupt data)
  • It avoids race conditions within C extensions
  • It makes C extension development easier (no write barriers, etc.)
  • Most of the C libraries which are wrapped are not thread safe
  • Parts of Ruby’s implementation aren’t threadsafe (Hash for instance)
As you can see the arguments can be organized in two main categories: data safety and C extensions/implementation. An implementation which doesn’t rely too much on C extensions (because they run a bit slow, or because code written in a different language is preferred) is only faced with one argument: data safety.
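As an illustration of the data safety argument, here is a small sketch in which several threads append to one shared Array. The method names are invented for this example. On C Ruby the GIL generally keeps the unsynchronized version from losing elements, because `Array#<<` is implemented in C and runs while the lock is held; that is a side effect of the implementation rather than a documented guarantee. On an implementation without a GIL, the unsynchronized version can lose elements or raise, and the explicit `Mutex` is the portable way to stay safe.

```ruby
# Appends from several threads with no synchronization at all.
# Whether every element survives depends on the Ruby implementation.
def unsynchronized_append(thread_count, per_thread)
  shared = []
  threads = Array.new(thread_count) do
    Thread.new { per_thread.times { |i| shared << i } }
  end
  threads.each(&:join)
  shared.size
end

# Same work, but every append is protected by a Mutex, so the
# result does not depend on the presence of a GIL.
def synchronized_append(thread_count, per_thread)
  shared = []
  lock = Mutex.new
  threads = Array.new(thread_count) do
    Thread.new do
      per_thread.times { |i| lock.synchronize { shared << i } }
    end
  end
  threads.each(&:join)
  shared.size
end

if __FILE__ == $PROGRAM_NAME
  puts "unsynchronized: #{unsynchronized_append(4, 10_000)}"
  puts "synchronized:   #{synchronized_append(4, 10_000)}"
end
```

The synchronized version always reports `thread_count * per_thread` elements; the unsynchronized one is the case where the GIL quietly does the developer’s job.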

 

Should C Ruby remove its GIL?

  • No: it potentially makes Ruby code unsafe(r)
  • No: it would break existing C extensions
  • No: it would make writing C extensions harder
  • No: it’s a lot of work to make C Ruby threadsafe
  • No: Ruby is fast enough in most cases
  • No: Memory optimization and GC are more important to tackle first
  • No: C Ruby code would run slower
  • Yes: we really need better/real concurrency
  • Yes: Rubber boots analogy (Gustavo Niemeyer)
Don’t just count the number of pros/cons and jump to the conclusion that removing the GIL is a bad idea. A lot of the arguments against removing the GIL are related. At the end of the day it boils down to data safety. During the Q&A section of my RubyConf talk, Matz came up on stage and said data safety was the main reason why C Ruby still has a GIL. Again, this is a topic which was discussed at length in the Python community and I’d encourage you to read arguments from the Jython developers (Jython being the equivalent of JRuby for Python), the PyPy developers (PyPy being the equivalent of Rubinius in the Python community) and the CPython developers. (A good collection of arguments is actually available in the comments related to the rubber boots post mentioned earlier.)

 

How can true concurrency be achieved using CRuby?

  • Run multiple processes (which you probably do if you use Thin, Unicorn or Passenger)
  • Use event-driven programming with a process per CPU
  • MultiVMs in a process. Koichi presented his plan to run multiple VMs within a process.  Each VM would have its own GIL and inter VM communication would be faster than inter process. This approach would solve most of the concurrency issues but at the cost of memory.
Note: forking a process only saves memory when using REE (Ruby Enterprise Edition), since it implements a GC patch that makes the forking process Copy on Write friendly. The Ruby core team worked on a patch for Ruby 1.9 to achieve the same result. Nari & Matz are currently working on improving the implementation to make sure overall performance isn’t affected.
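The first option in the list above, running multiple processes, can be sketched with nothing more than `fork` and pipes. This is only an illustration of the idea, not how Thin, Unicorn or Passenger are implemented, and `parallel_map_with_processes` is a name invented for this example. Each child process has its own interpreter and therefore its own GIL, so the children can run on different CPUs. It assumes a platform where `fork` is available.

```ruby
# Runs the given block once per input, each time in a forked child
# process, and returns the results in input order.
def parallel_map_with_processes(inputs, &work)
  children = inputs.map do |input|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      # Serialize the result so it can cross the process boundary.
      writer.write(Marshal.dump(work.call(input)))
      writer.close
      # Leave immediately, skipping at_exit handlers inherited from the parent.
      exit!(0)
    end
    # The parent only reads, so it closes its copy of the write end.
    writer.close
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

if __FILE__ == $PROGRAM_NAME
  squares = parallel_map_with_processes([1, 2, 3, 4]) { |n| n * n }
  puts squares.inspect
end
```

The price is the one mentioned in the note above: every child is a full process, so memory usage depends heavily on how Copy on Write friendly the GC is.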

Finally, when developing web applications, each thread spends quite a lot of time on I/O which, as mentioned above, won’t block the thread scheduler. So if you receive two quasi-concurrent requests you might not even be affected by the GIL, as illustrated in this diagram from Yehuda Katz:

This is a simplified diagram but you can see that a good chunk of the request life cycle in a Ruby app doesn’t require the Ruby thread to be active (CPU Idle blocks) and therefore these 2 requests would be processed almost concurrently.
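The same effect can be reproduced with a toy example. In the sketch below, `sleep` stands in for the time a request spends waiting on a database or a remote service (an assumption made to keep the example self-contained), and `handle_request`, `serve_concurrently` and `elapsed_seconds` are invented names. Since a sleeping thread releases the GIL, four simulated requests handled in four threads should take about as long as a single one, rather than four times as long.

```ruby
# Simulated request: all of its time is spent waiting, not computing.
def handle_request(io_wait)
  sleep(io_wait) # stand-in for a database query or an HTTP call
  :done
end

# Handles each simulated request in its own thread.
def serve_concurrently(request_count, io_wait)
  threads = Array.new(request_count) do
    Thread.new { handle_request(io_wait) }
  end
  threads.map(&:value)
end

# Small timing helper based on the monotonic clock.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

if __FILE__ == $PROGRAM_NAME
  elapsed = elapsed_seconds { serve_concurrently(4, 0.2) }
  puts format('4 requests with 0.2s of I/O wait each: %.2fs', elapsed)
end
```

A real request mixes I/O waits with Ruby code that does need the GIL, so the overlap is never this perfect in practice.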

To boil it down to something simple: when it comes to the GIL, an implementor has to choose between data safety and memory usage. But it is important to note that context switching between threads is faster than context switching between processes, and that data safety can be, and often is, achieved in environments without a GIL, but it requires more knowledge and work on the developer side.

 

Conclusion

The decision to keep or remove the GIL is a bit less simple than it is often described. I respect Matz’ decision to keep the GIL even though I would personally prefer to push the data safety responsibility to the developers. However, I do know that many Ruby developers would end up shooting themselves in the foot, and I understand that Matz prefers to avoid that and work on other ways to achieve true concurrency without removing the GIL. What is great with our ecosystem is that we have some diversity, and if you think that a GIL-less model is what you need, we have some great alternative implementations that will let you make this choice. I hope that this article will help some Ruby developers understand and appreciate C Ruby’s decision and what this decision means to them on a daily basis.


  1. #1 by Matt Aimonetti - October 4th, 2011 at 05:59

    Hacker news discussion there: http://news.ycombinator.com/item?id=3070382

  2. #2 by Tom - October 4th, 2011 at 07:13

    Note: The following comment regards a feasible compiler approach to parallelization – not interpreter, and thus I am not sure how it applies nor have I thought that out. I hope the following comments yield some incentive to check out the suggestions below and determine if/how they may apply in an interpreter context for your parallelization project.

    If the software based on GIL can accommodate testing with n number of processors and test the performance with just 1-4 processors against the same code, it may find out that a global lock in a language based concurrent runtime system will at best improve performance linearly from 1-2 processors, and above 2 processors will remain flat at the 2 processor performance level.

    Don’t take my word for it, but do run this test first to convince yourself of the maximum performance you can achieve with GIL when more processors are added to the execution of a suitable parallel application.

    My experience is with doing the experiment for a parallel runtime library for a concurrent language of a large shared memory multiprocessing computer company, and when multithreaded/multiprocess locks (e.g. in a thread/process data structure) in a lock field were fully implemented, the collision rate of locking was less than 1%, and performance increase was very near linear as processors were added to the tests. Note: private memory is a necessary feature for thread/process data structures to avoid saving registers in shared memory and getting whacked by Segment Violations. (See sbrk).

    To aid debugging, it is best to implement internal data structures to flag state in the parallel runtime system that can be displayed with a parallel debugger.

    The way we controlled adding processors to the execution mix was to use a simple environment variable set to the number of processors to use in the current execution’s incarnation – i.e. a user controlled variable.

    Real multi-thread/multi-process locking in the runtime library should be your goal, otherwise, you’ll be stuck with GIL, especially if you don’t try the suggested testing above to know for sure, and it verifies that performance improvement is limited with GIL.

    When we verified the limitations of the single lock around the runtime system, we started referring to the lock as “the crock around the runtime system”.

    Hint: Choose to implement parallelizing in the runtime library and you can use the same frontend runtime interfaces with no change – i.e. this is a medium-grain parallelization approach with very close to linear performance improvement as processors are added. I tested this approach with a full complement of 18 processors on a 20 processor max system (2 reserved for the OS) and its performance was far greater than a single lock around the runtime system.

    This was a one person project for 9-12 months that basically started with a single-threaded execution-model runtime system for a concurrent language and used a two-phase approach

    1) single lock around the runtime system, and
    2) a lock-field in each multi-thread/multi-process task’s data structure in the full parallel runtime system version.

    to arrive at a powerful multi-threaded/multi-process solution.

  3. #3 by postmodern - October 4th, 2011 at 12:13

    Perhaps some of those C extensions could be rewritten as FFI bindings, and allow FFI to handle the thread safety?

  4. #4 by Andrew Grimm - October 4th, 2011 at 15:11

    One concern I have is that people who are using Ruby for things other than web sites aren’t ignored. Maybe Rails folk don’t have to worry about the GIL because of IO, but what about us Plain Old Ruby Object users?

    Wanting to use faster implementations of Ruby was what led me to create the Small Eigen Collider, which I presented at RubyKaigi 2011.

    That being said, I’m not sure how much I’d use multi-threading unless Ruby has good approaches to ensuring the code you write is thread safe. I think the ability to enforce the lack of side-effects would be useful (and I suspect freeze wouldn’t be sufficient)

    Would it be fair to say that getting rid of the GIL is somewhat an all-or-nothing process, whereas other improvements that the core team can work on are more incremental in nature?

  5. #5 by U.Nakamura - October 4th, 2011 at 19:38

    Great article!
    This article explains the things which we would like to say without the place to leave.

  6. #6 by ara.t.howard - October 5th, 2011 at 05:38

    “I would personally prefer to push the data safety responsibility to the developers.”

    i get it. but thinking about that is terrifying. i read it like

    “I would personally prefer to push implementing transactions to developers in order to make the database faster.”

    a faster ruby is good, but my experience, which is considerable, is that 1 in 100 rubyists (or java-heads, c++ people, whatever…) are capable of writing good multi-threaded code, and 0 in 100 can prove its safety via a test.

    removal of the GIL without *also* removing thread primitives and giving a bomber mechanism of writing MT applications (message passing, queues, etc) would be an absolute disaster for almost all real world ruby uses.

  7. #7 by Steven G. Harms - October 6th, 2011 at 11:11

    Matt,

    I loved your talk at Rubyconf. The only downside, as far as I can see from it, was the timing: you had an audience of people ready to live it up in NO!

    Something you mentioned that’s really resonating more and more as we talk more about concurrency, the Actor model, etc. is the idea that many non-CS background Rubyists are not in a position to appreciate what’s going on in CRuby and you bade us all to take some time to appreciate what’s going on in it.

    Something that I have difficulty getting across is “starter C projects that don’t feel like drudgery.” When you find someone who is willing to take the C dive they tend to get snared up in the tools, the terms, the compilers. After that they tend to be pretty good in learning the syntax, but then are often wondering “Well, what can I *do* with it?” The gap between “I get the syntax” to looking at CRuby code is fairly large and there really doesn’t feel like there are some middle steps.

    How can we get people to level up in C realizing that many are coming from Ruby and or iOS / Java and don’t have the “bare metal” starting point? I would love to help Rubyists get to know the heart and soul of their language, but I find great difficulty in teaching that transition. Any advice?

    Steven

    • #8 by Matt Aimonetti - October 6th, 2011 at 11:30

      Steven, you are absolutely right about the timing issue. I tried to cover too much in too little time. I was wrongly under the impression that everything was important and it was better to give people a big picture of the situation. I now realize I was wrong and that’s why I focused more on concurrency and wrote this blog post.

      To answer your question, I’m not sure all Rubyists should be able to send patches to CRuby, but I think it’s important for all of us to understand some of the key elements of the implementation and the pros/cons. At the same time, I think it would be great for someone else to write a guide on the CRuby’s internals addressing an audience already somewhat familiar with C.

      Another thing Rubyists could do is to take the time to learn/understand a given aspect of how Ruby works and talk about it in conferences or on their blogs. A good example of that could be the constant lookup mechanism, the 1.9 bytecode, the parser etc…

      Finally, I think that working on some tools to reflect how the internals work would be very beneficial. I’ve been trying to find some time to write a small GUI around the GC’s profiler and ObjectSpace. I think that showing visually what CRuby does internally, would push people to look more into how the language is implemented.

      • #9 by Steven G. Harms - October 6th, 2011 at 14:19

        Let me clarify, I think you did a great job on the timing internals, I meant that it was unfortunate that your slot was the last one on the last day when there may not have been appropriate “focus” by the attendees :-D

        These are good ideas, as we mature as a community I think we’re going to see talks like the one you gave and like the ones you suggest: more “mature” peeks into the heart of the language.

        In any case, thumbs up for a great presentation!
