Archive for category ruby

How to – cross domain ajax to a Ruby app

In some cases, you might have a bunch of apps running on different domains/subdomains and/or ports and you would like to make ajax requests between these services. The problem is that browsers wouldn’t let you make such requests because of the Same Origin Policy which only allowed them to make request to resources within the same domain.

However, most browsers (IE 8+, Firefox 3.5+, Safari 4+, Chrome) implement a simple way to allow cross domain requests as defined in this w3C document.

Of course, if your users have an old version of their browser, you  might have to look into jsonp or something else such as cheating by using iframes & setting document.domain. Let’s pretend for a minute that 100% of your users are on Chrome. The only thing you need to do is set a response header listing the accepted domains or “*” for all. A simple Rack middleware to do that would look like that.

 

class XOriginEnabler
  ORIGIN_HEADER = "Access-Control-Allow-Origin"
 
  def initialize(app, accepted_domain="*")
    @app = app
    @accepted_domain = accepted_domain
  end
 
  def call(env)
    status, header, body = @app.call(env)
    header[ORIGIN_HEADER] = @accepted_domain
    [status, header, body]
  end
end

And to use the middleware you would need to set it for use:

use XOriginEnabler

To enable all requests from whatever origin, or pass the white listed domain(s) as shown below.

use XOriginEnabler, "demo.mysite.com demo.mysite.fr demo.techcrunch.com"

For a full featured middleware, see this project.

6 Comments

Ruby optimization example and explanation

Recently I wrote a small DSL that allows the user to define some code that then gets executed later on and in different contexts. Imagine something like Sinatra where each route action is defined in a block and then executed in context of an incoming request.

The challenge is that blocks come with their context and you can’t execute a block in the context of another one.

Here is a reduction of the challenge I was trying to solve:

class SolutionZero
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end
 
  def dispatch
    @block.call
  end
end
 
SolutionZero.new(42){ @origin + 1 }.dispatch
# undefined method `+' for nil:NilClass (NoMethodError)

The problem is that the block refers to the @origin instance variable which is not available in its context.
My first workaround was to use instance_eval:

class SolutionOne
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end
 
  def dispatch
    self.instance_eval &@block
  end
end
 
SolutionOne.new(40){ @origin + 2}.dispatch
# 42

My workaround worked fine, since the block was evaluated in the context of the instance and therefore the @origin ivar is made available to block context. Technically, I was good to go, but I wasn’t really pleased with this solution. First using instance_eval often an indication that you are trying to take a shortcut. Then having to convert my block stored as a block back into a proc every single dispatch makes me sad. Finally, I think that this code is probably not performing as well as it could, mainly due to unnecessary object allocations and code evaluation.
I did some benchmarks replacing instance_eval by instance_exec since looking at the C code, instance_exec should be slightly faster. Turns out, it is not so I probably missed something when reading the implementation code.

I wrote some more benchmarks and profiled a loop of 2 million dispatches (only the #disptach method call on the same object). The GC profiler report showed that the GC was invoked 287 times and each invocation was blocking the execution for about 0.15ms.
Using Ruby’s ObjectSpace and disabling the GC during the benchmark, I could see that each loop allocates an object of type T_NODE which is more than likely our @block ivar converted back into a block. This is quite a waste. Furthermore, having to evaluate our block in a different context every single call surely isn’t good for performance.

So instead of doing the work at run time, why not doing it at load time? By that I mean that we can optimize the #dispatch method if we could “precompile” the method body instead of “proxying” the dispatch to an instance_eval call. Here is the code:

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    implementation(block)
  end
 
  private
 
  def implementation(block)
    mod = Module.new
    mod.send(:define_method, :dispatch, block)
    self.extend mod
  end
end
 
SolutionTwo.new(40){ @origin + 2}.dispatch
# 42

This optimization is based on the fact that the benchmark (and the real life usage) creates the instance once and then calls #dispatch many times. So by making the initialization of our instance a bit slower, we can drastically improve the performance of the method call. We also still need to execute our block in the right context. And finally, each instance might have a different way to dispatch since it is defined dynamically at initialization. To work around all these issues, we create a new module on which we define a new method called dispatch and the body of this method is the passed block. Then we simply our instance using our new module.

Now every time we call #dispatch, a real method is dispatched which is much faster than doing an eval and no objects are allocated. Running the profiler and the benchmarks script used earlier, we can confirm that the GC doesn’t run a single time and that the optimized code runs 2X faster!

 

Once again, it’s yet another example showing that you should care about object allocation when dealing with code in the critical path. It also shows how to work around the block bindings. Now, it doesn’t mean that you have to obsess about object allocation and performance, even if my last implementation is 2X faster than the previous, we are only talking about a few microseconds per dispatch. That said microseconds do add up and creating too many objects will slow down even your faster code since the GC will stop-the-world as its cleaning up your memory. In real life, you probably don’t have to worry too much about low level details like that, unless you are working on a framework or sharing your code with others. But at least you can learn and understand why one approach is faster than the other, it might not be useful to you right away, but if you take programming as a craft, it’s good to understand how things work under the hood so you can make educated decisions.
 

Update:

@apeiros in the comments suggested a solution that works & performs the same as my solution, but is much cleaner:

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    define_singleton_method(:dispatch, block) if block_given?
  end
end

, ,

12 Comments

Video game web framework design

In this post I will do my best to explain why and how I reinvented the wheel and wrote a custom web framework for some of Sony’s AAA console titles. My goal is to reflect on my work by walking you through the design process and some of the implementation decisions. This is not about being right or being wrong, it’s about designing a technical solution to solve concrete business challenges.

Problem Domain

The video game industry is quite special, to say the least. It shares a lot of similarities with the movie industry. The big difference is that the movie industry hasn’t evolved as quickly as the video game industry has. But the concept is the same, someone comes up with a great idea, finds a team/studio to develop the game and finds a publisher. The development length and budget depends on the type of game, but for a AAA console game, it usually takes a least a few million and a minimum of a year of work once the project has received the green light. The creation of such a game involves various teams, designers, artists, animators, audio teams, developers, producers, QA, marketing, management/overhead etc.. Once the game gets released, players purchase the whole game for a one time fee and the studio moves on to their next game. Of course things are not that simple, with the latest platforms, we now have the option to patch games, add DLC etc.. But historically, a console game is considered done when it ships, exactly like a movie, and very little work is scheduled post release.

Concretely such an approach exposes a few challenges when trying to implement online features for a AAA console title:

  • Communication with the game client network team
  • Scalability, performance
  • Insane deadlines, unstable design (constant change of requirements)
  • Can’t afford to keep on working on the system once released (time delimited projects)

 

Communication

As in most situations, communication is one of the biggest challenges. Communication is even harder in the video game industry since you have so many teams and experts involved. Each team speaks its own jargon, has its own expertise and its own deadlines. But all focus on the same goal: releasing the best game ever. The team I’m part of has implementing online features as its goal. That’s the way we bring business value to our titles. Concretely, that means that we provide the game client developers with a C++ SDK which connects to custom web APIs written in Ruby. The API implementations rely on various data stores (MySQL, Redis, Memcached, memory) to store and retrieve all sorts of game data.

Nobody but our team should care about the implementation details, after all, the whole point of providing an API is to provide a simple interface so others can do their part of the job in the easiest way possible. This is exactly where communication becomes a problem. The design of these APIs should be the result of the work of two teams with two different domains of expertise and different concerns. One team focuses on client performance, memory optimization and making the online resources available to the game engine without affecting the game play. The other, focuses on server performance, latency, scalability, data storage and system contention under load. Both groups have to come together to find a compromise making each other’s job doable. Unfortunately, things are not that simple and game designers (who are usually not technical people) have a hard time not changing their designs and requirements every other week (usually for good reasons) making API design challenging and creating tension between the teams.

From this perspective, the API is the most important deliverable for our team and it should communicate the design goal while being very explicit about how it works, why it works the way it does, and how to implement it client side. This is a very good place where we can improve communication by making sure that we focus on making clear, well designed, well documented, flexible APIs.

 

Scalability, performance

On the server side, the APIs need to perform and scale to handles tends of thousands of concurrent requests. Web developers often rely on aggressive HTTP caching but in our case, the web client (our SDK) has a limited amount of memory available and 90% of the requests are user specific (can’t use full page HTTP cache) and a lot of these are POST/DELETE requests (can’t be cached). That means that, to scale, we have to focus on what most developers don’t often have to worry too much about: all the small details which, put together with a high load, end up drastically affecting your performance.

While Ruby is a great language, a lot of the libraries and frameworks are not optimized for performance, at least not the type of performance needed for our use case. However, the good news is that this is easily fixable and many alternatives exist (lots of async, non-blocking drivers for i.e). When obsessed with performance, you quickly learn to properly load test, profile, and monitor your code to find the bottlenecks and the places where you should focus your attention. The big, unique challenge though, is that a console game will more than likely see its peak traffic in the first few weeks, not really giving the chance to the online team to iteratively handle the prod issues. The only solution is to do everything possible before going live to ensure that the system will perform as expected. Of course if we were to write the same services in a more performant language, we would need to spend less time optimizing. But we are gaining so much flexibility by using a higher level programming language that, in my mind, the trade off is totally worth it (plus you still need to spend a lot of time optimizing your code path, even if your code is written in a very fast language).

 

Deadlines, requirement changes

That’s just part of the way the industry works. Unless you work for Blizzard and you can afford to spend a crazy amount of time and money on the development of a title; you will have to deal with sliding deadlines, requirement changes, scope changes etc… The only way I know how to protect myself from such things is to plan for the worst. Being a non-idealistic (read pessimistic) person helps a lot. When you design your software, make sure your design is sound but flexible enough to handle any major change that you know could happen at any time. Pick your battles and make sure your assumptions are properly thought through, communicated and documented so others understand and accept them. In a nutshell, this is a problem we can’t avoid, so you need to embrace it.

 

Limited reusability

This topic has a lot to do with the previous paragraph. Because scopes can change often and because the deadlines are often crazy, a lot of the time, engineers don’t take the time to think about reusability. They slap some code together, pray to the lords of Kobol and hope that they won’t have to look at their code ever again (I’m guilty of having done that too). The result is a lot of throw away code. This is actually quite frequent and normal in our industry. But it doesn’t mean that it the right thing to do! The assumption/myth is that each game is different and therefore two games can’t be using the same tech solution. My take on that is that it’s partly true. But some components are the same for 80% of the games I work on. So why not design them well and reuse the common parts? (A lot of games share the same engines, such as Unreal for example, and there is no reason why we can’t build a core online engine extended for each title)

 

My approach

When I joined Sony, I had limited experience with the console video game industry and my experience was not even related to online gaming. So even though I had (strong) opinions (and was often quite (perhaps even too) vocal about them), I did my best to improve existing components and work with the existing system. During that time, the team shipped 4 AAA titles on the existing system. As we were going through the game cycles, I did my best to understand the problem domain, the reasons behind some of the design decisions and finally I looked at what could be done differently to improve our business value. After releasing a title with some serious technical difficulties, I spent some time analyzing and listing the problems we had and their root causes. I asked our senior director for a mission statement and we got the team together to define the desiderata/objectives of our base technology. Here is what we came up with:

  1. Stability
  2. Performance / Scalability
  3. Encapsulation / Modularity
  4. Documentation
  5. Conventions
  6. Reusability / Maintainability

These objectives are meant to help us objectively evaluate two options. The legacy solution was based on Rails, or more accurately: Rails was used in the legacy solution. Rails had been hacked in so many different ways that it was really hard to update anything without breaking random parts of the framework. The way to do basic things kept being changed, there was no consistent design, no entry points, no conventions and each new game would duplicate the source code of the previously released game and make the game specific changes. Patches were hard to back port and older titles were often not patched up. The performance was atrocious under load, mainly due to hacked-up Rails not performing well. (Rails was allocating so many objects per request that the GC was taking a huge amount of the request cycles, the default XML builder also created a ton load of objects etc…) This was your typical broken windows scenario. Engineers were getting frustrated, motivation was fainting, bugs were piling up and nobody felt ownership over the tech.

Now, to be fair, it is important to explain that the legacy system was hacked up together due to lack of time, lack of resources and a lot of pressure to release something ASAP. So, while the end result sounds bad, the context is very important to note. This is quite common in software engineering and when you get there, the goal is not to point fingers but to identify the good and the bad parts of the original solution. You then use this info to decide what to do: fix the existing system or rewrite, porting the good parts.

Our report also came up with a plan. A plan to redesign our technology stack to match the desiderata previously mentioned. To put it simply, the plan was to write a new custom web framework focusing on stability, performance, modularity and documentation. Now, there are frameworks out there which already do that or value these principles. But none of them focus on web APIs and none of them are specific to game development. Finally, the other issue was that we had invested a lot of time on game specific code and we couldn’t throw away all that work, so the new framework had to support a good chunk of legacy code but had to make it run much faster.

Design choices

Low conversion cost

Using node.jscoffee script/Scala/whatever new fancy tech was not really an option. We have a bunch of games out there which are running on the old system and some of these games will have a sequel or a game close enough that we could reuse part of the work. We don’t want to have to rewrite the existing code. I therefore made sure that we could reuse 90% of the business logic by adding an abstraction layer doing the heavy lifting at boot time and therefore not affecting the runtime performance. Simple conversion scripts were also written to import the core of the existing code over.

Lessons learned: It is very tempting to just redo everything and start from scratch. However, the business logic implementation wasn’t the main cause of our problems. Even though I wish we could have redesigned that piece of the puzzle, it didn’t make sense from a business perspective. A lot of thought had to be put into how to obtain the expected performance level while keeping the optional model/controller/view combos. By having full control of the “web engine”, we managed to isolate things properly without breaking the old paradigms. We also got rid of a lot of assumptions allowing us to design new titles a bit differently while being backward compatible and have our code run dramatically faster.

Web API centric

This is probably the most important design element. If I had to summarize what our system does in just a few words, I would say: a game web API. Of course, it’s much more than that. We have admin interfaces, producer dashboards, community websites, lobbies, p2p, BI reports, async processing jobs etc… But at the end of the day, the only one piece you can’t remove is the game web API. So I really wanted the design to focus on that aspect. When a developer starts implementing a new online game feature, I want him/her to think about the API. But I also want this API to be extremely well documented so the developer working client-side understands the purpose of the API, how to use it, and what the expected response is right away. I also wanted to be able to automatically test our APIs at a very basic level so we could validate that there are discrepancies between what the client expects and what the server provides. To do that, I created a standalone API DSL with everything needed to describe your API but without any implementation details whatsoever. The API DSL lets the developer define a route (url), the HTTP verb expected, if the request should be authenticated or not, SSL or not, the param rules, default values and finally a response description (which was quite a controversial choice). All of these settings can be documented by the developer. This standalone DSL can then be consumed by different tools. For instance we have a tool extracting all the info into nicely formatted HTML doc for the game client developers. This tool doesn’t need to load the framework to just render the documentation. We also use this description at boot time to compile the validation rules and routes, allowing for a much faster request dispatch. And we also use these API description to generate some low level data for the client. Finally, we used the service description DSL to help create mocked service responses allowing the client team to test service designs without having to wait for the implementation streamlining the process.

Lessons learned: We had a lot of internal discussions about the need to define the response within the service description. Some argued that it’s a duplication since we already had a view and we could parse that to get most of what we needed (which is what the old system was doing). We ended up going with the response description DSL for a few critical reasons: testing and implementation simplicity. Testing: we need to have an API expectation reference and to keep this reference sane so we can see if something is changed. If we were to magically parse the response, we couldn’t test the view part of the code against a frame of reference. Implementation simplicity: magically parsing a view template is more tricky that it sounds, you would need to render the template with the right data to make it work properly. Furthermore, you can’t document a response easily in the view, and if you do, you arguably break the separation of concern between the description and the implementation. Finally, generated documentation isn’t enough and that’s why we decided to write English documentation, some being close to the code and some being just good old documentation explaining things outside of the code context.

Modularity

In order to make our code reusable we had to isolate each component and limit the dependencies. We wrote a very simple extension layer allowing each extension to registers itself once detected. The extension interface exposes the path of the extension, its type, models, services, controllers, migrations, seed data, dependencies etc.. Each extension is contained in a folder. (The extension location doesn’t matter much but as part of the framework boot sequence, we check a few default places.) The second step of the process is to check a manifest/config file that is specific to each title. The manifest file lists the extensions that should be activated for the title. The framework then activates the marked extensions and has access to libs, models, views, migrations, seed data and of course to load services (DSL mentioned earlier) etc…

Even though we designed the core extensions the best we could, there are cases where some titles will need to extend these extensions. To do that, we added a bunch of hooks that could be implemented on the title side if needed (Ruby makes that super easy and clean to do!). A good example of that is the login sequence or the player data.

Lessons learned: The challenge with modularity is to keep things simple and highly performing yet flexible. A key element to manage that is to stay as consistent as possible. Don’t implement hooks three different ways, try to keep method signatures consistent, keep it simple and organized.

 

Conclusion

It’s a bit early to say if this rewrite is a success or not and there are still lots of optimizations and technology improvements we are looking forward to doing. Only time will give us enough retrospect to evaluate our work. But because we defined the business value (mission statement) and the technical objectives, it is safe to say that the new framework meets the expectations quite well. On an early benchmark we noted a 10X speed improvement and that’s before drilling into the performance optimizations such as making all the calls non-blocking, using better connection pools, cache write through layer… However, there is still one thing that we will have to monitor: how much business value will this framework generate. And I guess that’s where we failed to define an agreed upon evaluation grid. I presume that if our developers spend more time designing and implementing APIs and less time debugging that could be considered business value. If we spend less time maintaining or fighting with the game engine, that would also be a win. Finally, if the player experience is improved we will be able to definitely say that we made the right choice.

To conclude, I’d like to highlight my main short coming: I failed to define metrics that would help us evaluate the real business value added to our products. What I consider a technical success might not be a business success. How do you, in your own domain, find ways to define clear and objective metrics?

, ,

No Comments

Ruby concurrency explained

Concurrency is certainly not a new problem but it’s getting more and more attention as machines start having more than 1 core, that web traffic increases drastically and that some new technologies show up saying that they are better because they handle concurrency better.
If that helps, think of concurrency as multitasking. When people say that they want concurrency, they say that they want their code to do multiple different things at the same time. When you are on your computer, you don’t expect to have to choose between browsing the web and listening to some music. You more than likely want to run both concurrently. It’s the same thing with your code, if you are running a webserver, you probably don’t want it to only process one request at a time.
The aim of this article is to explain as simply as possible the concept of concurrency in Ruby, the reason why it’s a complicated topic and finally the different solutions to achieve concurrency.

First off, if you are not really familiar with concurrency, take a minute to read the wikipedia article on the topic which is a great recap on the subject. But now, you should have noticed that my above example was more about parallel programming than concurrency, but we’ll come back to that in a minute.

The real question at the heart of the quest for concurrency is: “how to increase code throughput”.

We want our code to perform better, and we want it to do more in less time. Let’s take two simple and concrete examples to illustrate concurrency. First, let’s pretend you are writing a twitter client, you probably want to let the user scroll his/her tweets while the latest updates are  being fetched. In other words, you don’t want to block the main loop and interrupt the user interaction while your code is waiting for a response from the Twitter API. To do that, a common solution is to use multiple threads. Threads are basically processes that run in the same memory context. We would be using one thread for the main event loop and another thread to process the remote API request. Both threads share the same memory context so once the Twitter API thread is done fetching the data it can update the display. Thankfully, this is usually transparently handled by asynchronous APIs (provided by the OS or the programming language std lib) which avoid blocking the main thread.

The second example is a webserver. Let’s say you want to run a Rails application. Because you are awesome, you expect to see a lot of traffic. Probably more than 1 QPS (query/request per second). You benchmarked your application and you know that the average response time is approximately 100ms. Your Rails app can therefore handle 10QPS using a single process (you can do 10 queries at 100ms in a second).

But what happens if your application gets more than 10 requests per second? Well, it’s simple, the requests will be backed up and will take longer until some start timing out. This is why you want to improve your concurrency. There are different ways to do that, a lot of people feel really strong about these different solutions but they often forget to explain why they dislike one solution or prefer one over the other. You might have heard people conclusions which are often one of these: Rails can’t scale, you only get concurrency with JRuby, threads suck, the only way to concurrency is via threads, we should switch to Erlang/Node.js/Scala, use fibers and you will be fine, add more machines, forking > threading.  Depending on who said what and how often you heard it on twitter, conferences, blog posts, you might start believing what others are saying. But do you really understand why people are saying that and are you sure they are right?

The truth is that this is a complicated matter. The good news is that it’s not THAT complicated!

The thing to keep in mind is that the concurrency models are often defined by the programming language you use. In the case of Java, threading is the usual solution, if you want your Java app to be more concurrent, just run every single request in its own thread and you will be fine (kinda). In PHP, you simply don’t have threads, instead you will start a new process per request. Both have pros and cons, the advantage of the Java threaded approach is that the memory is shared between the threads so you are saving in memory (and startup time), each thread can easily talk to each other via the shared memory. The advantage of PHP is that you don’t have to worry about locks, deadlocks, threadsafe code and all that mess hidden behind threads. Described like that it looks pretty simple, but you might wonder why PHP doesn’t have threads and why Java developers don’t prefer starting multiple processes. The answer is probably related to the language design decisions. PHP is a language designed for the web and for short lived processes. PHP code should be fast to load and not use too much memory. Java code is slower to boot and to warm up, it usually uses quite a lot of memory. Finally, Java is a general purpose programming language not designed primarily for the internet. Others programming languages like Erlang and Scala use a third approach: the actor model. The actor model is somewhat a bit of a mix of both solutions, the difference is that actors are a like threads which don’t share the same memory context. Communication between actors is done via exchanged messages ensuring that each actor handles its own state and therefore avoiding corrupt data (two threads can modify the same data at the same time, but an actor can’t receive two messages at the exact same time). We’ll talk about that design pattern later on, so don’t worry if you are confused.

What about Ruby? Should Ruby developers use threads, multiple processes, actors, something else? The answer is: yes!

Threads

Since version 1.9, Ruby has native threads (before that green threads were used). So in theory, if we would like to, we should be able to use threads everywhere like most Java developers do. Well, that’s almost true, the problem is that Ruby, like Python uses a Global Interpreter Lock (aka GIL). This GIL is a locking mechanism that is meant to protect your data integrity. The GIL only allows data to be modified by one thread at time and therefore doesn’t let threads corrupt data but also it doesn’t allow them to truly run concurrently. That is why some people say that Ruby and Python are not capable of (true) concurrency.

Global Interpreter Lock by Matt Aimonetti

However these people often don’t mention that the GIL makes single threaded programs faster, that multi-threaded programs are much easier to develop since the data structures are safe and finally that a lot of C extensions are not thread safe and without the GIL, these C extensions don’t behave properly. These arguments don’t convince everyone and that’s why you will hear some people say you should look at another Ruby implementation without a GIL, such as JRuby, Rubinius (hydra branch) or MacRuby (Rubinius & MacRuby also offer other concurrency approaches). If you are using an implementation without a GIL, then using threads in Ruby has exactly the same pros/cons than doing so in Java. However, it means that now you have to deal with the nightmare of threads: making sure your data is safe, doesn’t deadlock, check that your code, your libs, plugins and gems are thread safe. Also, running too many threads might affect the performance because your OS doesn’t have enough resources to allocate and it ends up spending its time context switching. It’s up to you to see if it’s worth it for your project.

Multiple processes & forking

That’s the most commonly used solution to gain concurrency when using Ruby and Python. Because the default language implementation isn’t capable of true concurrency or because you want to avoid the challenges of thread programming, you might want to just start more processes. That’s really easy as long as you don’t want to share states between running processes. If you wanted to do so, you would need to use DRb, a message bus like RabbitMQ, or a shared data store like memcached or a DB. The caveat is that you now need to use a LOT more memory. If want to run 5 Rails processes and your app uses 100Mb you will now need 500Mb, ouch that’s a lot of memory! That is exactly what happens when you use a Rails webserver like Mongrel. Now some other servers like Passenger and Unicorn found a workaround, they rely on unix forking. The advantage of forking in an unix environment implementing the copy-on-write semantics is that we create a new copy of the main process but they both “share” the same physical memory. However, each process can modify its own memory without affecting the other processes. So now, Passenger can load your 100Mb Rails app in a process, then fork this process 5 times and the total footprint will be just a bit more than 100Mb and you can now handle 5X more concurrent requests. Note that if you are allocating memory in your request processing code (read controller/view) your overall memory will grow but you can still run many more processes before running out of memory. This approach is appealing because really easy and pretty safe. If a forked process acts up or leaks memory, just destroy it and create a new fork from the master process. Note that this approach is also used in Resque, the async job processing solution by GitHub.

This solution works well if you want to duplicate a full process like a webserver, however it gets less interesting when you just want to execute some code “in the background”. Resque took this approach because by nature async jobs can yield weird results, leak memory or hang. Dealing with forks allows for an external control of the processes and the cost of the fork isn’t a big deal since we are already in an async processing approach.

Screenshot of GitHub's repository forking

Actors/Fibers

Earlier we talked a bit about the actor model. Since Ruby 1.9, developers now have access to a new type of “lightweight” threads called Fibers. Fibers are not actors and Ruby doesn’t have a native Actor model implementation but some people wrote some actor libs on top of fibers. A fiber is like a simplified thread which isn’t scheduled by the VM but by the programmer. Fibers are like blocks which can be paused and resumed from the outside of from within themselves. Fibers are faster and use less memory than threads as demonstrated in this blog post. However, because of the GIL, you still cannot truly run more than one concurrent fiber by thread and if you want to use multiple CPU cores, you will need to run fibers within more than one thread. So how do fibers help with concurrency? The answer is that they are part of a bigger solution. Fiber allow developers to manually control the scheduling of “concurrent” code but also to have the code within the fiber to auto schedule itself. That’s pretty big because now you can wrap an incoming web request in its own fiber and tell it to send a response back when it’s done doing its things. In the meantime, you can move on the to next incoming request. Whenever a request within a fiber is done, it will automatically resume itself and be returned. Sounds great right? Well, the only problem is that if you are doing any type of blocking IO in a fiber, the entire thread is blocked and the other fibers aren’t running. Blocking operations are operations like database/memcached queries, http requests… basically things you are probably triggering from your controllers. The good news is that the “only” problem to fix now is to avoid blocking IOs. Let’s see how to do that.

fiber

Non blocking IOs/Reactor pattern.

The reactor pattern is quite simple to understand really. The heavy work of making blocking IO calls is delegated to an external service (reactor) which can receive concurrent requests. The service handler (reactor) is given callback methods to trigger asynchronously based on the type of response received. Let me take a limited analogy to hopefully explain the design better. It’s a bit like if you were asking someone a hard question, the person will take a while to reply but his/her reply will make you decide if you raise a flag or not. You have two options, or you choose to wait for the response and decide to raise the flag based on the response, or your flag logic is already defined and you tell the person what to do based on their answer and move on without having to worry about waiting for the answer. The second approach is exactly what the reactor pattern is. It’s obviously slightly more complicated but the key concept is that it allows your code to define methods/blocks to be called based on the response which will come later on.

Reactor Pattern illustrated in Matt Aimonetti's blog

In the case of a single threaded webserver that’s quite important. When a request comes in and your code makes a DB query, you are blocking any other requests from being processed. To avoid that, we could wrap our request in a fiber, trigger an async DB call and pause the fiber so another request can get processed as we are waiting for the DB. Once the DB query comes back, it wakes up the fiber it was trigger from, which then sends the response back to the client. Technically, the server can still only send one response at a time, but now fibers can run in parallel and don’t block the main tread by doing blocking IOs (since it’s done by the reactor).

This is the approach used by Twisted, EventMachine and Node.js. Ruby developers can use EventMachine or an EventMachine based webserver like Thin as well as EM clients/drivers to make non blocking async calls. Mix that with some Fiber love and you get Ruby concurrency. Be careful though, using Thin, non blocking drivers and Rails in threadsafe mode doesn’t mean you are doing concurrent requests. Thin/EM only use one thread and you need to let it know that it’s ok to handle the next request as we are waiting. This is done by deferring the response and let the reactor know about it.

The obvious problem with this approach is that it forces you to change the way you write code. You now need to set a bunch of callbacks, understand the Fiber syntax, and use deferrable responses, I have to admit that this is kind of a pain. If you look at some Node.js code, you will see that it’s not always an elegant approach. The good news tho, is that this process can be wrapped and your code can be written as it if was processed synchronously while being handled asynchronously under the covers. This is a bit more complex to explain without showing code, so this will be the topic of a future post. But I do believe that things will get much easier soon enough.

Conclusion

High concurrency with Ruby is doable and done by many. However, it could made easier. Ruby 1.9 gave us fibers which allow for a more granular control over the concurrency scheduling, combined with non-blocking IO, high concurrency can be achieved. There is also the easy solution of forking a running process to multiply the processing power. However the real question behind this heated debate is what is the future of the Global Interpreter Lock in Ruby, should we remove it to improve concurrency at the cost of dealing with some new major threading issues, unsafe C extensions, etc..? Alternative Ruby implementers seem to believe so, but at the same time Rails still ships with a default mutex lock only allowing requests to be processed one at a time, the reason given being that a lot of people using Rails don’t write thread safe code and a lot of plugins are not threadsafe. Is the future of concurrency something more like libdispatch/GCD where the threads are handled by the kernel and the developer only deals with a simpler/safer API?

Further reading:

, , ,

33 Comments

The Ruby movement – art & programming

I wrote a guest blog post for Satish Talim over at RubyLearning.org

You can read it there.

No Comments

Discussion with a Java switcher

For the past 6 months, I have had regular discussions with an experienced Java developers who switched to Ruby a couple years ago. Names have been changed to protect the guilty but to help you understand my friend ‘Duke’ better, you need to know that he has been a developer for 10 years and lead many complicated, high traffic projects. He recently released two Ruby on Rails projects and he has been fighting with performance issues and scalability challenges.

Duke is a happy Ruby developer but he sometimes has a hard time understanding why things are done in a certain way in the Ruby community. Here are some extracts from our conversations. My answers are only based on my own experience and limited knowledge. They are probably not shared by the entire  community, feel free to use the comment section if you want to add more or share your own answers.

Threads / Concurrency

Duke: Why does the Ruby community hate threads so much. It seems to be a taboo discussion and the only answer I hear is that threads are hard to deal with and that Ruby does not have a good threading implementation. What’s the deal there? If you want concurrent processing, threads are important!

Me: This is a very good question and I think there are two main reasons why threads and thread safety are not hot topics in the Ruby world. First, look at Ruby’s main implementation itself. If you are using an old version of Ruby (pre Ruby 1.9) you don’t use native threads but green threads mapping to only 1 native thread. Ilya has a great (yet a bit old) blog post explaining the difference, why it matters and also the role and effect of the Global Interpreter Lock (GIL). Also, even though Rubyists like to say that they live in the edge, most of them still use Ruby 1.8 and therefore don’t really see the improvements in Ruby 1.9 nor yet understand the potential of fibers.

The other part of the explanation is that the Rails community never really cared until recently. Yehuda Katz recently wrote a good article on thread safety in Ruby and if you read his post and Zed Shaw’s comment you will understand a bit better the historical background. As a matter of fact, the current version of Rails is not multi-threaded by default and developers interested in handling concurrent requests in one process should turn on this option. Thread safety appeared for the first time in Rails 2.2 but from what I saw, most people still don’t enable this option. There are many reasons for that. First, enabling thread safety disables some Rails features like automatic dependency loading after boot and code reloading. A lot of Rails developers take these two features for granted and don’t understand that they are technically “hacks” to make their lives easier. I do believe a lot of Rails developers don’t understand how threads, thread safety, concurrency, blocking IO and dependencies work. They care about getting their app done and meet their deadlines. They usually use and know Rails without paying too much attention to how Rails extends Ruby. Imagine what would happen if their code wasn’t thread safe and Rails wasn’t not using a global lock by default. Now you see why things are not exactly as you expect and also why some Rubyists are getting excited about new projects like node.js which takes a different approach.

The other thing to keep in mind is that at least 90 to 95% of the Rails apps out there don’t get more than a dozen requests/second (a million requests/day). You can scale that kind of load pretty easily using simple approaches like caching,  optimize your DB queries, load balancing to a couple servers. As a matter of fact, compared to the amount of people using Rails on a daily basis, only a very little amount of people are struggling with performance and scalability like you do. This is not an excuse but that explains why these people don’t care about the things you care about.

Rails is slow

Duke: I don’t understand why Rails developers are not more concerned about the speed/performance penalty induced by Rails.

Me: Again, Rails is fast enough for the large majority of developers out there. As you know, as a developer you have to always make compromises. The Rails team always said that development time is more expensive than servers and therefore the focus is on making development easier, faster and more enjoyable. However to get there, they have to somewhat sacrifice some performance. What can be totally unacceptable for you is totally fine for others and your contribution is always welcome. This is probably the root cause of the things you don’t like in Rails. Rails was built for startups, by startup developers and you don’t fall in this category. People contributing new features and fixes are the people using Rails for what it is designed to do. There is no real ‘Enterprise’ support behind Rails and that might be why you feel the way you feel. Since you find yourself questioning some key Rails conventions and you are struggling with missing features, it looks  to me that you chose the wrong tool for the job since you don’t even use 70% of the Rails features and are dreaming of things such 3 tier architecture. Sinatra might be a better fit for you if you want lower level control, less conventions and less built-in features.

Object allocation / Garbage Collection

Duke: I recently read that Twitter was spending 20% of its request cycles in the GC, am I the only finding that concerning?

Me: Most people don’t realize how the GC works and what it means to allocate objects since Ruby does that automatically. But at the same time, most of these people don’t really see the affect of the Garbage Collection since they don’t have that much traffic or they scale in ways that just skips their Ruby stack entirely. (Or they just blame Ruby for being slow)

If you are app deals with mainly reads/GET requests, using HTTP caching (Rails has that built-in) and something like Varnish/Rack-cache will dramatically reduce the load on your server apps. Others don’t investigate their issues and just add more servers. As mentioned in a previous post, some libraries like Builder are allocating LOTS more objects than others (Nokogiri), use the existing debugging tools to see where your object allocations occur and try to fix/workaround these. In other words, Ruby’s GC isn’t great but by ignoring its limitations, we made things even worse. My guess is that the GC is going to improve (other implementations already have better GCs) and that people will realize that Ruby is not magic and critical elements need to be improved.

Tools

Duke: I really have a hard time finding good tools to help scale my apps better and understand where I should optimize my code.

Me: It is true that we area lacking tools but things are changing. On top of the built-in tools like ObjectSpace, GC::Profiler, people interested in performance/debugging are working to provide the Ruby community with their expertise, look at memprof and ruby-debug for instance. Of course you can also use tools such as Ruby-prof, Kcachegrind, Valgrind and GDB. (1.9.2 was scheduled to have DTrace support but I did not check yet). Maybe you should be more explicit about what tools you miss and how we could solve the gap.

ActiveRecord

Duke: ActiveRecord doesn’t do what I need. How come there is no native support for master/slave DBs, sharding, DB view support is buggy,  suggested indexes on queries is not built-in and errors are not handled properly (server is gone, out of sync etc..)?

Me: You don’t have to use ActiveRecord, you could use any ORM such as Sequel, DataMapper or your own. But to answer your question, I think that AR doesn’t do everything you want because nobody contributed these features to the project and the people maintaining ActiveRecord don’t have the need for these features.

What can we do?

We, as a community, need to realize that we have to learn from other communities and other programming languages, this kind of humorous graph is unfortunately not too far from reality.

Bringing your expertise and knowledge to the Ruby community is important. Looking further than just our own little will push us to improve and fulfill the gaps. Let the community know what tools you are missing, the good practices you think we should be following etc…

Take for instance Node.js, it’s a port of Ruby’s EventMachine / Python’s twisted. There is no reasons why the Ruby or Python versions could not do what the Javascript version does. However people are getting excited and are jumping ship. What do we do about that? One way would be to identify what makes node more attractive than EventMachine and what needs to be done so we can offer what people are looking for. I asked this question a few weeks ago and the response was that a lot of the Ruby libraries are blocking and having to check is too bothersome. Maybe that’s something that the community should be addressing. Node doesn’t have that many libraries and people will have to write them, in the mean time we can make our libs non-blocking. Also, let’s not forget that this is not a competition and people should choose the best tool for their projects.

Finally, things don’t change overnight, as more people encounter the issues you are facing, as we learn from others, part of the community will focus on the problems you are seeing and things will get better. Hopefully, you will also be able to contribute and influence the community to build an even better Ruby world.

,

14 Comments

Ruby object allocation & why you should care

Recently I was tasked with finding how to optimize a web application with heavy traffic. The application (a Rails 2.3.x app) gets about 3 million requests per hour and most of these requests cannot really be easily cached so they go through the entire stack.

This is probably not the case of most web apps out there. None the less, my findings my help you understand Ruby better and maybe think differently about memory management.

This is certainly not an advanced GC blog post, I will try to keep it as simple as possible. My goal is to show you how Ruby memory allocation works and why it can affect your app performance and finally, how can you avoid to allocate to many objects.

Ruby memory management.

Rubyists are quite lucky since they don’t have to manage the memory themselves. Because developers are lazy and Matz developed his language for people and not machine, memory is managed “magically”. Programming should be fun and managing memory isn’t really considered fun (ask video game developers or iOS programmers ;) ).

So in Ruby, the magical memory management is done by a Garbage Collector. The GC’s job is to run and free objects that were previously allocated but not used anymore. Without a GC we would saturate the memory available on the host running the program or would have to deallocate the memory manually. Ruby’s GC uses a conservative, stop the world, mark-and-sweep collection mechanism.  More simply, the garbage collection runs when the allocated memory for the process is maxed out. The GC runs and blocks all code from being executed and will free unused objects so new objects can be allocated.

Joe Damato did a great talk on that matter during last RailsConf

Garbage Collection and the Ruby Heap

The problem is that Ruby’s GC was not designed to support hundred thousand objects allocation per second. Unfortunately, that’s exactly what frameworks like Ruby on Rails do, and you might contribute to the problem too without even knowing it.

Does it really matter?

I believe it does. In my case improving the object allocation means much better response time, less servers, less support and less headaches. You might think that servers are cheaper than developers. But more servers mean more developer time spent fixing bugs and more IT support. That’s why I think, memory management is something Ruby developers should be aware of and should take in consideration, especially the ones writing frameworks, libraries or shared code.

I am using Ruby 1.9 so I could not profile my Rails 2.x app using memprof, instead I wrote a simple and basic middleware that keeps track of the memory allocation/deallocation and GC cycles during a web request (Ruby1.9 only). One of my simple Rails2 actions (1 DB call, simple view) is allocating 170,000 objects per requests. Yes, you read right: 170k objects every single request. At 3 million requests/hour, you can imagine that we are spending a LOT of time waiting for the GC. This is obviously not 100% Rails fault as I am sure our code is contributing to the problem. I heard from the memprof guys that Rails was allocating 40k objects. I decided to check Rails3.

After warming up, a basic Rails3 ‘hello world’ app clocks at about 8,500 objects allocated per request, forcing the GC to run more or less every 6 requests. On my machine (mac pro) the GC takes about 20ms to free the objects. A Rack ‘hello world’ app clocks at 7 objects per request and a Sinatra app at 181 objects. Of course you can’t really compare these different libraries/frameworks but that gives you an idea of the price you pay to get more features.

One thing to remember is that the more objects you allocate, the more time you “lose” at code execution. For more developers, it probably doesn’t matter much, but if you should still understand that concept especially if you decide to contribute to the OSS community and offer patches, libraries, plugins etc…

What can I do?

Be aware that you are allocating  objects, for instance something as simple as 100.times{ ‘foo’ } allocates 100 string objects (strings are mutable and therefore each version requires its own memory allocation).

Make sure to evaluate the libraries you use, for instance switching a Sinatra XML rendering action from Builder to Nokogiri XML Builder saved us about 12k object allocations (Thanks Aaron Patterson). Make sure that if you are using a library allocating a LOT of objects, that other alternatives are not available and your choice is worth paying the GC cost. (you might not have a lot of requests/s or might not care for a few dozen ms per requests). You can use memprof or one of the many existing tools to check on the GC cycles using load tests or in dev mode. Also, be careful to analyze the data properly and to not only look at the first request. Someone sent me this memory dump from a Rails3 ‘hello world’ with Ruby 1.8.7 and it shows that Rails is using 331973 objects.  While this is totally true, it doesn’t mean that 330k objects are created per request. Instead that means that 330k objects are currently in memory. Rubygems loading already allocate a lot of objects, Rails even more but these objects won’t be GC’d and don’t matter as much as the ones allocated every single request. The total amount of memory used by a Ruby process isn’t that important, however the fluctuation forcing the GC to run often is. This is why my middleware only cares about the allocation change during a request. (The GC should still traverse the entire memory so, smaller is better)

The more object allocation you do at runtime/per request, the more the GC will need to run, the slower your code will be. So this is not a question of memory space, but more of performance. If your framework/ORM/library/plugin allocates too many objects per request maybe you should start by reporting the problem and if you can, offer some patches.

Here are some hints about memory allocation:

Creating a hash object really allocates more than an object, for instance {‘joe’ => ‘male’, ‘jane’ => ‘female’} doesn’t allocate 1 object but 7. (one hash, 4 strings + 2 key strings) If you can use symbol keys as they won’t be garbage collected. However because they won’t be GC’d you want to make sure to not use totally dynamic keys like converting the username to a symbol, otherwise you will ‘leak’ memory.

Looking at a GC cycle in the Rails3 hello world example shows what objects get deallocated:

GC run, previous cycle was 6 requests ago.

GC 203 invokes. (amount of cycles since the program was started)
Index   1

Invoke Time(sec)   25.268

Use Size(byte)   4702440

Total Size(byte)   7307264

Total Object   182414

GC Time(ms) 22.35600000000204090611

## 56322 freed objects. ##
[78%] 44334 freed strings.
[7%] 4325 freed arrays.
[0%] 504 freed bignums.
[1%] 613 freed hashes.
[0%] 289 freed objects.
[5%] 3030 freed parser nodes (eval usage).

I did not list all the object types but it’s pretty obvious that the main issue in the case of Rails is string allocation. To a certain extend the allocated arrays and the runtime use of eval are not helping either. (what is being eval’d at runtime anyway?)

If you use the same string in various place of you code, you can “cache” them using a local var, instance variable, class variable or constant. Sometimes you can just replaced them by a symbol and save a few allocations/deallocations per request. Whatever you do tho, make sure there is a real need for it. My rule of thumb is that if some code gets exercised by 80% of the requests, it should be really optimized and avoid extra allocations so the GC won’t slow us down.

What about a better GC?

That’s the easy answer. When I mentioned this problem with Rails, a lot of people told me that I should use JRuby or Rubinius because their GC were much better. Unfortunately, that’s not that simple and choosing an alternative implementation requires much further research and tests.

But what annoys me with this solution is that using it is not solving the issue, it’s just working around it. Yes, Ruby’s GC isn’t that great but that’s the not the key issue, the key issue is that some libraries/frameworks allocate way too many objects and that nobody seems to care (or to even know it). I know that the Ruby Core Team is working on some optimizations and I am sure Ruby will eventually get an improved GC. In the meantime, it’s easy to blame Matz, Koichi and the rest of the core team but again, it’s ignoring that the root cause, totally uncontrolled memory allocation.

Maybe it’s time for us, Rubyists, to think a bit more about our memory usage.

15 Comments