Archive for category Software Design

Rethinking web API design

I wrote an article about the need to rethink the way we approach web service/API design; you can read it here:

Matt Aimonetti - Rethinking web service development


Building and implementing a Single Sign-On solution

Most modern web applications start as a monolithic code base and, as complexity increases, the once small app gets split apart into many “modules”. In other cases, engineers opt for a SOA design approach from the beginning. One way or another, we start running multiple separate applications that need to interact seamlessly. My goal here is to describe some of the high-level challenges and solutions involved in implementing a Single Sign-On (SSO) service.

Authentication vs Authorization

I wish these two words didn’t share the same root because they surely confuse a lot of people. The most frequent example of this confusion involves OAuth. Every time I start talking about implementing a centralized/unified authentication system, someone jumps in and suggests that we use OAuth. The challenge is that OAuth is an authorization system, not an authentication system.

It’s tricky, because you might actually be “authenticating” yourself to website X using OAuth. What you are really doing is allowing website X to use your information stored by the OAuth provider. It is true that OAuth offers a pseudo-authentication approach via its provider but that is not the main goal of OAuth: the Auth in OAuth stands for Authorization, not Authentication.

Here is how we could briefly describe each role:

  • Authentication: recognizing who you are.
  • Authorization: knowing what you are allowed to do, or what you allow others to do.

If you feel stuck in your design and something seems wrong, ask yourself whether you might be confusing the two auth words. This article will only focus on authentication.

A Common Scenario

SSO diagram: three applications connecting to a centralized authentication service.

This is probably the most common structure, though I made it slightly more complex by drawing the three main apps in different programming languages. We have three web applications running on different subdomains and sharing account data via a centralized authentication service.

Goals:

  • Keep authentication and basic account data isolated.
  • Allow users to stay logged in while browsing different apps.

Implementing such a system should be easy. That said, if you migrate an existing app to an architecture like that, you will spend 80% of your time decoupling your legacy code from authentication and wondering what data should be centralized and what should be distributed. Unfortunately, I can’t tell you what to do there since this is very domain specific. Instead, let’s see how to do the “easy part.”

Centralizing and Isolating Shared Account Data

At this point, you more than likely have each of your apps talking directly to shared database tables that contain user account data. The first step is to migrate away from doing that. We need a single interface that is the only entry point to create or update shared account data. Some of the data we have in the database might be app specific and should therefore stay within each app; anything that is shared across apps should be moved behind the new interface.

Often your centralized authentication system will store the following information:

  • ID
  • first name
  • last name
  • login/nickname
  • email
  • hashed password
  • salt
  • creation timestamp
  • update timestamp
  • account state (verified, disabled …)

Do not duplicate this data in each app; instead, have each app rely on the account ID to query data that is specific to a given account in that app. Technically, that means that instead of using SQL joins against a shared accounts table, you will query your database using the account ID as part of the condition.
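
For illustration, here is roughly what that change looks like in a hypothetical Rails client app; the AuthClient call and the model names are made up for the example:

# Before: the app joined its own tables against the shared accounts table.
#   SELECT posts.* FROM posts
#   INNER JOIN accounts ON accounts.id = posts.account_id
#   WHERE accounts.email = 'matt@example.com';

# After: account data lives behind the authentication service and the app
# only keeps the central account id as a plain column in its own tables.
account = AuthClient.find_account(:email => 'matt@example.com') # hypothetical S2S lookup
posts   = Post.where(:account_id => account.id)                 # app-specific data, no join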

My suggestion is to do things slowly but surely. Migrate your database schema piece by piece, making sure everything keeps working. Once the other pieces are in place, you can migrate one API at a time until your entire code base is moved over. Along the way, you might want to change your DB credentials to read-only access, then to no access at all.

Login workflow

Each of our apps already has a way for users to log in. We don’t want to change the user experience; instead we want to make a transparent modification so the authentication check is done in a centralized way instead of a local way. The easiest way to do that is to keep your current login forms but, instead of POSTing them to your local apps, POST them to a centralized authentication API (SSL is strongly recommended).

diagram showing the login workflow

As shown above, the login form now submits to an endpoint in the authentication application. The form will more than likely include a login or email and a clear text password, as well as a hidden callback/redirect URL so that the authentication API can redirect the user’s browser back to the original app. For security reasons, you might want to whitelist the domains you allow your authentication app to redirect to.

Internally, the authentication app will look up the account record matching the identifier (email or login) and compare a hashed version of the submitted clear text password against the stored hash. If the verification is successful, a token is generated containing some user data (for instance: id, first name, last name, email, created date, authentication timestamp). If the verification fails, no token is generated. Finally, the user’s browser is redirected to the callback/redirect URL provided in the request, with the token passed along.

You might want to encrypt or sign the data in a way that allows the clients to verify and trust that the token comes from a trusted source. A great solution for that is RSA, with the public key available in all your client apps but the private key only available on the auth server(s). Other strong approaches would also work; for instance, you could add a signature to the params sent back so the clients can check their authenticity. HMAC or DSA signatures are great for that, but in some cases you don’t want people to see the content of the data you send back; that’s especially true if you are sending back a ‘mobile’ token for instance, but that’s a different story. What’s important to consider is that we need a way to ensure that the data sent back to the client can’t be tampered with. You also want to make sure you prevent replay attacks.

On the other side, the application receives a GET request with a token param. If the token is empty or can’t be decrypted or verified, authentication failed; at that point, we need to show the user the login page again and let him/her try again. If, on the other hand, the token checks out, its content is saved in the session so future requests can reuse the data.
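
To make the token round trip more concrete, here is a minimal Ruby sketch using an RSA signature, which is what the public/private key split described above amounts to in practice. The key file names, payload fields and the 60 second freshness window are assumptions made for the example, not the actual implementation.

require 'openssl'
require 'json'
require 'base64'

# Auth server side: sign the token payload with the private key.
payload     = { :id => 42, :email => 'matt@example.com',
                :authenticated_at => Time.now.to_i }.to_json
private_key = OpenSSL::PKey::RSA.new(File.read('auth_private.pem')) # lives on the auth server only
signature   = private_key.sign(OpenSSL::Digest::SHA256.new, payload)
token       = [Base64.strict_encode64(payload), Base64.strict_encode64(signature)].join('--')

# Client app side: verify the signature with the public key shipped with the app
# and reject tokens that are too old (basic replay protection).
public_key = OpenSSL::PKey::RSA.new(File.read('auth_public.pem'))
data, sig  = token.split('--', 2).map { |part| Base64.strict_decode64(part) }
authentic  = public_key.verify(OpenSSL::Digest::SHA256.new, sig, data)
account    = JSON.parse(data)
fresh      = (Time.now.to_i - account['authenticated_at']) < 60

# Only create the local session if the token checks out, e.g.:
# session[:account] = account if authentic && fresh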

We described the authentication workflow, but if a user logs in to application X, (s)he won’t be logged in to application Y or Z. The trick here is to set a top-level-domain cookie that can be seen by all applications running on subdomains. Of course, this solution only works for apps on the same domain, but we’ll see later how to handle apps on different domains.

The cookie doesn’t need to contain a lot of data; its value can contain the account id, a timestamp (to know when authentication happened) and a signature. The signature is critical here since this cookie will allow users to be automatically logged in to other sites. I’d recommend HMAC or DSA to generate the signature. DSA, very much like RSA, is an asymmetric scheme relying on a public/private key pair, which offers more security than something based on a shared secret like HMAC does. But that’s really up to you.

Finally, we need to set a filter in each application. This auto-login filter checks for the presence of an auth cookie on the top-level domain and the absence of a local session. If that’s the case, a session is automatically created using the user id from the cookie value, after the cookie integrity is verified. We could also share the session between all our apps, but in most cases the data stored by each app is very specific and it’s safer/cleaner to keep the sessions isolated. The integration with an app running on a different service will also be easier if the sessions are isolated.
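
Here is a minimal sketch of what such a filter could look like in a Rails client app. The cookie format (account_id--timestamp--signature), the SSO_SHARED_SECRET constant and the 2 day expiry are assumptions made for the example.

require 'openssl'

class ApplicationController < ActionController::Base
  before_filter :auto_login_from_sso_cookie

  private

  # If there is no local session but a top-level-domain auth cookie exists,
  # verify its integrity and create a local session from it.
  def auto_login_from_sso_cookie
    return if session[:account_id]   # local session already present
    return unless cookies[:sso_auth] # no global auth cookie, nothing to do

    account_id, timestamp, signature = cookies[:sso_auth].split('--', 3)
    data     = "#{account_id}--#{timestamp}"
    expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest::SHA256.new, SSO_SHARED_SECRET, data)

    # Only trust the cookie if the signature matches and it isn't too old.
    if signature == expected && (Time.now.to_i - timestamp.to_i) < 60 * 60 * 24 * 2
      session[:account_id] = account_id
    end
  end
end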

 

Registration

For registration, as for login, we can take one of two approaches: point the user’s browser to the auth API or make S2S (server to server) calls from within our apps to the Authentication app. POSTing a form directly to the API is a great way to reduce duplicated logic and traffic on each client app so I’ll demonstrate this approach.

As you can see, the approach is the same one we used for login. The difference is that instead of returning a token, we just return some params (id, email and potential errors). The redirect/callback URL will also obviously be different from the login one. You could decide to encrypt the data you send back, but in this scenario what I would do is set an auth cookie at the .domain.com level when the account is created so the “client” application can auto-login the user. The information sent back in the redirect is used to re-display the registration form with the error information and the email entered by the user.

At this point, our implementation is almost complete. We can create an account and log in using the defined credentials. Users can switch from one app to another without having to log in again because we are using a shared signed cookie that can only be created by the authentication app and can be verified by all “client” apps. Our code is simple, safe and efficient.

Updating or deleting an account

The next thing we need is a way to update or delete an account. In this case, this is something that needs to be done between a “client” app and the authentication/accounts app, so we’ll make S2S (server to server) calls. To secure our apps and to offer a nice way to log requests, each client uses an API token/key to communicate with the authentication/accounts app. The API key can be passed in an X header so this concern stays out of the request params and our code can process the X-header authentication separately from the actual service implementation. S2S services should have a filter verifying and logging API requests based on the key sent with the request; the rest is straightforward.
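
As a rough illustration, here is what that filter could look like if the authentication/accounts app were written with Sinatra; the header name, the key store and the route are assumptions made for the example.

require 'sinatra'

# Hypothetical list of trusted S2S clients and their API keys.
API_KEYS = {
  'k3y-for-app-x' => 'Application X',
  'k3y-for-app-y' => 'Application Y'
}

enable :logging

# Verify and log every S2S request based on the key passed in the X-API-KEY header.
before '/s2s/*' do
  client = API_KEYS[request.env['HTTP_X_API_KEY']]
  halt 401, 'invalid API key' unless client
  logger.info "S2S call from #{client}: #{request.request_method} #{request.path}"
end

# Example S2S endpoint updating an account.
put '/s2s/accounts/:id' do
  # ... update the account record with params[:account] ...
  status 204
end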

Using different domains

Until now, we assumed all our apps were on the same top domain. In reality, you will often find yourself with apps on different domains. This means that you can’t use the shared signed cookie approach anymore. However, there is a simple trick that will allow you to avoid requiring your users to re-login as they switch apps.

 

The trick consists of using an iframe in the application on the other domain whenever a local session isn’t present. The iframe loads a page from the authentication/accounts app which verifies that a valid cookie was set on the main top-level domain. If that is the case, we know the user is already globally logged in, and we can tell the iframe host to redirect to an application endpoint, passing an auth token the same way we did during authentication. The app then creates a session and redirects the user back to where (s)he started. Subsequent requests will see the local session and this process will be skipped.

If the authentication application doesn’t find a signed cookie, the iframe can display a login form or redirect the iframe host to a login form depending on the required behavior.
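
A minimal sketch of the endpoint the iframe could load on the authentication app is shown below; the cookie check, the token helper and the callback whitelisting are hypothetical names standing in for the real logic.

require 'sinatra'

get '/sso/check' do
  callback = params[:callback] # should be checked against a whitelist of app endpoints

  if valid_sso_cookie?(request.cookies['sso_auth'])        # hypothetical signed cookie check
    token = signed_token_for(request.cookies['sso_auth'])  # hypothetical token generation
    # The user is globally logged in: tell the iframe host (the parent window)
    # to redirect to the app callback with an auth token, like a regular login.
    "<script>window.top.location = '#{callback}?token=#{token}';</script>"
  else
    # No global session: display a login form or redirect the iframe host to one.
    erb :login_form
  end
end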

Something to keep in mind when using multiple apps and domains is that you need to keep the shared cookies/sessions in sync, meaning that if you log out from one app, you also need to delete the auth cookie to ensure that the user is globally logged out. (It also means that you might always want to use an iframe to check the login status and auto-logoff users.)

 

Mobile clients

Another part of implementing an SSO solution is handling mobile clients. Mobile clients need to be able to register/login and update accounts. However, unlike S2S service clients, mobile clients should only be allowed to modify data on behalf of a given user. To do that, I recommend providing an opaque mobile token during the login process. This token can then be sent with each request in an X header so the service can authenticate the user making the request. Again, SSL is strongly recommended.
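
A sketch of what the mobile API authentication could look like, again in Sinatra style; the header name, the token store and the model call are assumptions made for the example.

require 'sinatra'

# Resolve the opaque mobile token sent in the X-MOBILE-TOKEN header to an account.
before '/mobile/*' do
  token       = request.env['HTTP_X_MOBILE_TOKEN']
  @account_id = MobileTokenStore.account_id_for(token) # hypothetical lookup (Redis, DB...)
  halt 401, 'invalid or expired token' unless @account_id
end

# The account being modified is always the token owner, never a request param.
put '/mobile/account' do
  Account.update(@account_id, params[:account]) # hypothetical model call
  status 204
end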

In this approach, we don’t use a cookie, and we don’t actually need an SSO solution so much as a unified authentication system.

 

Writing web services

Our Authentication/Accounts application turns out to be a pure web API app.

We have three sets of APIs:

  • Public APIs: can be accessed from anywhere, no authentication required
  • S2S APIs: authenticated via API keys and only available to trusted clients
  • Mobile APIs: authenticated via a mobile token and limited in scope.

We don’t need dynamic HTML views, just simple web service related code. While this is a little bit off topic, I’d like to take a minute to show you how I personally like writing web service applications.

Something that I care a lot about when I implement web APIs is validating incoming params. This is an opinionated approach that I picked up while at Sony and that I think should be used every time you implement a web API. As a matter of fact, I wrote a Ruby DSL library (Weasel Diesel) allowing you to describe a given service, its incoming params, and the expected output. This DSL is hooked into a web backend so you can implement services using a web engine such as Sinatra or maybe Rails 3. Based on the DSL usage, incoming parameters are verified before being processed. The other advantage is that you can generate documentation based on the API description, as well as automated tests.

You might be familiar with Grape, another DSL for web services. Besides the obvious style differences, Weasel Diesel offers the following advantages:

  • input validation/sanitization
  • service isolation
  • generated documentation
  • contract based design

Here is a hello world web service implemented using Weasel Diesel and Sinatra:
describe_service "hello_world" do |service|
  service.formats :json
  service.http_verb :get
  service.disable_auth # on by default

  # INPUT
  service.param.string :name, :default => 'World'

  # OUTPUT
  service.response do |response|
    response.object do |obj|
      obj.string :message, :doc => "The greeting message sent back. Defaults to 'World'"
      obj.datetime :at, :doc => "The timestamp of when the message was dispatched"
    end
  end

  # DOCUMENTATION
  service.documentation do |doc|
    doc.overall "This service provides a simple hello world implementation example."
    doc.param :name, "The name of the person to greet."
    doc.example "<code>curl -I 'http://localhost:9292/hello_world?name=Matt'</code>"
  end

  # ACTION/IMPLEMENTATION
  service.implementation do
    {:message => "Hello #{params[:name]}", :at => Time.now}.to_json
  end
end

Basic test validating the contract defined in the DSL against the actual output when the service is called:

class HelloWorldTest < MiniTest::Unit::TestCase

  def test_response
    TestApi.get "/hello_world", :name => 'Matt'
    assert_api_response
  end

end

Generated documentation:

If the DSL and its features seem appealing to you and you are interested in digging more into it, the easiest way is to fork this demo repo and start writing your own services.

The DSL has been used in production for more than a year, but there certainly are tweaks and small changes that can make the user experience even better. Feel free to fork the DSL repo and send me Pull Requests.



Learning from Rails’ failures

Ruby on Rails undisputedly changed the way web frameworks are designed. Rails became a reference for leveraging conventions, shipping a rich baked-in feature set and growing a rich ecosystem. However, I think that Rails did and still does a lot of things pretty poorly. By writing this post, I’m not trying to denigrate Rails; there are plenty of other people out there already doing that. My hope is that by listing what I think didn’t and still doesn’t go well, we can learn from our mistakes and improve existing solutions or create better new ones.

Migration/upgrades

Migrating a Rails app from one version to the next is very much like playing the lottery: you are almost sure you will lose. To be more accurate, you know things will break, you just don’t know what, when and how. The Rails team seems to assume that everybody is always running the cutting-edge version and doesn’t consider people who prefer to stay a few versions behind for stability reasons. What’s worse is that plugins/gems might or might not be compatible with the version you are upgrading to, but you will only find out by trying it yourself or by letting others try and report potential issues.

This is, for me, by far the biggest issue with Rails and something that should have been fixed a long time ago. If you’re using the WordPress blog engine, you know how easy and safe it is to upgrade the engine or the plugins. Granted, WordPress isn’t a web dev framework, but it gives you an idea of the kind of experience we should be striving for.

 

Stability vs playground zone

New features are cool and they help make the platform more appealing to newcomers. They also help shape the future of a framework. But from my perspective, that shouldn’t come at the cost of stability. Rails 3’s new asset pipeline is a good example of a half-baked solution shoved into a release at the last minute, creating a nightmare for a lot of us trying to upgrade. I know, I know, you can turn off the asset pipeline and it has gotten better since it was first released. But shouldn’t it be the other way around? Shouldn’t fun new ideas that risk the stability of an app or make migration harder be off by default and only turned on by people wanting to experiment? When your framework is young, it’s normal to move fast and sometimes break things, but once it matures, these things shouldn’t happen.

 

Public/private/plugin APIs

This is more of a recommendation than anything else. When you write a framework in a very dynamic language like Ruby, people will “monkey patch” your code to inject features. Sometimes it is due to software design challenges, sometimes it’s because people don’t know better. However, by not explicitly specifying which APIs are private (they can change at any time, don’t touch), which are public (stable, slowly deprecated when they need to change) and which ones are for plugin devs only (APIs meant for instrumentation, extension etc.), you are making migration to newer versions much harder. You see, if you have a small, clean public API, then it’s easy to see what could break, warn developers and avoid migration nightmares. However, you need to start doing that early on in your project, otherwise you will end up like Rails where all code can potentially change at any time.

 

Rails/Merb merge was a mistake

This is my personal opinion and, well, feel free to disagree; nobody will ever be able to know for sure. Without getting into what happened behind closed doors and the various personal motivations, looking at the end result, I agree with the group of people thinking that the merge didn’t turn out to be a good thing. For me, Rails 3 isn’t significantly better than Rails 2 and it took forever to be released. You still can’t really run a mini Rails stack as promised. I did hear that Strobe (the company which was employing Carl Lerche and Yehuda Katz and contracted Jose Valim) used to have an ActionPack-based mini stack, but it was never released and apparently only Rails core members really knew what was going on there. Performance in vanilla Rails 3 is only now getting close to what you had with Rails 2 (and therefore far from the perf you were getting with Merb). Thread safety is still OFF by default, meaning that by default your app uses a giant lock, only allowing a process to handle one request at a time. For me, the flexibility and performance focus of Merb were mostly lost in the merge with Rails. (Granted, some important things such as ActiveModel, cleaner internals and others have made their way into Rails 3.)

But what’s worse than everything listed so far is that the lack of competition and the internal rewrites made Rails lose its head start. Rails is very much HTML/view focused; its primary strength is making server-side views trivial, and it does an amazing job at that. But let’s be honest, that’s not the future of web dev. The future is more and more logic pushed to the client side (in JS), with the server side being used as an API serving data for the view layer. I’m sorry, but adding support for CoffeeScript doesn’t really do much to make Rails evolve beyond what it currently is. Don’t get me wrong, I’m a big fan of CoffeeScript; that said, I still find that Rails is far from being optimized for developing web APIs. You can certainly do it, but you are basically using a tool that wasn’t designed to write APIs and you pay the overhead for that. If there is one thing I wish Rails would get better at, it’s writing pure web APIs (thankfully there is Sinatra). But at the end of the day, I think that two projects with different philosophies and different approaches are really hard to merge, especially in the open source world. I wouldn’t go as far as saying, like others, that Rails lost its sexiness to node.js because of the wasted time, but I do think that things would have been better for everyone if the merge hadn’t happened. However, I also have to admit that I’m not sure how big a deal that is. I prefer to leave the past behind, learn from my own mistakes and move on.

 

Technical debts

Here I’d like to stop and give huge props to Aaron “@tenderlove” Patterson, the man who’s actively working to reduce the technical debt in the Rails code base. This is a really hard job and definitely not a very glamorous one. He’s been working on various parts of Rails including its router and its ORM (ActiveRecord). Technical debt is unfortunately normal in most projects, but sometimes it becomes overwhelming to the point that nobody dares touch the code base to clean it up. This is a hard problem, especially when projects move fast like Rails did. But looking back, I think that you want to start tackling technical debt on the side as you move along, so you avoid getting to the point where you need a hero to come in and clean up the errors piled up in the past. But don’t pause your entire project to clean things up, otherwise you will lose market share, momentum and excitement. I feel that this is also very much true for any legacy project you might pick up as a developer.

 

Keep the cost of entry level low

Getting started with Rails used to be easier. This can obviously be argued since it’s very subjective, but from my perspective I think we forgot where we came from and we involuntarily expect newcomers to arrive with unrealistic knowledge. Sure, Rails does much more than it used to, but it’s also much harder to get started with. I’m not going to argue how much harder it is now or why we got there. Let’s just keep in mind that this is a critical thing that should always be re-evaluated. Sure, it’s harder when you have an open source project, but it’s also up to the leadership to show that they care and to encourage and mentor volunteers to focus on this important part of a project.

 

Documentation

Rails documentation isn’t bad, but it’s far from being great. Documentation certainly isn’t one of the Ruby community’s strengths, especially compared with the Python community, but what saddens me is the state of the official documentation, which should, in theory, be the reference. Note that the Rails guides are usually well written and provide value, but they too often seem too light and not useful when you try to do something that isn’t totally basic (for instance, use an ActiveModel-compliant object). That’s probably why most people don’t refer to them or don’t spend much time there. I’m not trying to blame anyone. I think that the people who contributed these guides did an amazing job, but if you want to build a strong and easy-to-access community, great documentation is key. Look at the Django documentation as a good example. That said, I also need to acknowledge the amazing job done by many community members such as Ryan Bates and Michael Hartl, who consistently provide high-value external documentation via RailsCasts and the intro to Rails tutorial, available for free.

 

In conclusion, I think that there is a lot to learn from Rails: lots of great things as well as lots of things you would want to avoid. We can certainly argue on Hacker News or in the comments about whether or not I’m right about Rails’ failures; my point will still be that the mentioned issues should be avoided in any project, and Rails here is just an example. Many of these issues are currently being addressed by the Rails team, but wouldn’t it be great if new projects learned from older ones and avoided making the same mistakes? So what other mistakes do you think I forgot to mention, ones that we should be very careful to avoid?

 

Updates:

  1. Rails 4 had an API-centric app generator, but it was quickly reverted and will live as a gem until it’s mature enough.
  2. Rails 4 improved the ActiveModel API to be simpler to get started with. See this blog post for more info.



Quick dive into Ruby ORM object initialization

Yesterday I did some quick digging into how ORM objects are initialized and the performance cost associated with that. In other words, I wanted to see what’s going on when you initialize an ActiveRecord object.

Before I show you the benchmark numbers and you jump to conclusions, it’s important to realize that in the grand scheme of things, the performance cost we are talking about is small enough that it is certainly not the main reason why your application is slow. Spoiler alert: ActiveRecord is slow, but the cost of initialization is far from being the worst part of ActiveRecord. Also, even though this article doesn’t make ActiveRecord look good, I’m not trying to diss it. It’s a decent ORM that does a great job in most cases.

Let’s get started with the benchmark numbers to give us an idea of the damage (using Ruby 1.9.3-p125):

 

                                                             | Class | Hash  | AR 3.2.1 | AR no protection | Datamapper | Sequel |
--------------------------------------------------------------------------------------------------------------------------------------
.new() x100000                                               | 0.037 | 0.049 | 1.557    | 1.536            | 0.027      | 0.209  |
.new({:id=>1, :title=>"Foo", :text=>"Bar"}) x100000          | 0.327 | 0.038 | 6.784    | 5.972            | 4.226      | 1.986  |

 

You can see that I am comparing the allocation of a Class instance, a Hash and some ORM models. The benchmark suite tests the allocation of an empty object and one with passed attributes. The benchmark in question is available here.

As you can see, there seems to be a huge performance difference between allocating a basic class and an ORM class. Instantiating an ActiveRecord class is 20x slower than instantiating a normal class. While ActiveRecord does offer some extra features, why is it so much slower, especially at initialization time?
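
If you want to get a feel for this kind of comparison yourself, a stripped-down sketch of the benchmark looks roughly like this (it only covers the plain class and Hash cases; the ORM lines require the corresponding gems and a database connection):

require 'benchmark'

# Plain Ruby class with the same attributes as the benchmarked models.
class PlainArticle
  attr_accessor :id, :title, :text

  def initialize(attrs = {})
    attrs.each { |name, value| send("#{name}=", value) }
  end
end

N     = 100_000
ATTRS = { :id => 1, :title => 'Foo', :text => 'Bar' }

Benchmark.bm(30) do |bm|
  bm.report('Class.new(attrs) x100000') { N.times { PlainArticle.new(ATTRS) } }
  bm.report('Hash#dup x100000')         { N.times { ATTRS.dup } }
  # bm.report('AR.new(attrs) x100000')  { N.times { Article.new(ATTRS) } } # needs ActiveRecord + DB
end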

The best way to figure it out is to profile the initialization. For that, I used perftools.rb and I generated a graph of the call stack.

Here is what Ruby does (and spends its time) when you initialize a new Model instance (click to download the PDF version):

 

Profiler diagram of AR model instantiation by Matt Aimonetti

 

This is quite a scary graph, but it nicely shows the features you are getting and their associated cost. For instance, the option of having before and after initialization callbacks costs you 14% of your CPU time per instantiation, even though you probably almost never use these callbacks. I’m reading that by interpreting the node called ActiveSupport::Callback#run_callbacks, third level from the top: 14.1% of the CPU time is spent trying to run callbacks. As a quick note, 90.1% of the CPU time is spent initializing objects; the rest is spent in the loop and in garbage collection (because the profiler runs many loops). You can then follow the code and see how it works, creating a dynamic class callback method on the fly (the one with the long name) and then recreating the name of this callback to call it each time the object is allocated. It sounds like a good place for some micro-optimizations which could yield up to a 14% performance increase in some cases.

Another major part of the CPU time is spent in ActiveModel’s sanitization. This is the piece of code that allows you to prevent some model attributes from being mass assigned. This is useful when you don’t want to sanitize your incoming params but want to create or update a model instance using all the passed user params. To prevent malicious users from modifying specific params that might be in your model but not in your form, you can protect these attributes. A good example would be an admin flag on a User object. That said, if you manually initialize an instance, you don’t need this extra protection, which is why in the benchmark above I tested both with and without the protection. As you can see, it makes quite a big difference. The profiler graph of the same initialization without the mass assignment protection logically ends up looking quite different:

 


Matt Aimonetti shows the stack trace generated by the instantiation of an Active Record model

 

Update: My colleague Glenn Vanderburg pointed out that some people might assume that the code path shown is called for each record loaded from the database. This isn’t correct; the graph represents instances allocated by calling #new. See the addition at the bottom of the post for more details about what’s going on when you fetch data from the DB.

I then decided to look at the graphs for the two other popular Ruby ORMs:

Datamapper

 

and Sequel

 

 

While I didn’t give you much insight into ORM code, I hope that this post will motivate you to sometimes take a look under the covers and profile your code to see what’s going on and why it might be slow. Never assume, always measure. Tools such as perftools.rb are a great way to get visual feedback and a better understanding of how the Ruby interpreter is handling your code.

UPDATE:

I heard you liked graphs so I added some more, here is what’s going on when you do Model.first:

 

Model.all

 

And finally, this is the code graph for a call to Model.instantiate, which is called after a record is retrieved from the database to convert it into an object. (You can see the #instantiate call referenced in the graph above.)

 



First step in scaling a web site: HTTP caching

Today my friend Patrick Crowley and I were talking about scaling his website, http://cinematreasures.org, since an article covering his work will soon be published in a very popular newspaper. Patrick’s site is hosted on Heroku, which comes with Varnish caching enabled by default.

The challenge is that a lot of people using the Rails framework are used to doing page caching instead of relying on HTTP caching, even though this feature was added a long time ago. The major problem with page caching is that it doesn’t scale well as soon as you run more than one server. Indeed, you would need to store the page content on a drive shared between your servers, or use memcached and do some extra work to avoid hitting your app every single time. On the other hand, HTTP caching is extremely easy to handle at the application level and it will dramatically reduce the number of requests hitting your app. Let me explain a little more about HTTP caching.

Ryan Tomayko wrote an excellent post about the details of caching; I strongly recommend you read it. In a nutshell, the HTTP caching layer (usually) sits in front of your application layer and allows you, the developer, to store some responses that can be sent back to users based on optional conditions. That might still seem vague, so let’s take a concrete example. If you look at http://cinematreasures.org‘s home page, you can see that it’s an agglomerate of various pieces of information:

CinemaTreasures homepage

And the bottom of the page contains even more dynamic data such as the popular movie theater photos, latest movie theater videos and latest tweets. One might look at that and say that this page can’t really be cached and that the caching should be done at the model layer (i.e. cache the data coming from the database). I would certainly agree that caching the data layer is probably a good idea, but you shouldn’t start with that. In fact, without caching, this page renders fast enough. The problem is when someone like Roger Ebert tweets about CinemaTreasures: the load on the app peaks significantly. At that point, the number of concurrent connections your app can handle gets put to the test. Even though your page load is “fast enough”, requests will queue up and some will eventually time out. That’s actually a perfect case for HTTP caching.

What we want to do in that case is cache a version of the home page in Varnish for 60 seconds. During that time, all requests coming to the site will be served by Varnish and will all get the same cached content. That allows our servers to handle the non-cached requests and therefore increases our throughput. What’s even better is that if a user refreshes the home page in his/her browser during those first 60 seconds, the request won’t even make it all the way to our servers. All of that is thanks to conditions set on the response. The first user hitting the HTTP cache layer (Varnish in this case) won’t find a fresh cached response, so Varnish will forward the request to our application layer, which will send back the home page and tell Varnish that this content is good for a full minute, so please don’t ask for it again until a minute from now. Varnish serves this response to the user’s browser and lets the browser know that the server said the response was good for a minute, so don’t bother asking for it again. Now, if another user comes in during these 60 seconds, (s)he will hit Varnish; since Varnish has the cached response from the first request, the cache is still fresh (it hasn’t been 60 seconds since the first request) and the cache is public, the same response will be sent to the second user.

As you can see, the real strength of HTTP caching is that it is conditional. It’s based on the request’s URL and some “flags” set in the request/response headers.

Setting these conditions in your app is actually very simple since you just need to set the response’s headers. If you are using a Ruby framework, you will more than likely have access to the response object and you can set the headers directly like this: response.headers['Cache-Control'] = 'public, max-age=60'.
In Rails, you can use a helper method instead: expires_in 1.minute, :public => true.

You might have a case where you HAVE TO serve fresh content if available and can’t serve stale cached content even for a few seconds. In this case, you can rely on the ETag header. The ETag is meant to validate the freshness of a cached response. Think of it as a signature (unique ID) that is set on the response and used by the client (or cache layer) to see if the server response has changed or not. The way it works is that the client keeps track of the ETag received for each request (attached to the cached response) and then sends it with subsequent requests. The HTTP cache layer or the application sees the ETag in the request and can check whether the content has changed. If it hasn’t, an empty response can be sent with a special HTTP status code (304 Not Modified) to let the client know that the old cached value is still good to use. Rails has a helper called “stale?” that does the ETag/last-modified check and allows you to avoid fetching all the objects from the database by doing a cheap check on an attribute (for instance, you can check the updated_at value and use that as a condition to pull an object and its relationships).
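
As a hedged illustration, here is roughly what both techniques look like in a Rails controller; the model, the queries and the timings are made up for the example:

class TheatersController < ApplicationController
  # Cache the index page in Varnish and in browsers for a full minute.
  def index
    expires_in 1.minute, :public => true
    @theaters = Theater.order('updated_at DESC').limit(20)
  end

  # Serve a 304 when nothing changed, using a cheap check on updated_at
  # before loading the full object graph.
  def show
    theater = Theater.select('id, updated_at').find(params[:id])
    if stale?(:etag => theater, :last_modified => theater.updated_at, :public => true)
      @theater = Theater.includes(:photos, :videos).find(params[:id])
      render :show
    end
  end
end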

When I explain HTTP caching, I often hear people telling me: “that’s great Matt, but you know what, that won’t work for us because we display custom content specifically for our users”. In that case, you can always set the Cache-Control header to private, which will only cache the response in the client’s browser and not in the cache layer. That’s good to some extent, but it can definitely be improved by rethinking your view layer a bit. In most web apps, the page content is rendered by server-side code (Rails, Django, node.js, PHP…) and sent to the user fully prepared. There are a few challenges with this approach. The biggest one is that the server has to wait until everything is ready (all data fetched, view rendered etc…) before sending back a response and before the client’s browser can start rendering (there are ways to chunk the response, but that’s beyond the scope of this post). The other is that the same expensive content has to be calculated/rendered for two different users just because you might be inserting, for instance, the username of the current user at the top of the page.

A classic way to deal with that is to use fragment caching, where the expensive rendering is cached and reused by different requests. That’s good, but if the only reason to do it is that we are displaying some user-specific data, there is a simpler way: async page rendering. The concept is extremely simple: remove all user-specific content from the rendered page and then inject the user content in a second step once the page is displayed. The advantage is that now the full page can be cached in Varnish (or Squid or whatever you use for HTTP caching). To inject the user content, the easiest way is to use JavaScript.

Let’s stay on CinemaTreasures, when you’re logged in, the username is shown on the top of each page:

Once logged in, the username is displayed on all pages

The only things that differ between the page rendered when the user is not logged in and when (s)he is are these two links and an avatar. So let’s write some code to inject them after rendering the page.

In Rails, in the sessions controller or whatever code logs you in, you need to create a new cookie containing the username:

cookies[:username] = {
  :value   => session[:username],
  :expires => 2.days.from_now,
  :domain  => ".cinematreasures.org"
}

As you can see, we don’t store the data in the session cookie and the data won’t be encrypted. You need to be careful that someone changing his/her cookie value can’t access data (s)he shouldn’t. But that’s a different discussion. Now that the cookie is set, we can read it from JavaScript when the page is loaded.

document.observe("dom:loaded", function() {
  displayLoggedinUserLinks();
});

function readCookie(name) {
  var nameEQ = name + "=";
  var ca = document.cookie.split(';');
  for (var i = 0; i < ca.length; i++) {
    var c = ca[i];
    while (c.charAt(0) == ' ') c = c.substring(1, c.length);
    if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length, c.length);
  }
  return null;
}

function displayLoggedinUserLinks() {
  var username      = readCookie('username');
  var loginLink     = $('login');
  var logout        = $('logout');
  var userGreetings = $('user_greetings'); // element assumed to hold the greeting message
  if (username == null) {
    loginLink.show();
    logout.hide();
  } else {
    // user is logged in and we have his/her username
    loginLink.hide();
    if (userGreetings) { userGreetings.update("<span id='username'>" + username + "</span>"); }
    logout.show();
    showAvatar(username);
  }
  return true;
}

The code above doesn’t do much: once the DOM is loaded, the displayLoggedinUserLinks() function gets triggered. This function reads the cookie via readCookie() and, if a username is found, the login link is hidden while the username, the logout link and the avatar are displayed. (You can also use a jQuery cookie plugin to handle the cookie, but this is an old example using Prototype; replace the code accordingly.)
When the user logs out, we just need to delete the username cookie and the cached page will be rendered properly. In Rails, you would delete the cookie like this: cookies.delete('username').
Quite often you might also want to make an Ajax call to get some information such as the number of user messages or notifications. Using jQuery or whatever JS framework you fancy, you can do that once the page is rendered. Here is an example: on this page, you can see the leaderboards for MLB The Show. The leaderboards don’t change that often, especially the overall ones, so they can be cached for a little while; however, a player’s presence can change at any time. The smart way to deal with that would be to cache the leaderboards for a few seconds/minutes and make an Ajax call to a presence service, passing it a list of user ids collected from the DOM. The service called via Ajax could also be cached, depending on the requirements.

Now there is one more problem that people using this technique might encounter: flash notices. For those of you not familiar with Rails, flash notices are messages set in the controller and passed to the view via the session (at least last time I checked). The problem happens if the home page isn’t cached anymore and I log in, which redirects me to the home page with a flash message like so:

The problem is that the message is part of the rendered page, and now, for 60 seconds, all people hitting the home page will get the same message. This is why you would want to write a helper that puts this message in a custom cookie that you then pull via JS and delete once displayed. You could use a helper like this to set the cookie:

def flash_notice_cookie(msg, expiration = nil)
  cookies[:flash_notice] = {
    :value   => msg,
    :expires => expiration || 1.minute.from_now,
    :domain  => ".cinematreasures.org"
  }
end

Then add a function called when the DOM is ready which loads the message and injects it into the DOM. Once the cookie is read, delete it so the message isn’t displayed again.

 

So there you have it: if you follow these few steps, you should be able to easily handle 10x more traffic without adding hardware or making any kind of crazy code change. Before you start looking into memcached, Redis, CDNs or whatever, consider HTTP caching and async DOM manipulation. Finally, note that if you can’t use Varnish or Squid, you can very easily set up Rack::Cache locally and share the cache via memcached. It’s also a great way to test locally.
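
For reference, a minimal config.ru using Rack::Cache backed by memcached could look like the sketch below (the memcached URLs and the app constant are placeholders, and a memcached client gem such as dalli is assumed to be available):

require 'rack/cache'

use Rack::Cache,
  :verbose     => true,
  :metastore   => 'memcached://localhost:11211/meta',
  :entitystore => 'memcached://localhost:11211/body'

run MyApp::Application # your Rails/Rack application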


Update: CinemaTreasures was updated to use HTTP caching as described above. The hosting cost is now half of what it used to be and the throughput is actually higher, which offers better protection against peak traffic.


 




Video game web framework design

In this post I will do my best to explain why and how I reinvented the wheel and wrote a custom web framework for some of Sony’s AAA console titles. My goal is to reflect on my work by walking you through the design process and some of the implementation decisions. This is not about being right or being wrong, it’s about designing a technical solution to solve concrete business challenges.

Problem Domain

The video game industry is quite special, to say the least. It shares a lot of similarities with the movie industry. The big difference is that the movie industry hasn’t evolved as quickly as the video game industry has. But the concept is the same: someone comes up with a great idea, finds a team/studio to develop the game and finds a publisher. The development length and budget depend on the type of game, but for a AAA console game it usually takes at least a few million dollars and a minimum of a year of work once the project has received the green light. The creation of such a game involves various teams: designers, artists, animators, audio teams, developers, producers, QA, marketing, management/overhead etc. Once the game gets released, players purchase the whole game for a one-time fee and the studio moves on to its next game. Of course, things are not that simple; with the latest platforms, we now have the option to patch games, add DLC etc. But historically, a console game is considered done when it ships, exactly like a movie, and very little work is scheduled post-release.

Concretely such an approach exposes a few challenges when trying to implement online features for a AAA console title:

  • Communication with the game client network team
  • Scalability, performance
  • Insane deadlines, unstable design (constant change of requirements)
  • Can’t afford to keep on working on the system once released (time delimited projects)

 

Communication

As in most situations, communication is one of the biggest challenges. Communication is even harder in the video game industry since you have so many teams and experts involved. Each team speaks its own jargon, has its own expertise and its own deadlines. But all focus on the same goal: releasing the best game ever. The team I’m part of has implementing online features as its goal. That’s the way we bring business value to our titles. Concretely, that means that we provide the game client developers with a C++ SDK which connects to custom web APIs written in Ruby. The API implementations rely on various data stores (MySQL, Redis, Memcached, memory) to store and retrieve all sorts of game data.

Nobody but our team should care about the implementation details; after all, the whole point of providing an API is to provide a simple interface so others can do their part of the job in the easiest way possible. This is exactly where communication becomes a problem. The design of these APIs should be the result of the work of two teams with two different domains of expertise and different concerns. One team focuses on client performance, memory optimization and making the online resources available to the game engine without affecting the game play. The other focuses on server performance, latency, scalability, data storage and system contention under load. Both groups have to come together to find a compromise making each other’s job doable. Unfortunately, things are not that simple, and game designers (who are usually not technical people) have a hard time not changing their designs and requirements every other week (usually for good reasons), making API design challenging and creating tension between the teams.

From this perspective, the API is the most important deliverable for our team and it should communicate the design goal while being very explicit about how it works, why it works the way it does, and how to implement it client side. This is a very good place where we can improve communication by making sure that we focus on making clear, well designed, well documented, flexible APIs.

 

Scalability, performance

On the server side, the APIs need to perform and scale to handle tens of thousands of concurrent requests. Web developers often rely on aggressive HTTP caching, but in our case the web client (our SDK) has a limited amount of memory available, 90% of the requests are user specific (so we can’t use full page HTTP caching), and a lot of them are POST/DELETE requests (which can’t be cached). That means that, to scale, we have to focus on what most developers don’t often have to worry much about: all the small details which, put together under high load, end up drastically affecting your performance.

While Ruby is a great language, a lot of the libraries and frameworks are not optimized for performance, at least not the type of performance needed for our use case. However, the good news is that this is easily fixable and many alternatives exist (lots of async, non-blocking drivers, for instance). When obsessed with performance, you quickly learn to properly load test, profile, and monitor your code to find the bottlenecks and the places where you should focus your attention. The big, unique challenge, though, is that a console game will more than likely see its peak traffic in the first few weeks, which doesn’t really give the online team the chance to iteratively handle production issues. The only solution is to do everything possible before going live to ensure that the system will perform as expected. Of course, if we were to write the same services in a more performant language, we would need to spend less time optimizing. But we gain so much flexibility by using a higher-level programming language that, in my mind, the trade-off is totally worth it (plus you still need to spend a lot of time optimizing your code path, even if your code is written in a very fast language).

 

Deadlines, requirement changes

That’s just part of the way the industry works. Unless you work for Blizzard and can afford to spend a crazy amount of time and money on the development of a title, you will have to deal with sliding deadlines, requirement changes, scope changes etc… The only way I know to protect myself from such things is to plan for the worst. Being a non-idealistic (read: pessimistic) person helps a lot. When you design your software, make sure your design is sound but flexible enough to handle any major change that you know could happen at any time. Pick your battles and make sure your assumptions are properly thought through, communicated and documented so others understand and accept them. In a nutshell, this is a problem we can’t avoid, so you need to embrace it.

 

Limited reusability

This topic has a lot to do with the previous paragraph. Because scopes can change often and because the deadlines are often crazy, a lot of the time engineers don’t take the time to think about reusability. They slap some code together, pray to the lords of Kobol and hope that they won’t have to look at their code ever again (I’m guilty of having done that too). The result is a lot of throwaway code. This is actually quite frequent and normal in our industry. But it doesn’t mean that it’s the right thing to do! The assumption/myth is that each game is different and therefore two games can’t use the same tech solution. My take is that this is only partly true: some components are the same for 80% of the games I work on. So why not design them well and reuse the common parts? (A lot of games share the same engines, such as Unreal for example, and there is no reason why we can’t build a core online engine extended for each title.)

 

My approach

When I joined Sony, I had limited experience with the console video game industry and my experience was not even related to online gaming. So even though I had (strong) opinions (and was often quite (perhaps even too) vocal about them), I did my best to improve existing components and work with the existing system. During that time, the team shipped 4 AAA titles on the existing system. As we were going through the game cycles, I did my best to understand the problem domain, the reasons behind some of the design decisions and finally I looked at what could be done differently to improve our business value. After releasing a title with some serious technical difficulties, I spent some time analyzing and listing the problems we had and their root causes. I asked our senior director for a mission statement and we got the team together to define the desiderata/objectives of our base technology. Here is what we came up with:

  1. Stability
  2. Performance / Scalability
  3. Encapsulation / Modularity
  4. Documentation
  5. Conventions
  6. Reusability / Maintainability

These objectives are meant to help us objectively evaluate two options. The legacy solution was based on Rails; or, more accurately, Rails was used in the legacy solution. Rails had been hacked in so many different ways that it was really hard to update anything without breaking random parts of the framework. The way to do basic things kept being changed, there was no consistent design, no entry points, no conventions, and each new game would duplicate the source code of the previously released game and make the game-specific changes. Patches were hard to back-port and older titles were often not patched up. The performance was atrocious under load, mainly due to the hacked-up Rails not performing well. (Rails was allocating so many objects per request that the GC was taking a huge chunk of the request cycles, the default XML builder also created a ton of objects etc…) This was your typical broken-windows scenario. Engineers were getting frustrated, motivation was fading, bugs were piling up and nobody felt ownership over the tech.

Now, to be fair, it is important to explain that the legacy system was hacked together due to lack of time, lack of resources and a lot of pressure to release something ASAP. So, while the end result sounds bad, the context is very important to note. This is quite common in software engineering, and when you get there, the goal is not to point fingers but to identify the good and the bad parts of the original solution. You then use this info to decide what to do: fix the existing system, or rewrite it and port over the good parts.

Our report also came up with a plan: a plan to redesign our technology stack to match the desiderata previously mentioned. To put it simply, the plan was to write a new custom web framework focusing on stability, performance, modularity and documentation. Now, there are frameworks out there which already do that, or at least value these principles. But none of them focus on web APIs and none of them are specific to game development. Finally, the other issue was that we had invested a lot of time in game-specific code and we couldn’t throw away all that work, so the new framework had to support a good chunk of legacy code while making it run much faster.

Design choices

Low conversion cost

Using node.js/CoffeeScript/Scala/whatever new fancy tech was not really an option. We have a bunch of games out there running on the old system, and some of these games will have a sequel or a game close enough that we could reuse part of the work. We didn’t want to have to rewrite the existing code. I therefore made sure that we could reuse 90% of the business logic by adding an abstraction layer doing the heavy lifting at boot time, therefore not affecting runtime performance. Simple conversion scripts were also written to import the core of the existing code.

Lessons learned: It is very tempting to just redo everything and start from scratch. However, the business logic implementation wasn’t the main cause of our problems. Even though I wish we could have redesigned that piece of the puzzle, it didn’t make sense from a business perspective. A lot of thought had to be put into how to obtain the expected performance level while keeping the optional model/controller/view combos. By having full control of the “web engine”, we managed to isolate things properly without breaking the old paradigms. We also got rid of a lot of assumptions, allowing us to design new titles a bit differently while staying backward compatible, and to have our code run dramatically faster.

Web API centric

This is probably the most important design element. If I had to summarize what our system does in just a few words, I would say: a game web API. Of course, it’s much more than that. We have admin interfaces, producer dashboards, community websites, lobbies, p2p, BI reports, async processing jobs etc… But at the end of the day, the one piece you can’t remove is the game web API. So I really wanted the design to focus on that aspect. When a developer starts implementing a new online game feature, I want him/her to think about the API. But I also want this API to be extremely well documented so the developer working client-side understands the purpose of the API, how to use it, and what the expected response is, right away. I also wanted to be able to automatically test our APIs at a very basic level so we could catch discrepancies between what the client expects and what the server provides.

To do that, I created a standalone API DSL with everything needed to describe an API but without any implementation details whatsoever. The API DSL lets the developer define a route (URL), the expected HTTP verb, whether the request should be authenticated or not, SSL or not, the param rules, default values and finally a response description (which was quite a controversial choice). All of these settings can be documented by the developer. This standalone DSL can then be consumed by different tools. For instance, we have a tool extracting all the info into nicely formatted HTML docs for the game client developers; this tool doesn’t need to load the framework just to render the documentation. We also use this description at boot time to compile the validation rules and routes, allowing for a much faster request dispatch. And we also use these API descriptions to generate some low-level data for the client. Finally, we used the service description DSL to create mocked service responses, allowing the client team to test service designs without having to wait for the implementation, streamlining the process.

Lessons learned: We had a lot of internal discussions about the need to define the response within the service description. Some argued that it’s a duplication since we already had a view and we could parse that to get most of what we needed (which is what the old system was doing). We ended up going with the response description DSL for a few critical reasons: testing and implementation simplicity. Testing: we need an API expectation reference and we need to keep this reference sane so we can see when something changes. If we were to magically parse the response, we couldn’t test the view part of the code against a frame of reference. Implementation simplicity: magically parsing a view template is trickier than it sounds; you would need to render the template with the right data to make it work properly. Furthermore, you can’t easily document a response in the view, and if you do, you arguably break the separation of concerns between the description and the implementation. Finally, generated documentation isn’t enough, and that’s why we also decided to write English documentation, some of it close to the code and some of it just good old documentation explaining things outside of the code context.

Modularity

In order to make our code reusable we had to isolate each component and limit the dependencies. We wrote a very simple extension layer allowing each extension to register itself once detected. The extension interface exposes the path of the extension, its type, models, services, controllers, migrations, seed data, dependencies etc. Each extension is contained in a folder. (The extension location doesn’t matter much, but as part of the framework boot sequence we check a few default places.) The second step of the process is to check a manifest/config file that is specific to each title. The manifest file lists the extensions that should be activated for the title. The framework then activates the marked extensions, giving it access to their libs, models, views, migrations and seed data, and letting it load their services (the DSL mentioned earlier).
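As a rough illustration, a registration layer along these lines would do the job; the `Framework::Extensions` module, the attribute names and the manifest format below are assumptions made for the example, not the actual implementation:

```ruby
# Hypothetical sketch of the extension layer; names are illustrative only.
module Framework
  module Extensions
    @registered = []

    def self.register(extension); @registered << extension; end

    # Activate only the extensions listed in the title's manifest.
    def self.activate(names)
      @registered.select { |ext| names.include?(ext.name) }
    end
  end

  class Extension
    attr_reader :name, :path, :models, :services, :migrations, :dependencies

    def initialize(name, path, opts = {})
      @name         = name
      @path         = path
      @models       = opts.fetch(:models, [])
      @services     = opts.fetch(:services, [])
      @migrations   = opts.fetch(:migrations, [])
      @dependencies = opts.fetch(:dependencies, [])
      Extensions.register(self) # the extension registers itself once loaded
    end
  end
end

# An extension declares what it ships with (models, services, migrations...).
Framework::Extension.new(:leaderboards, __dir__,
                         models:   %w[score ranking],
                         services: %w[leaderboards/fetch leaderboards/submit])

# At boot time, a per-title manifest decides what gets activated.
manifest = { extensions: [:leaderboards] } # normally read from a config file
active   = Framework::Extensions.activate(manifest[:extensions])
puts active.map(&:name).inspect            # => [:leaderboards]
```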

Even though we designed the core extensions the best we could, there are cases where some titles will need to extend these extensions. To do that, we added a bunch of hooks that could be implemented on the title side if needed (Ruby makes that super easy and clean to do!). A good example of that is the login sequence or the player data.
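A minimal sketch of such a hook, using a made-up `CoreLogin` module and hook name:

```ruby
# Hypothetical sketch of an optional title-side hook around the login sequence.
module CoreLogin
  def self.login(player)
    session = { player_id: player[:id], token: rand(36**12).to_s(36) }
    # Only call the hook when the title chose to implement it.
    after_login_hook(player, session) if respond_to?(:after_login_hook)
    session
  end
end

# Title-specific code reopens the module to extend the core behavior.
module CoreLogin
  def self.after_login_hook(player, session)
    session[:flags] = player.fetch(:flags, [])
  end
end

p CoreLogin.login(id: 42, flags: [:beta_tester])
# => {:player_id=>42, :token=>"...", :flags=>[:beta_tester]}
```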

Lessons learned: The challenge with modularity is to keep things simple and fast yet flexible. A key element in managing that is to stay as consistent as possible: don’t implement hooks three different ways, keep method signatures consistent, keep things simple and organized.

 

Conclusion

It’s a bit early to say if this rewrite is a success or not, and there are still lots of optimizations and technology improvements we are looking forward to doing. Only time will give us enough perspective to evaluate our work. But because we defined the business value (mission statement) and the technical objectives, it is safe to say that the new framework meets the expectations quite well. On an early benchmark we noted a 10X speed improvement, and that’s before drilling into performance optimizations such as making all the calls non-blocking, using better connection pools, or adding a write-through cache layer. However, there is still one thing that we will have to monitor: how much business value this framework generates. And I guess that’s where we failed to define an agreed-upon evaluation grid. I presume that if our developers spend more time designing and implementing APIs and less time debugging, that could be considered business value. If we spend less time maintaining or fighting with the game engine, that would also be a win. Finally, if the player experience is improved, we will be able to say definitively that we made the right choice.

To conclude, I’d like to highlight my main shortcoming: I failed to define metrics that would help us evaluate the real business value added to our products. What I consider a technical success might not be a business success. How do you, in your own domain, find ways to define clear and objective metrics?


No Comments

Ruby concurrency explained

Concurrency is certainly not a new problem, but it’s getting more and more attention as machines get more than one core, web traffic increases drastically, and new technologies show up claiming they are better because they handle concurrency better.
If that helps, think of concurrency as multitasking. When people say they want concurrency, they mean they want their code to do multiple different things at the same time. When you are on your computer, you don’t expect to have to choose between browsing the web and listening to some music; you more than likely want to run both concurrently. It’s the same thing with your code: if you are running a webserver, you probably don’t want it to process only one request at a time.
The aim of this article is to explain as simply as possible the concept of concurrency in Ruby, the reason why it’s a complicated topic and finally the different solutions to achieve concurrency.

First off, if you are not really familiar with concurrency, take a minute to read the Wikipedia article on the topic, which is a great recap of the subject. By now, you may have noticed that my example above was more about parallel programming than concurrency, but we’ll come back to that in a minute.

The real question at the heart of the quest for concurrency is: “how do we increase code throughput?”

We want our code to perform better, and we want it to do more in less time. Let’s take two simple and concrete examples to illustrate concurrency. First, let’s pretend you are writing a Twitter client; you probably want to let the user scroll his/her tweets while the latest updates are being fetched. In other words, you don’t want to block the main loop and interrupt the user interaction while your code is waiting for a response from the Twitter API. A common solution is to use multiple threads. Threads are basically processes that run in the same memory context. We would use one thread for the main event loop and another thread to process the remote API request. Both threads share the same memory context, so once the Twitter API thread is done fetching the data it can update the display. Thankfully, this is usually handled transparently by asynchronous APIs (provided by the OS or the programming language’s standard library) which avoid blocking the main thread.
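As a rough sketch of that idea in Ruby (the endpoint is a placeholder, not the real Twitter API):

```ruby
require "net/http"
require "json"
require "uri"

# Rough sketch: fetch updates in a background thread so the main loop stays
# responsive. The URL below is a placeholder used for illustration only.
latest_tweets = nil

fetcher = Thread.new do
  body = Net::HTTP.get(URI("https://example.com/timeline.json"))
  latest_tweets = JSON.parse(body) rescue []
end

# The "main loop" keeps serving the user while the request is in flight.
5.times do
  puts latest_tweets ? "#{latest_tweets.size} new tweets" : "still scrolling..."
  sleep 0.2
end

fetcher.join
```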

The second example is a webserver. Let’s say you want to run a Rails application. Because you are awesome, you expect to see a lot of traffic, probably more than 1 QPS (query/request per second). You benchmarked your application and you know that the average response time is approximately 100ms. Your Rails app can therefore handle 10 QPS using a single process (you can do 10 queries at 100ms each in a second).

But what happens if your application gets more than 10 requests per second? Well, it’s simple: the requests will back up and take longer and longer until some start timing out. This is why you want to improve your concurrency. There are different ways to do that; a lot of people feel really strongly about the different solutions, but they often forget to explain why they dislike one or prefer another. You might have heard conclusions like these: Rails can’t scale, you only get concurrency with JRuby, threads suck, the only way to concurrency is via threads, we should switch to Erlang/Node.js/Scala, use fibers and you will be fine, add more machines, forking > threading. Depending on who said what and how often you heard it on Twitter, at conferences, or in blog posts, you might start believing it. But do you really understand why people are saying that, and are you sure they are right?

The truth is that this is a complicated matter. The good news is that it’s not THAT complicated!

The thing to keep in mind is that concurrency models are often defined by the programming language you use. In the case of Java, threading is the usual solution: if you want your Java app to be more concurrent, just run every single request in its own thread and you will be fine (kinda). In PHP, you simply don’t have threads; instead you start a new process per request. Both have pros and cons. The advantage of the Java threaded approach is that the memory is shared between the threads, so you save memory (and startup time), and threads can easily talk to each other via the shared memory. The advantage of PHP is that you don’t have to worry about locks, deadlocks, thread-safe code and all that mess hidden behind threads. Described like that it looks pretty simple, but you might wonder why PHP doesn’t have threads and why Java developers don’t prefer starting multiple processes. The answer is probably related to the language design decisions: PHP is a language designed for the web and for short-lived processes, so PHP code should be fast to load and not use too much memory. Java code is slower to boot and to warm up, and it usually uses quite a lot of memory. Finally, Java is a general purpose programming language not designed primarily for the internet.

Other programming languages like Erlang and Scala use a third approach: the actor model. The actor model is somewhat a mix of both solutions; the difference is that actors are like threads which don’t share the same memory context. Communication between actors is done via exchanged messages, ensuring that each actor handles its own state and therefore avoiding corrupted data (two threads can modify the same data at the same time, but an actor can’t receive two messages at the exact same time). We’ll talk about that design pattern later on, so don’t worry if you are confused.

What about Ruby? Should Ruby developers use threads, multiple processes, actors, something else? The answer is: yes!

Threads

Since version 1.9, Ruby has native threads (before that, green threads were used). So in theory, if we wanted to, we should be able to use threads everywhere like most Java developers do. Well, that’s almost true: the problem is that Ruby, like Python, uses a Global Interpreter Lock (aka GIL). The GIL is a locking mechanism meant to protect your data integrity. It only allows data to be modified by one thread at a time, so it doesn’t let threads corrupt data, but it also doesn’t allow them to truly run concurrently. That is why some people say that Ruby and Python are not capable of (true) concurrency.
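You can see the effect with a quick, informal benchmark: on MRI the threaded version of a CPU-bound task takes about as long as the serial one, while a GIL-free implementation can run the two threads on separate cores (exact numbers will vary by machine and interpreter):

```ruby
require "benchmark"

# CPU-bound work: on MRI the GIL prevents the two threads from running
# Ruby code at the same time, so "threaded" is not faster than "serial".
def burn_cpu
  200_000.times { |i| Math.sqrt(i) }
end

Benchmark.bm(10) do |bench|
  bench.report("serial")   { 2.times { burn_cpu } }
  bench.report("threaded") { Array.new(2) { Thread.new { burn_cpu } }.each(&:join) }
end
```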

[Diagram: the Global Interpreter Lock, by Matt Aimonetti]

However, these people often don’t mention that the GIL makes single-threaded programs faster, that multi-threaded programs are much easier to develop since the data structures are safe, and that a lot of C extensions are not thread safe and would not behave properly without the GIL. These arguments don’t convince everyone, and that’s why you will hear some people say you should look at another Ruby implementation without a GIL, such as JRuby, Rubinius (hydra branch) or MacRuby (Rubinius & MacRuby also offer other concurrency approaches). If you are using an implementation without a GIL, then using threads in Ruby has exactly the same pros/cons as doing so in Java. However, it also means that you now have to deal with the nightmare of threads: making sure your data is safe and doesn’t deadlock, and checking that your code, your libs, plugins and gems are thread safe. Also, running too many threads might hurt performance because your OS doesn’t have enough resources to allocate and ends up spending its time context switching. It’s up to you to decide if it’s worth it for your project.
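For reference, this is the kind of locking discipline real multi-threading requires; a minimal sketch using a `Mutex` to protect a shared counter:

```ruby
# Without the mutex, the read-increment-write can interleave between threads
# (on a GIL-free VM) and increments get lost; the lock makes it deterministic.
counter = 0
lock    = Mutex.new

threads = Array.new(10) do
  Thread.new do
    1_000.times { lock.synchronize { counter += 1 } }
  end
end

threads.each(&:join)
puts counter # => 10000
```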

Multiple processes & forking

That’s the most commonly used solution to gain concurrency when using Ruby and Python. Because the default language implementation isn’t capable of true concurrency, or because you want to avoid the challenges of thread programming, you might want to just start more processes. That’s really easy as long as you don’t want to share state between running processes. If you did, you would need to use DRb, a message bus like RabbitMQ, or a shared data store like memcached or a DB. The caveat is that you now need a LOT more memory: if you want to run 5 Rails processes and your app uses 100MB, you will now need 500MB. Ouch, that’s a lot of memory! That is exactly what happens when you use a Rails webserver like Mongrel.

Other servers like Passenger and Unicorn found a workaround: they rely on unix forking. The advantage of forking in a unix environment implementing copy-on-write semantics is that we create a new copy of the main process, but both “share” the same physical memory; each process can still modify its own memory without affecting the others. So Passenger can load your 100MB Rails app in one process, then fork that process 5 times, and the total footprint will be just a bit more than 100MB while you can now handle 5X more concurrent requests. Note that if you are allocating memory in your request processing code (read: controllers/views) your overall memory usage will grow, but you can still run many more processes before running out of memory. This approach is appealing because it is really easy and pretty safe: if a forked process acts up or leaks memory, just destroy it and create a new fork from the master process. Note that this approach is also used in Resque, the async job processing solution by GitHub.
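Here is a deliberately simplified sketch of the preforking idea (no sockets or signal handling, and `Process.fork` is only available on POSIX systems with MRI):

```ruby
# Simplified sketch of preforking: the app is loaded once in the master,
# then each forked worker shares that memory copy-on-write.
app = "pretend this is your 100MB Rails app" # loaded before forking

worker_pids = Array.new(5) do
  Process.fork do
    # Each worker would run its own request loop here.
    puts "worker #{Process.pid} booted with the preloaded app (#{app[0, 7]}...)"
    sleep 1
  end
end

# The master only supervises: if a worker misbehaves, kill it and re-fork.
worker_pids.each { |pid| Process.wait(pid) }
```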

This solution works well if you want to duplicate a full process like a webserver, however it gets less interesting when you just want to execute some code “in the background”. Resque took this approach because by nature async jobs can yield weird results, leak memory or hang. Dealing with forks allows for an external control of the processes and the cost of the fork isn’t a big deal since we are already in an async processing approach.

[Image: forking a repository on GitHub]

Actors/Fibers

Earlier we talked a bit about the actor model. Since Ruby 1.9, developers have access to a new type of “lightweight” thread called a Fiber. Fibers are not actors, and Ruby doesn’t have a native actor model implementation, but some people have written actor libraries on top of fibers. A fiber is like a simplified thread which isn’t scheduled by the VM but by the programmer. Fibers are like blocks which can be paused and resumed from the outside or from within themselves. Fibers are faster and use less memory than threads, as demonstrated in this blog post. However, because of the GIL, you still cannot truly run more than one concurrent fiber per thread, and if you want to use multiple CPU cores you will need to run fibers within more than one thread.

So how do fibers help with concurrency? The answer is that they are part of a bigger solution. Fibers allow developers to manually control the scheduling of “concurrent” code, but also to let the code within the fiber schedule itself. That’s pretty big, because now you can wrap an incoming web request in its own fiber and tell it to send a response back when it’s done doing its thing. In the meantime, you can move on to the next incoming request. Whenever a request within a fiber is done, it automatically resumes itself and the response is returned. Sounds great, right? Well, the only problem is that if you are doing any type of blocking IO in a fiber, the entire thread is blocked and the other fibers aren’t running. Blocking operations are operations like database/memcached queries, HTTP requests… basically things you are probably triggering from your controllers. The good news is that the “only” problem left to fix is avoiding blocking IOs. Let’s see how to do that.
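Before diving into non-blocking IO, here is the fiber pause/resume mechanism at its simplest:

```ruby
# A fiber is paused with Fiber.yield and resumed by whoever holds it.
fiber = Fiber.new do
  puts "step 1"
  Fiber.yield # hand control back to the caller
  puts "step 2"
end

fiber.resume # prints "step 1", then pauses at Fiber.yield
puts "doing something else in between"
fiber.resume # prints "step 2" and the fiber finishes
```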


Non-blocking IOs / Reactor pattern

The reactor pattern is quite simple to understand, really. The heavy work of making blocking IO calls is delegated to an external service (the reactor), which can receive concurrent requests. The service handler (reactor) is given callback methods to trigger asynchronously based on the type of response received. Let me use a limited analogy to hopefully explain the design better. It’s a bit like asking someone a hard question: the person will take a while to reply, and their reply will determine whether you raise a flag or not. You have two options: either you wait for the response and then decide whether to raise the flag, or your flag logic is already defined, you tell the person what to do based on their answer, and you move on without having to worry about waiting for it. The second approach is exactly what the reactor pattern is. It’s obviously slightly more complicated, but the key concept is that it allows your code to define methods/blocks to be called based on a response which will come later on.
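Here is a small sketch of that second option using EventMachine, with a reactor timer standing in for the slow reply; the lambdas are just there to mirror the question/flag analogy:

```ruby
require "eventmachine" # gem install eventmachine

# Sketch of the callback style: we register what to do with the answer and
# move on; the reactor calls us back when the (simulated) slow reply arrives.
EM.run do
  ask_hard_question = lambda do |on_answer|
    # A timer simulates the slow, non-blocking reply instead of real IO.
    EM.add_timer(1) { on_answer.call(:yes) }
  end

  raise_flag_logic = lambda do |answer|
    puts(answer == :yes ? "raising the flag" : "leaving the flag down")
    EM.stop
  end

  ask_hard_question.call(raise_flag_logic)
  puts "question asked, moving on without waiting for the answer"
end
```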

[Diagram: the reactor pattern illustrated]

In the case of a single-threaded webserver, that’s quite important. When a request comes in and your code makes a DB query, you are blocking any other request from being processed. To avoid that, we could wrap our request in a fiber, trigger an async DB call and pause the fiber so another request can get processed while we are waiting for the DB. Once the DB query comes back, it wakes up the fiber it was triggered from, which then sends the response back to the client. Technically, the server can still only send one response at a time, but now fibers can run concurrently and don’t block the main thread with blocking IOs (since those are handled by the reactor).
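A sketch of that flow, with an EventMachine timer faking the async DB driver so the mechanics stay visible:

```ruby
require "eventmachine"
require "fiber"

# The fiber pauses itself while the fake async call is in flight and is
# resumed by the reactor when the "result" comes back.
def async_db_query
  fiber = Fiber.current
  EM.add_timer(0.5) { fiber.resume("42 rows") } # stand-in for a non-blocking DB driver
  Fiber.yield                                   # pause until resumed with the result
end

EM.run do
  Fiber.new do
    puts "request 1: querying the DB..."
    result = async_db_query
    puts "request 1: responding with #{result}"
    EM.stop
  end.resume

  puts "reactor free to accept other requests in the meantime"
end
```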

This is the approach used by Twisted, EventMachine and Node.js. Ruby developers can use EventMachine or an EventMachine-based webserver like Thin, as well as EM clients/drivers to make non-blocking async calls. Mix that with some Fiber love and you get Ruby concurrency. Be careful though: using Thin, non-blocking drivers and Rails in threadsafe mode doesn’t mean you are processing requests concurrently. Thin/EM only use one thread and you need to let it know that it’s OK to handle the next request while we are waiting. This is done by deferring the response and letting the reactor know about it.

The obvious problem with this approach is that it forces you to change the way you write code. You now need to set a bunch of callbacks, understand the Fiber syntax, and use deferrable responses; I have to admit that this is kind of a pain. If you look at some Node.js code, you will see that it’s not always an elegant approach. The good news, though, is that this process can be wrapped so your code can be written as if it were processed synchronously while being handled asynchronously under the covers. This is a bit more complex to explain without showing code, so it will be the topic of a future post. But I do believe that things will get much easier soon enough.

Conclusion

High concurrency with Ruby is doable and done by many. However, it could be made easier. Ruby 1.9 gave us fibers, which allow for more granular control over concurrency scheduling; combined with non-blocking IO, high concurrency can be achieved. There is also the easy solution of forking a running process to multiply the processing power. However, the real question behind this heated debate is: what is the future of the Global Interpreter Lock in Ruby? Should we remove it to improve concurrency at the cost of dealing with major new threading issues, unsafe C extensions, etc.? Alternative Ruby implementers seem to believe so, but at the same time Rails still ships with a default mutex lock that only allows requests to be processed one at a time, the reason given being that a lot of people using Rails don’t write thread-safe code and a lot of plugins are not threadsafe. Is the future of concurrency something more like libdispatch/GCD, where the threads are handled by the kernel and the developer only deals with a simpler/safer API?


33 Comments