Posts Tagged scalability

Ruby optimization example and explanation

Recently I wrote a small DSL that allows the user to define some code that then gets executed later on and in different contexts. Imagine something like Sinatra where each route action is defined in a block and then executed in context of an incoming request.

The challenge is that blocks come with their context and you can’t execute a block in the context of another one.

Here is a reduction of the challenge I was trying to solve:

class SolutionZero
  def initialize(origin, &block)
    @origin = origin
    @block = block
  def dispatch
end{ @origin + 1 }.dispatch
# undefined method `+' for nil:NilClass (NoMethodError)

The problem is that the block refers to the @origin instance variable which is not available in its context.
My first workaround was to use instance_eval:

class SolutionOne
  def initialize(origin, &block)
    @origin = origin
    @block = block
  def dispatch
    self.instance_eval &@block
end{ @origin + 2}.dispatch
# 42

My workaround worked fine, since the block was evaluated in the context of the instance and therefore the @origin ivar is made available to block context. Technically, I was good to go, but I wasn’t really pleased with this solution. First using instance_eval often an indication that you are trying to take a shortcut. Then having to convert my block stored as a block back into a proc every single dispatch makes me sad. Finally, I think that this code is probably not performing as well as it could, mainly due to unnecessary object allocations and code evaluation.
I did some benchmarks replacing instance_eval by instance_exec since looking at the C code, instance_exec should be slightly faster. Turns out, it is not so I probably missed something when reading the implementation code.

I wrote some more benchmarks and profiled a loop of 2 million dispatches (only the #disptach method call on the same object). The GC profiler report showed that the GC was invoked 287 times and each invocation was blocking the execution for about 0.15ms.
Using Ruby’s ObjectSpace and disabling the GC during the benchmark, I could see that each loop allocates an object of type T_NODE which is more than likely our @block ivar converted back into a block. This is quite a waste. Furthermore, having to evaluate our block in a different context every single call surely isn’t good for performance.

So instead of doing the work at run time, why not doing it at load time? By that I mean that we can optimize the #dispatch method if we could “precompile” the method body instead of “proxying” the dispatch to an instance_eval call. Here is the code:

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
  def implementation(block)
    mod =
    mod.send(:define_method, :dispatch, block)
    self.extend mod
end{ @origin + 2}.dispatch
# 42

This optimization is based on the fact that the benchmark (and the real life usage) creates the instance once and then calls #dispatch many times. So by making the initialization of our instance a bit slower, we can drastically improve the performance of the method call. We also still need to execute our block in the right context. And finally, each instance might have a different way to dispatch since it is defined dynamically at initialization. To work around all these issues, we create a new module on which we define a new method called dispatch and the body of this method is the passed block. Then we simply our instance using our new module.

Now every time we call #dispatch, a real method is dispatched which is much faster than doing an eval and no objects are allocated. Running the profiler and the benchmarks script used earlier, we can confirm that the GC doesn’t run a single time and that the optimized code runs 2X faster!


Once again, it’s yet another example showing that you should care about object allocation when dealing with code in the critical path. It also shows how to work around the block bindings. Now, it doesn’t mean that you have to obsess about object allocation and performance, even if my last implementation is 2X faster than the previous, we are only talking about a few microseconds per dispatch. That said microseconds do add up and creating too many objects will slow down even your faster code since the GC will stop-the-world as its cleaning up your memory. In real life, you probably don’t have to worry too much about low level details like that, unless you are working on a framework or sharing your code with others. But at least you can learn and understand why one approach is faster than the other, it might not be useful to you right away, but if you take programming as a craft, it’s good to understand how things work under the hood so you can make educated decisions.


@apeiros in the comments suggested a solution that works & performs the same as my solution, but is much cleaner:

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    define_singleton_method(:dispatch, block) if block_given?

, ,


First step in scaling a web site: HTTP caching

Today my friend Patrick Crowley and I were talking about scaling his website: since an article covering his work will soon be published in a very popular newspaper. Patrick’s site is hosted on Heroku which comes by default with Varnish caching enabled.

The challenge is that a lot of people using the Rails framework are used to doing page caching instead of relying on HTTP caching, even though this feature was added a long time ago. The major problem with page caching is that it doesn’t scale that well as soon as you run more than one server. Indeed you would need to store the page content to a shared drive between your servers or use memcached and do some work to avoid hitting your app every single time. On the other hand, HTTP caching is extremely easy to handle at the application level and it will dramatically reduce the amount of requests hitting your app. Let me explain a little more about HTTP caching.

Ryan Tomako wrote an excellent post about the details of caching, I strongly recommend you read it. In a nutshell, the HTTP caching layer (usually) seats before your application layer and allows you, the developer to store some responses that can be send back to the users based on optional conditions. That might still seem vague, let’s take a concrete example. If you look at‘s home page you can see that it’s an agglomerate of various information:

CinemaTreasures homepage

And the bottom of the page contains even more dynamic data such as the popular movie theater photos, latest movie theater videos and latest tweets. One might look at that and say that this page can’t really be cached and that the caching should be done at the model layer (i.e. cache the data coming from the database). I would certainly agree that caching the data layer is probably a good idea, but you shouldn’t start by that. In fact without caching, this page renders fast enough. The problem is when someone like Roger Ebert tweets about CinemaTreasures the load on the app peaks significantly. At the point, the amount of concurrent connections your app can handle gets put to the challenge. Even though your page load is “fast enough”, requests will queue up and some will eventually time out. That’s actually a perfect case of HTTP caching.

What we want to do in that case is to cache a version of the home page in Varnish for 60 seconds. During that time, all requests coming to the site, will be served by Varnish and will all get the same cached content. That allows our servers to handle the non cached requests and therefore increase our throughput. What’s even better, is that if a user refreshes the home page in his/her browser during the first 60 seconds the requests won’t even make it all the way to our servers. All of that thanks to conditions set on the response. The first user hitting the HTTP cache layer (Varnish in this case) won’t find a fresh cached response, so varnish will forward the request to our application layer which will send back the homepage to varnish and tell Varnish that this content is good for a full minute so please don’t ask for it again until a minute from now. Varnish serves this response to the users’ browser and let the browser know that the server said that the response was good enough for a minute so don’t bother asking for it again. But now, if during these 60 seconds another user comes in, he will hit Varnish and Varnish will have the cached response from the first user and because the cache is still fresh (it’s not been 60 seconds since the first request) and the cache is public, then the same response will be sent to the second user.

As you can see, the real strength of HTTP caching is the fact that it’s a conditional caching. It’s based on the request’s URL and some “flags” set in the request/response headers.

Setting these conditions in your app is actually very simple since you just need to set the response’s headers. If you are using a Ruby framework you will more than likely have access to the request object via the “request” method and you can set the headers directly like that: “response.headers['Cache-Control'] = ‘public, max-age=60′”.
In Rails, you can actually use a helper method instead: expires_in 1.minute, :public => true.

You might have a case where you HAVE TO serve fresh content if available and can’t serve stale cached content even for a few seconds. In this case, you can rely on the Etag header value. The Etag is meant to validate the freshness of a cached response. Think of it as a signature (unique ID) that is set on the response and used by the client (or cache layer) to see if the server response has changed or not. The way it works is that the client keeps track of the Etag received for each request (attached to the cached response) and then sends it with the next requests. The HTTP layer or application sees the Etag in the request and can check if it is still valid and the content didn’t change. If that’s the case, an empty response can be sent with a special HTTP status code (304) to let know the client that the old cached value is still good to be used.  Rails has a helper called “stale?” that helps you do the Etag/last modified check and allows you to not fetch all the objects from the database by doing a cheap check on an attribute (For instance you can check the updated_at value and use that as a condition to pull an object and its relationships).

So I explain HTTP caching, I often hear people telling me: “that’s great Matt, but you know what, that won’t work for us because we have custom content that we display specifically to our users”. So in that case, you can always set the Cache-Control header to private which will only cache the response in the client’s browser and not the cache layer. That’s good to some extent, but it can definitely be improved by rethinking a bit your view layer. In most web apps, the page content is rendered by server side code (Rails, Django, node.js, PHP..) and sent to the user all prepared for him. There are a few challenges with this approach, the biggest one is that the server has to wait until everything is ready (all data fetched, view rendered etc…) before sending back a response and before the client’s browser can start rendering (there are ways to chunk the response but that’s besides the scope of this post). The other is that the same expensive content has to be calculated/rendered for two different users because you might be inserting the username of the current user at the top of the page for instance. A classic way to deal with that is often to use fragment caching, where the expensive rendering is cached and reused by different requests. That’s good but if the only reason to do that is because we are displaying some user specific data, there is a simpler way: async page rendering. The concept is extremely simple: remove all user specific content from the rendered page and then inject the user content in a second step once the page is displayed. The advantage is that now the full page can be cached in Varnish (or Squid or whatever you use for HTTP caching). To inject the user content, the easiest way is to use JavaScript.

Let’s stay on CinemaTreasures, when you’re logged in, the username is shown on the top of each page:

Once logged in, the username is displayed on all pages

The only things that differs from the page rendered when the user is not logged in and when he is, are these 2 links and an avatar. So let’s write some code to inject that after rendering the page.

In Rails, in the sessions controller or whatever code logs you in, you need to create a new cookie containing the username:

cookies[:username] = {
         :value => session[:username],
         :expires => 2.days.from_now,
         :domain => ""

As you can see, we don’t store the data in the session cookie and the data won’t be encrypted. You need to be careful that someone changing his cookie value can’t access data he/should shouldn’t. But that’s a different discussion. Now that the cookie is set, we can read it from JavaScript when the page is loaded.

document.observe("dom:loaded", function() {
function readCookie(name) {
     var nameEQ = name + "=";
     var ca = document.cookie.split(';');
     for(var i=0;i < ca.length;i++) {
          var c = ca[i];
          while (c.charAt(0)==' ') c = c.substring(1,c.length);
          if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length,c.length);
     return null;
function displayLoggedinUserLinks() {
  var username            = readCookie('username');
  var loginLink           = $('login');
  var logout              = $('logout');
  if (username == null){;
    // user is logged in and we have his/her username
    if(userGreetings){ userGreetings.update("<span id='username'>username</span>"); };
  return true;

The code above doesn’t do much, once the DOM is loaded, the displayLoggedinUserLinks() function gets trigger. This function reads the cookie via the readCookie() function and if a username is found, the login link is hidden, the user name is displayed, as well as the logout link and the avatar. (You can also use a jQuery cookie plugin to handle the cookie, but this is an old example using Prototype, replace the code accordingly)
When the user logs out, we just need to delete the username cookie and the cached page will be rendered properly. In Rails, you would do delete the cookie like that: cookies.delete(‘username’).
Quite often you might even want to make an Ajax call to get some information such as the number of user messages or notifications. Using jQuery or whatever JS framework you fancy you can do that once the page is rendered. Here is an example, on this page, you can see the learderboards for MLB The Show. The leaderboards don’t change that often, especially the overall leaderboards so they can be cached for a little while, however the player’s presence can change anytime. The smart way to deal with that, would be to cache the  leaderboards for a few seconds/minutes and make an ajax call to a presence service passing it a list of user ids collected from the DOM. The service called via Ajax could also be cached  depending on the requirements.

Now there is one more problem that people using might encouter: flash notices. For those of you not familiar with Rails, flash notices are messages set in the controller and passed to the view via the session (at least last time I checked). The problem happens if I’m the home page isn’t cached anymore and I logged in which redirects me to the home page with a flash message like so:

The problem is that the message is part of the rendered page and now for 60 seconds, all people hitting the home page will get the same message. This is why you would want to write a helper that would put this message in a custom cookie that you’d pull JS and then delete once displayed. You could use a helper like that to set the cookie:

def flash_notice_cookie(msg, expiration=nil)
  cookies[:flash_notice] = {
    :value => msg,
    :expires => expiration || 1.minutes.from_now,
    :domain => ""

And then add a function called when the DOM is ready which loads the message and injects it in the DOM. Once the cookie read, delete it so the message isn’t displayed again.


So there you have it, if you follow these few steps, you should be able to handle easily 10x more traffic without increasing hardware or making any type of crazy code change. Before you start looking into memcached, redis, cdns or whatever, consider HTTP caching and async DOM manipulation. Finally, note that if you can’t use Varnish or Squid, you can very easily setup Rack-Cache locally and share the cache via memcached. It’s also a great way to test locally.

Update: CinemaTreasures was updated to use HTTP caching as described above. The hosting cost is now half of what it used to be and the throughput is actually higher which offers a better protection against peak traffic.


External resources:

, , , , ,


Designing for scalability

Designing beautiful and scalable software is hard. Really hard.

It’s hard for many reasons. But what makes it even harder is that software scalability is a relatively new challenge, something only really done in big companies, companies that are not really keen on sharing their knowledge. The amount of academic work done on software design is quite limited compared to other types of design, but shared knowledge about scalable design is almost nonexistent (Don’t expect to find detailed information about scaling online video games either, the industry is super secretive. And even if this is a niche market where finding skilled/experienced developers is really challenging, information is not shared outside a game project).

I don’t pretend to have the required knowledge to cover this topic at length. However, I do have some exposure and figured I should share what I learned so others can benefit from my experience and push the discussion further.

Designing scalable software is just like any other type of software design, with a few unique constraints. If I had to define the key requirements of a great design I would have to quote Frederick P. Brooks:

“Great designs have conceptual integrity – unity, economy, clarity”

This is true for any type of design and one should always start by that.
Don’t just jump on your keyboard and start writing tests/code right away. Take a minute to think about your design.
That will save you hours of refactoring and headaches.

You’re a designer and might not even know it

You might not be designing the next NASA engine but you are more than likely designing an API that you and others will use. As a matter of fact, unless you write code that will never be seen again, you are writing an Application Programming Interface (API). Every single class, method, function you write is an API that you and others will use. Remember that every time you write code, you are the implementer of a design, and therefore you are a designer.

Giana and Matt Aimonetti

Giana and I, discussing design patterns

When thinking about your design, focus on design concepts instead of implementation details. A design concept must be clear, simple to both explain with words and draw on a whiteboard. If you can’t draw and explain your design on a whiteboard, you have failed one of the great design requirement: clarity. If you work alone, or your coworkers are tired of hearing you, try rubber ducking your design ideas. It’s the same concept as rubber ducking debugging, where a programmer would force himself to explain his code, line-by-line, to a rubber duck on his desk but instead of talking about the code, explain your design and why it’s awesome (I’ve recently done this with my baby girl and it’s been really helpful).

Keeping the design integrity

One of the challenges of designing scalable software is that your constraints are often very unique to your product. Off the shelf solutions don’t work for you, and the specific solution used by another project can’t be transposed to your project because the cause and the effect of what you need to scale are different. The problem is that you really quickly lose design integrity.

Let’s take a look at a concrete example to see how the design integrity can be lost or even not defined at all.
Let’s pretend we want to write a suite of web APIs for video games.

We can look at this task from different perspectives:the shout posted by Matt Aimonetti

  • Video game deadlines are crazy, let’s find a way to release as many APIs ASAP.
  • We’re going to get a huge amount of traffic, let’s make sure we don’t crash and burn.
  • We need to make sure our APIs are simple to use for the dev teams integrating them.

Each of these perspectives reflects a facet of the challenge. Other facets exist that I didn’t mention but that a business person might have listed right away, one of which being: How can we do that for the least amount of money?

To design our API suite, we first need to understand the different perspectives. Gaining this understanding will help us design something better but it will also help us communicate better with the different stakeholders. Once we have a decent understanding of the constraints and expectations, someone needs to explicitly define the design values and their priorities. This is a crucial step in the design process. Systems nowadays are too complicated to be handled by only one person and keeping design integrity requires clear communication.

Design goal and values

The best way to communicate the design is to write a simple sentence defining the primary goal:
“Build a robust, efficient and flexible middleware solution leveraged by external teams to develop online video game features.”

This is a bit like the mission statement of your project, or the elevator pitch you give someone that asks you what you are working on.

Associated with the primary goal are a host of desiderata, or secondary objectives. These are the key objectives used to weigh technical decisions. It’s important for the design to highlight a scale of values so one can refer to them to decide if his/her idea fits the design or not. Here is an example:

  1. Stability
  2. Performance / Scalability
  3. Encapsulation / Modularity
  4. Conventions
  5. Documentation
  6. Reusability / Maintainability

Often these desiderata are applied to most of your projects and reflect your team/company’s technical values. The list might seem simple and unnecessary but, believe me, it will reduce the arguments where John tells Jane that her idea sucks but his is better because he “knows better”. Having an objective reference to refer to when trying to decide which is the best way to go is greatly valuable and will reduce the amount of office drama.


Finally, make sure to explicitly define all the major constraints and to acknowledge the team’s concerns. Here is a small example of what could be listed (which also reflect the previously mentioned perspectives):

  • hard deadlines
  • external teams involved
  • huge load expected
  • limited support available
  • requirements changing quickly
  • limited budget
  • unknown hosting architecture/constraints

Remember that design is always iterative because the constraints keep changing. That’s just the way it is and a lot of technical constraints only appear as you implement or test your design. That’s also why the design needs to be clear but the implementation needs to be flexible.

Reads vs writes

Most of the web apps out there are read heavy, meaning that the stored data gets more accessed than modified. Scaling these type of systems is easier as one can introduce a cache layer, an intermediary storage, which acts as a fast buffer that avoids putting load on the backends. The cost reduction is huge because if you architected your app properly, the data is read from the data store only once (or once every X minutes) after being created/modified.

Caching is so important that it’s even built into the HTTP protocol, making caching trivial.
Speaking of HTTP, a common problem I often see when serving http content to a browser is that even though the backend calls are the same, some information needs to be customized for the current visitor. This prevents it from caching the entire page. An easy solution in this case is to still cache the entire page but to use javascript to fetch the custom data from the backend and to modify the cached http at the client’s browser level directly. As part of your design, you will more than likely need to implement multiple layers of caching and use technologies such as query caching, Varnish, Squid, Memcached, memoization, etc…

The problem is that, as your system gets more traffic, you will notice that the volume of DB/network writes becomes your bottleneck. You will also notice a reduction of your cache/hit ratio because only a small part of your cached data is often retrieved by many clients. At this point, you will need to denormalize to avoid contention, shard your data in silos, or write to cache and flush from cache when the data store is available and not overwhelmed.

Asynchronous processing

One way to avoid write contention is to use async processing. The concept is simple. Instead of directly writing to your datastore after your backend receives a request, you put a message in a queue with all the information needed to run the operation later. On the other side, you have a set number of workers receiving messages and operating on them one after the other.

The advantage of such an approach is that you control the amount of workers and therefore the amount of maximum concurrent writes to your datastore. You can also process the queue before it gets worked and and maybe coalesce some messages or remove outdated/duplicated message. Finally, you can assign more workers to some message types, making sure the important messages get processed first.

Another advantage of this design includes not letting the client hang while you’re processing the data and potentially timeout. You can also process a long queue faster by starting more workers to catch up and retire them later.
You app is more resilient to errors and failed async jobs can be restarted.

Load test, monitor and be proactive

Even the best designs have weak spots and will have to be improved once they are released. Don’t wait for your system to fall apart before looking for solutions. Monitor your app. Every single part of your app. Look for patterns showing signs of potential problems and imagine what you could do to resolve them if they would start manifesting.

Of course before getting there, you will need to understand each part of your system and benchmark/load test/profile your app so you can be ready to face the storm.

Benchmarks and load tests are both super important and, too often, not reflective of what you will really face later on. They are usually great at identifying major problems that should be resolved right away, but fail to show the one big problem you will see on day one when you have to deal with 20k concurrent requests. Use them as indicators, rely on your experience and learn about problems other have faced. This will help you build a knowledge of scalability challenges, their root causes, and their potential solutions.

For benchmarking Ruby code, I use the built-in benchmark tool available in the standard lib.
For simple load testing, I use httperf/autobench and siege.
For anything more complicated, I use JMeter.
In the video game industry, we also often use sims using the client’s code to create load.

Benchmarking without profiling is often useless. Unlike other programming languages, Ruby doesn’t yet have awesome profiling tools easy to use, but things are evolving quickly. Here are some tools I use regularly.

The Ruby wrapper around google perftools is really good.
Before using perftools as often as I do now, I frequently used ruby-prof with kcachegrind.
Ruby 1.9 lets you inspect its garbage collector as explained in a previous post.
And when using MacRuby, I often use DTrace.

Other misc. things I learned


Documentation is critical. It doesn’t matter how you do it but you need to make sure you document what you want to build, how you build it, and why you build it. Documenting will help you and the others working on the project, and will keep you in check. I have started documenting an API and then realized that the design was flawed. Maybe it’s just the way you name a method, or a class, or it can be a weird method signature or even the entire workflow being wrong, but when you document things, design errors appear more obviously.

To document Ruby code, I use yard which is quite similar to javadoc. Code documentation, when writing duck typed language, is, for me, very important since it makes the API designer’s expectations much clearer. I also often add English documentation, written in markdown files and compiled by yard. If you say that your code is simple and that it doesn’t require documentation because anyone can just read it and understand … then you have totally miss the point. Yes, it’s more work to keep documentation and code in sync. But people using web APIs don’t have access to the implementation details. The people distributing compiled APIs don’t give access to their implementation. And honestly, the API should be decoupled from the implementation. I shouldn’t have to guess how to use your API based on how you implemented the code underneath, otherwise my assumptions might be totally wrong.


With great power comes great responsibility. The law of system entropy says that systems become more disorganized over time, so don’t start with complicated code if you can avoid it! It’s not because your programming language lets you do crazy stuff that you have to use it. In 90+% of the time, your code can be written without voodoo and be easier to read, easier to understand, easier to maintain and faster to execute.

If you can’t figure out how to *not* use metaprogramming or weird patterns, take a step back and look at your design, did you miss something?
Also, don’t reinvent the wheel. Use the language the way it was designed to be used. Keep your APIs as small as possible, don’t expose too much as it will be virtually impossible to remove it later on.

As an example, look to what extent Rails modified the Ruby language:

In Rails’ console (Rails 2, Ruby 1.8.7)

>> Array.ancestors
=> [Array, ActiveSupport::CoreExtensions::Array::RandomAccess,
 ActiveSupport::CoreExtensions::Array::Grouping, ActiveSupport::CoreExtensions::Array::ExtractOptions,
 ActiveSupport::CoreExtensions::Array::Conversions, ActiveSupport::CoreExtensions::Array::Access,
 Enumerable, Object, ERB::Util, ActiveSupport::Dependencies::Loadable, Base64::Deprecated, Base64,
>> [].methods.size
=> 233

In irb:

>> Array.ancestors
=> [Array, Enumerable, Object, Kernel]
>> [].methods.size
=> 149

Removing any of these added methods is virtually impossible since some piece of code somewhere might rely on it.

Abstraction & its dangers

Often when designing an API, it’s preferable to offer a well defined public API which will delegate the work to a private implementation shared between multiple public APIs. This approach avoids duplication, makes maintenance easy, and allows for more flexibility. As an example, we can have a public matchmaking API which will delegate most of the work to a private matchmaking interface. If required, swapping the private interface would be totally transparent to the public API. This approach has a downside, however. Having a shared private implementation does create a duplication of APIs. It leaves us with both a public and a private API because we need an API for public access and a private API for the public API to connect to. But when we weigh the benefits and look at what is duplicated, we realize that this trade off is worth it.

Keeping a certain level of abstraction is important to maintaining the separation of concerns as clear as possible. You want to layer your design so that each layer is responsible for itself, only knows about itself, and has limited interactions with other layers. By factoring/isolating the different modules, you can keep a simple, elegant, easy to maintain system. This is a key element of design but one needs to be careful not to obfuscate the design by over abstracting his/her code. This is particularly important when designing a scalable app because you will often need to be able to easily swap parts to optimize each part of your system.

That said, a lot of code out there is unnecessarily complicated. I sometime wonder if the authors of such code try to show that they know some cool language tricks. Or maybe this is due to the fact that, too often, people are impressed by code they don’t understand. The problem with overly complicated or magical code is that it creates yet another abstraction layer between the end user and API. It makes the API more opaque, and that’s a cost you have to take into consideration. Every time you abstract something you have a cost associated with the abstraction. This cost can be calculated in terms of performance loss, clarity loss and maintainability cost.

This is exactly the same problem encountered when trying to normalize data in a database.
Normalizing is a great concept which makes a lot of sense … until you realize that the cost of keeping your data normalized is too great and it becomes a major bottleneck, not letting you scale your application.
At this moment (and probably only then) that you need to denormalize your data.

It’s the same thing with code abstraction. It’s fine to abstract, unless the abstraction is such that it requires too much work to understand what is going on. A bit of duplication is often worth it, but be careful to not abuse it.


Ruby has a decent debugger called ruby-debug and I’m amazed by the amount of people who haven’t heard about it.
I don’t know what I would do if I couldn’t use breakpoints and get an interactive shell to debug Ruby code.
Please people! This is 2011, stop using print statement as a means of debugging!


That’s is for this post. It was longer than expected and I feel I didn’t really cover anything in depth, but hopefully you learned something new or at least read something that piqued your interest. I look forward to reading your comments and, hopefully, your blog posts sharing your experience in designing scalable software.

, ,


Causality of scalability

Part of my job at Sony PlayStation is to architect scalable systems which can handle a horde of excited players eager to be the first to play the latest awesome game and who would play for 14-24 hours straight. In other words, I need to make sure a system can “scale”. In my case, a scalable system is a system that can go from a few hundred concurrent users/players to hundreds of thousands of concurrent users/players and stay stable for months.

One can achieve scalability in many ways, and if you expect me to provide you with a magical formula you will be disappointed. I actually believe that you can scale almost anything if you have the adequate resources. So saying that X or Y doesn’t scale is for me a sign that people are taking shortcuts in their explanations (X or Y are really hard to scale so they don’t scale) or that they don’t understand the causality of scaling. However what I am exploring in this post is the relationship between cause and effect when trying to make a system scalable. We will see that the scalability challenge is not new and not exclusive to the tech world. We will study the traditional approach to scaling and as well as the challenge of scaling in relation to the web and what to be aware of when planning to make a solution scalable.

Scaling outside of the tech world

Trying to scale isn’t new. It goes back to well before technology was invented. Scaling something up or increasing something in size or number is a goal businesses have aimed for ever since the oldest profession in the world was invented. A prostitute wanting to scale up her business was limited by her own time and body. She would reach a point where she couldn’t take more clients. (Independent contractors surely know what I am talking about!) So a prostitute wanting to scale up would usually become a madam/Mama-san and scale the business by having girls work for her.

Another simple example would be a restaurant. A restaurant can handle up to a certain amount of covers/clients at once, after that, customers have to wait in line. The restaurant example is interesting because you can clearly see that opening a huge restaurant with a capacity of 1,000 covers might not be a good idea. First because the cost of running such a restaurant might be much more than the income generated. But also because even though the restaurant does 1,000 covers at peak time, it doesn’t mean that the restaurant will stay that busy during the entire time it’s open. So now you have to deal with waiter/waitresses, busboys and other staff who won’t have anything to do. As you probably have understood already, scaling a restaurant means that the scaling has to be done in a cost effective manner.  And what’s even more interesting is that what we could have thought was the bottleneck (the amount of concurrent covers) can be easily scaled up but it wouldn’t provide real scalability. In fact this choice would cascade into other areas of management like staffing and the building size. Often, the scaling solution for restaurants is to open new locations which can result in keeping the lines shorter, targeting new markets and reducing risks since one failing branch won’t dramatically affect the others.

posted by Matt Aimonetti

Scaling in the traditional tech world

If you’ve ever done console development or worked on embedded devices, you know that they are restricted by some key elements. posted by Matt AimonettiIt can be memory, CPU, hard drive space etc… You have to “cram” as many features as you can into the device, working around the fixed limitations of the hardware. In the console industry, what’s interesting to note is that the hardware doesn’t change often but people expect than a new game on the same platform will do things better than the previous game, even though the limitations are exactly the same. This is quite a challenging problem because you have to fight against the hardware limitations by optimizing your code to be super efficient. That’s exactly the reason why console video game developers manage memory manually instead of relying on a garbage collector. This way they can squeeze every resource they can from the console.

The great advantage of this type of development is that you can reproduce and accurately anticipate issues. The bottlenecks/limitations are well known and immutable! If you find a way around in your lab, you know that the solution will work for everyone. Console video game developers (and to some extent, iOS developers) don’t have to wonder how their game will behave if the player has an old graphic card or not enough RAM.

But ever since we started distributing the processing power, scaling technology has become more challenging.

Scaling on the web

Scaling a web based solution might actually seem quite like scaling a restaurant, except that you can’t easily open multiple locations since the concept of proximity in web browsing isn’t really as concrete as in real life. So the solution can’t be directly transposed. Most people will only have to scale up by optimizing their code running on one server, or maybe two. That’s because their service/app is not, and won’t be, generating high traffic. Scaling such systems is common and one can rely on work done in the past decades for good examples of solutions.

However some web apps/games are or will become high traffic. But because every single entrepreneur I’ve met believes that their solution will be high traffic, they think they need to be able to scale and therefore should be engineered like that from the beginning. (This is, by the way, the reason scalability is a buzzword and you can sell almost anything technical saying that it scales.) The problem with this approach is that people want scalability but don’t understand its causality. In other words, they don’t understand the relationship between cause and effect related to making a solution scalable.

Basically we can reduce the concept of causality of scalability to something like this: you change a piece of the architecture to handle more traffic, but this part has an effect on other parts that also need to change and the pursuit of scalability almost never ends (just ask Google). Making a system scalable needs to have well a defined cause and expected effect, otherwise it’s a waste. In other words, the effect of scaling engenders the need for solutions which themselves have complex effects on a lot of aspects of a system. Let’s make it clearer by looking at a simple example:

Simple Architecture by Matt AimonettiWe have an e-commerce website and this website uses a web application with a database to store products and transactions. Your system is made of 1 webserver handling the requests and one database storing the data. Everything goes well until Black Friday, Christmas or Mother’s Day arrives and now some customers are complaining that they can’t access your website or that it’s too slow. This is also sometimes referred to as the digg/slashdot/reddit effect. All of a sudden you have a peak of traffic and your website can’t handle it. This is actually a very simple use case, but that’s also the only use case most people on the web need to worry about.

The causality of wanting this solution to scale is simple, you want to scale so you can sell more and have happy customers. The effect is that the system needs to become more complex.

To scale such a system, you need to find the root cause of the problem. You might have a few issues, but start by focusing on the main one. In this case, it’s more than likely that your webserver (frontend) cannot handle more than x requests/second. Interestingly enough, the amount of reqs/s might not match the result of your load tests. That’s probably because you didn’t expect the usage pattern that you are seeing, but that’s a whole different topic. At this point you need to understand why you can’t go above the x reqs/s limit you’re hitting. Where is the bottleneck? Is it that your application code is too slow? Is it the database has been brought to its knees? Or maybe the webserver serves as many requests as technically possible but it’s still not enough based on the traffic you are getting.

If we stop right here, we can see that the reasons why the solution doesn’t scale can be multiple. But what’s even more interesting is that the root cause this time depends on the usage pattern and that it is really hard to anticipate all patterns. If we wanted to make this system scale we could do it different ways.

To give you some canned answers, if the bottleneck is that your code is too slow, you should check if the code is slow because of the DB queries made (too many, slow queries etc..). Is it slow because you are doing something complex that can’t be easily improved or is it because you are relying on solutions that are known to not support concurrent traffic easily? More than likely, you will end up going for the easy caching approach. By caching some data (full responses, chunk of data, partial responses etc..) you avoid hitting your application layer and therefore can handle more traffic.

More complex Architecture by Matt Aimonetti

Caching avoids data processing & DB access

If your code is as fast as it can be, then a solution is to add more application servers or to async some processes. But now that means that you need to change the topology of your system, the way you deploy code and the way you route traffic. You will also increase the load on the database by opening more connections and maybe the database will now becoming the new bottleneck. You might also start seeing race conditions and you are certainly increasing the maintenance and complexity aka cost of your system (caching might end up having the same effect depending on the caching solution chosen).

load balanced approach by Matt Aimonetti

One way of scaling it to load balance the traffic

Just looking at these possible causes and the various solutions (we didn’t even mention DB replication, sharding, NoSQL etc..), we can clearly see that making a system scalable has some concrete effects on system complexity/maintenance which directly translate in cost increase.

If you are an engineer, you obviously want your system to be super scalable and handle millions of requests per second. But if you are a business person, you want to be realistic and evaluate the causality of not scaling after a certain point and convert that as loss. Then you weigh the cost of not scaling with the cost of “maybe” scaling and you make a decision.

The problem here though is that scaling is a bit like another buzzword: SEO (Search Engine Optimization). A lot of people/solutions will promise scaling capabilities without really understanding the big picture. Simple systems can easily scale up using simple solutions but only up to a certain level. After that, what you need to do to scale becomes so complex than anyone promising you the moon probably doesn’t know what they are talking about. If there was a one-size-fits-all, easy solution for scaling, we would all be using it, from your brother for his blog, to Google without forgetting Amazon.

AWS logo posted by Matt AimonettiSpeaking of Amazon, I hear a lot of people saying that Amazon AWS services is “THE WAY” (i.e: the only way) to scale your applications. I agree that it’s a compelling solution for a lot of cases but it’s far from being a silver bullet. Remember that the cause and effect of why you need to scale are probably different than anyone else.

Amazon Web Services

Let me give you a very concrete example of where AWS services might not be a good idea: high traffic sites with lots of database writes and low latency.

Zynga, the famous social game company behind Farmville, Mafia Wars etc., is using AWS and it seems that they might have found themselves in the same scenario as above. And that would be almost correct. Zynga games have huge traffic and they do a ton of DB writes. However I don’t think they need low latency since their game clients are browsers and Flash clients and that their games are mainly async so they just need to be able to handle unstable latency. We’ll see in a second how they manage to perform on the AWS cloud.

The major problem with AWS when you have a high traffic site is IO: IO reliability, IO latency, IO availability. By IO, I’m referring to network connection (internal/external) and disk access. Put differently, when you design your system and you know you are going to run on AWS, you need to take into consideration that your solution should survive with zero or limited IO because you will more than likely be IO bound. This means that your traditional design won’t work because your database hard drive won’t be available for 30s or will be totally saturated. You also need to have a super redundant system because you are going to randomly lose machines. Point number one, moving your existing application from a dedicated hosting solution to AWS might not help you scale if you didn’t architect to be resilient to bad IO. Simply put, and to only pick one example: if you were expecting your database to be able to always properly write to disk you will have problems.

Octocat, the GitHub mascot

The solution depends on how you want to look at it and where you are at in your project. You can go the Zynga route and design/redesign your entire architecture to be highly redundant, not rely on disk access (everything is kept in memory and flushed to disk when available) and tolerate a certain % of data loss. Or you can go with the GitHub approach and mix dedicated hardware for IO and “cloud” front end servers all on the same network. One solution isn’t better than the other, they are just different and depend on your needs. GitHub and Zynga both need to scale but they have different requirements.

When it comes to scaling, things are not black or white. To stay on the AWS topic, let’s take another example: Amazon Relational Database Service (RDS). Earlier today, I was complaining on Twitter that RDS doesn’t and probably won’t let you use the MySQL HandlerSocket plugin any time soon, even though it’s been released for almost 6 months and used in prod by many. Then someone asked me if using this plugin would offset the scalability cost-saving. The quick and wrong answer  is yes. By using the plugin, you can potentially get rid of your Memcached servers, probably your Redis/MongoDB/CouchDB servers or whatever NoSQL solution you write and just keep the database servers you currently have. You might have to beef up your DB servers a bit but it would certainly be a huge cost reduction and your system would be simpler, easier to maintain and the data would be more consistent. Sounds good right? After all the biggest online social game company designed it and uses it.

The only problem is that RDS is an AWS service and like every AWS service, it suffers from poor IO. So, if you were deciding to not use RDS and run your own MySQL servers with the HandlerSocket plugin, it wouldn’t bring you much improvement (1). Actually, if you are already IO bound, it would make things worse, because you are centralizing your system around the most unreliable part of your architecture. Based on that premise, RDS won’t support HandlerSocket because RDS runs on the same AWS architecture and has to deal with the same IO constraints. What’s the solution, you might ask? Amazon already went through these scaling problems and they offer a custom, non-relational, data storage solution working around their own issues called SimpleDB. But why would they improve RDS and fix a really hard problem when they already offer an alternative solution? Easy. SimpleDB forces you to redesign your architecture to work with their custom solution and, guess what? You are now locked-in to that vendor!

So the answer is yes, you can offset scalability costs if you don’t use AWS or any other providers with bad IO. Now you should look at the cost of moving away from AWS and see if it’s worth it. How much of your code and of your system is vendor specific? Is that something you can easily change? The fog library, for instance, supports multiple cloud providers. Are you using something similar? Can you transition to that?  Can you easily deploy to another hosting company? (Opscode chef makes that task much easier) But if, for one reason or another, you have to stick with AWS/<other cloud provider>, make sure that the business people in charge understand the consequences and the cost related to that choice.


My point is not to tell you to not design a scalable solution, or not to use AWS, or that RDS sucks. My point is to show that making a system scale is hard and has some drastic effects that are not always obvious. There aren’t any silver bullet solutions and you need to be really careful about the consequences (and costs) involved with trying to scale. Make sure it’s worth it and you have a plan. Define measurable goals for your scalability even though it’s really hard, don’t try to scale to infinity and beyond, that won’t work. Having to redesign later on to handle even more traffic, is a good problem to have, don’t over engineer.

Finally, be careful to understand the consequences of your decisions. What seems to be an almost trivial scaling move such as moving your app from dedicated hosting to a specific cloud provider might end up getting you in a vendor lock in situation!

1: I assume that you are IO bound. If you are not and your DB data fits in memory/cache, then HS on AWS is fine but if that’s the case what’s your bottleneck? ;)

, , ,