Archive for category Misc
Books to read in 2012 – recommended to me by Twitter
Posted by Matt Aimonetti in Misc on December 30th, 2011
Today, I asked on Twitter what non-technical books I should read in 2012.
I was pleasantly surprised to see so many of my followers send recommendations. Here is a list of 25 books that like-minded people suggested I read. Hopefully you will find a book or two to read too. Feel free to send more recommendations via the comments.
Developing a Curriculum
Posted by Matt Aimonetti in Misc on December 21st, 2011
Recently I asked a friend of mine, who used to teach in an education PhD program, for pointers on how to develop a curriculum. After I discussed his response on Twitter, people asked me to post it somewhere, so here it is:
Process to develop a curriculum:
Purpose. Know why you’re doing what you’re doing.
- You know how to do this.
Product. Start with the end in mind.
- What does the student look like when they walk out the door at the end of the training?
- Usually, we break these down into Knowledge, Skills, or Attitudes.
- Sometimes it’s helpful to see a photograph or drawing of someone who finished the program and just talk about what they can do that makes them successful.
- This “product” should be connected to your mission and help you accomplish it.
Practices. Then ask yourself, “How do people become like this?”
- If you can break down your Product into 3-5 bite-sized chunks, then see how people learn each one of those skills, gain each one of those knowledge points, and gain the attitudes you want them to have.
- This one is much easier the more experience you have in seeing people develop the “Product.”
- This is also easier to determine when you understand Learning Theory.
- The results from this section will result in a list of:
- Activities or experiences
- Resources. What books, websites, teachers, software, etc. will help them learn more effectively and efficiently?
- Assessments. How would you know if the activity was helpful?
Plans. Make your plans based on the practices you’ve determined you need.
On a related topic, Chad Fowler posted an interesting blog post about what LivingSocial is doing to change software development education.
Data safety and GIL removal
Posted by Matt Aimonetti in Misc, ruby on October 18th, 2011
After my recent RubyConf talk and follow-up post addressing Ruby’s and Python’s Global Interpreter Lock (aka GVL/Global VM Lock), a lot of people asked me to explain what I meant by “data safety”. While my point isn’t to defend one approach or the other, I spent a lot of time explaining why C Ruby and C Python use a GIL, where it matters and where it matters less. As a reminder, and as mentioned by Matz himself, the main reason why C Ruby still has a GIL is data safety. If this point isn’t clear to you, you might be missing the main argument supporting the use of a GIL.
Showing obvious, concrete examples of data corruption due to unsafe threaded code isn’t actually as easy as it sounds. First of all, even with a GIL, developers can write unsafe threaded code, so we need to focus only on the safety problems raised by removing the GIL. To demonstrate what I mean, I will try to create some race conditions and show you the unexpected results you might get. Again, before you go crazy in the comments, remember that threaded code is nondeterministic and the code below might work fine on your machine; that’s exactly why it is hard to demonstrate. Race conditions depend on many things, but here I will focus on race conditions affecting basic data structures, since they might be the most surprising.
Example:
@array, threads = [], []
4.times do
  threads << Thread.new { (1..100_000).each {|n| @array << n} }
end
threads.each{|t| t.join }
puts @array.size
In the above example, I’m creating an instance variable of Array type and I start 4 threads. Each of these threads adds 100,000 items to the array. We then wait for all the threads to be done and check the size of the array.
If you run this code in C Ruby the end result will be as expected:
400000
Now if you switch to JRuby you might be surprised by the output. If you are lucky, you will see the following:
ConcurrencyError: Detected invalid array contents due to unsynchronized modifications with concurrent users
<< at org/jruby/RubyArray.java:1147
__file__ at demo.rb:3
each at org/jruby/RubyRange.java:407
__file__ at demo.rb:3
call at org/jruby/RubyProc.java:274
call at org/jruby/RubyProc.java:233
This is actually a good thing. JRuby detects that you are unsafely modifying an instance variable across threads and that data corruption will occur. However, the exception doesn’t always get raised and you will potentially see results such as:
335467 342397 341080
This is a sign that the data was corrupted but that JRuby didn’t catch the unsynchronized modification. On the other hand MacRuby and Rubinius 2 (dev) won’t raise any exceptions and will just corrupt the data, outputting something like:
294278 285755 280704 279865
In other words, if not manually synchronized, shared data can easily be corrupted. You might have two threads modifying the value of the same variable and one of the two threads will step on top of the other leaving you with a race condition. You only need 2 threads accessing the same instance variable at the same time to get a race condition. My example uses more threads and more mutations to make the problem more obvious. Note that TDD wouldn’t catch such an issue and even extensive testing will provide very little guarantee that your code is thread safe.
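For contrast, here is a minimal sketch of the classic fix: wrap each mutation of the shared array in a Mutex so only one thread touches it at a time. Mutex is the standard Ruby locking primitive rather than anything implementation specific, so this should print 400000 on C Ruby, JRuby, MacRuby or Rubinius alike. I haven't timed it, and keep in mind that the lock serializes the appends.

```ruby
# Same workload as the example above, but every append goes through a Mutex,
# so only one thread mutates @array at a time, with or without a GIL.
@array = []
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    (1..100_000).each do |n|
      lock.synchronize { @array << n }
    end
  end
end

threads.each(&:join)
puts @array.size
```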
So what? Thread safety isn’t a new problem.
That’s absolutely correct. Ask any decent Java developer out there and they will tell you how locks are used to “easily” synchronize objects to make your code thread safe. They might also mention deadlocks and other issues related to locking, but that’s a different story. One might also argue that when you write web apps, there is very little shared data and the chances of corrupting data across concurrent requests are very small, since most of the data is kept in a shared data store outside of the process.
All these arguments are absolutely valid; the challenge is that you have a large community and a large amount of code out there that expects a certain behavior, and removing the GIL does change this behavior. It might not be a big deal for you because you know how to deal with thread safety, but it might be a big deal for others, and C Ruby is by far the most used Ruby implementation. It’s basically like saying that automatic cars shouldn’t be made and sold, and everybody has to switch to stick shifts. Stick shifts have better gas mileage, I personally enjoy driving them and they are cheaper to build. Removing the GIL is a bit like that. There is a cost associated with this decision, and while this cost isn’t insane, the people in charge prefer not to pay it.
Screw that, I’ll switch to Node.js
I heard a lot of people telling me they were looking into using Node.js because it has a better design and no GIL. I like Node.js, and if I were to implement a chat room or an app keeping connections open for a long time, I would certainly compare it closely to EventMachine. But I also think that this argument about the GIL is absurd. First, there are other Ruby implementations which don’t have a GIL and are really stable (e.g. JRuby). Second, Node basically works the same way as Ruby with a GIL. Yes, Node is evented and single threaded, but when you think about it, it behaves the same as Ruby 1.9 with its GIL. Many requests come in and they are handled one after the other, and because IO requests are non-blocking, multiple requests can be processed concurrently but not in parallel. Well folks, that’s exactly how C Ruby works too, and contrary to popular belief, most if not all of the popular libraries making IO requests are non-blocking (when using 1.9). So, next time you try to justify wanting to toy with Node, please don’t use the GIL argument.
What should I do?
As always, evaluate your needs and see what makes sense for your project. Start by making sure you are using Ruby 1.9 and that your code makes good use of threading. Then look at how your app behaves: is it CPU-bound or IO-bound? Most web apps out there are IO-bound (waiting for the DB, redis or API calls), and when doing an IO call, Ruby’s GIL is released, allowing another thread to do its work. In that case, not having a GIL in your Ruby implementation won’t help you. However, if your app is CPU-bound, then switching to JRuby or Rubinius might be beneficial. Still, don’t assume anything until you have proven it, and remember that making such a change will more than likely require some architectural redesign, especially if using JRuby. But, hey, it might totally be worth it, as many have proven in the past.
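A rough way to check which camp a workload falls in is to time it sequentially and then in threads. The helper below is a sketch under a few assumptions: threaded_speedup is a hypothetical name, and sleep stands in for real IO (C Ruby releases the GIL while sleeping, much as it does during blocking IO). On C Ruby I would expect the CPU-bound ratio to hover around 1x while the IO-bound one approaches the thread count, but measure your own code rather than trusting these toy workloads.

```ruby
# Hypothetical helper: runs `work` `count` times sequentially, then once in
# each of `count` threads, and returns sequential time / threaded time.
def threaded_speedup(count = 4, &work)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  start = clock.call
  count.times { work.call }
  sequential = clock.call - start

  start = clock.call
  count.times.map { Thread.new(&work) }.each(&:join)
  threaded = clock.call - start

  sequential / threaded
end

# CPU-bound: pure Ruby arithmetic, which holds the GIL in C Ruby.
cpu = threaded_speedup { 300_000.times.reduce(0) { |sum, i| sum + i * i } }

# IO-bound: sleep stands in for waiting on the DB, redis or an API.
io = threaded_speedup { sleep 0.1 }

puts format("CPU-bound speedup: %.2fx", cpu)
puts format("IO-bound speedup:  %.2fx", io)
```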
I hope I was able to clarify things a bit further. If you wish to dig deeper, I would highly recommend reading the many discussions the Python community has had over the last few years.
About management
Posted by Matt Aimonetti in Misc on October 11th, 2011
I decided to save myself a session with the shrink and instead just write down my reflections on management. Who knows, some of you might help me and/or challenge my thought process.
I recently read a great management book called The Five Dysfunctions of a Team by Patrick Lencioni. Instead of telling you what to do, the author highlights behavior patterns that are related to each other and, when aggregated, result in dysfunctional teams. I really liked the book because instead of being a cookbook/playbook, it is more of a fail book; in other words, it illustrates what you don’t want to do and explains why. It highlights very well the relation between various behaviors and nicely illustrates why teams of brilliant people can fail. The Kindle version is less than $5, go get it and read it on your iPhone/iPad/computer/browser…
So this book somewhat changed my perception of management and leadership. Interestingly enough, at Sony, my previous employer, they make a distinction between management and leadership. While they hope managers can be leaders, they don’t require them to be, and to be honest very few are. I’m not sure whether that’s a good or a bad thing, but I certainly had different expectations. Finally, I have spent a large part of my life on the internet working on/with projects where meritocracy, respect and honor were key. The “ranking” is purely based on what your peers think of you and not on your age/sex/origin/diploma/bank account. I do realize that this model has many pros but also some pretty major cons. My only point is that it did affect my worldview. In my world, seniority, a killer job title or a fancy suit won’t automatically buy you my respect. On the other hand, a job well done, great vision, honesty and overachievement will!
Taking these few trains of thought into consideration, I started thinking about my own expectations of a good manager/leader. I figured that if I were able to define them, I could possibly define a work environment where I could thrive and maybe one day become a good “manager/leader” myself.
I’ve always questioned my ability to be a good leader. While most of the time I have an opinion and can easily decide what I think should be done, I have a hard time relating to people who can’t see the “big picture”. While I usually get decent results, I’m aware that it can unfortunately sometimes come at the cost of a few bruised egos. I also know I have high expectations for myself and for others, and I have a hard time understanding how some people can be OK with the status quo. I’m a perfectionist who is only happy when he outperforms his previous achievements. I was raised to challenge and always push myself further, focusing on concrete end results and achieved goals. And to be honest, that’s what I enjoy. But I also know for a fact that many people are not like that, and I can’t blame them for looking at things from a different angle and not sharing the same motivations. Furthermore, I know that most people don’t actually have the same driven temperament, and that’s why I’ve questioned my ability to lead others.
However, different temperaments can work together as long as there is respect. And by respect, I mean that everyone feels they are being heard and knows that their input was considered and addressed, even though the outcome might not be what they hoped for. But for respect to happen, you first need trust. And when people trust each other, Lencioni explains that “people don’t hold back one with another. They are unafraid to air their dirty laundry. They admit their mistakes, their weaknesses, and their concerns without fear of reprisal”. As simple as it seems, I think it is the key to a successful team. A good leader should be able to create an atmosphere where people can trust each other. In fact, I think that if a manager/leader/executive can manage to build trust as defined earlier, their technical skills or lack of vision don’t matter as much. They will be able to rely on people they trust to help them make the right decisions. Of course, there is much more to being a good leader, but I think that with this base, great things can be built, and without it, a much greater effort is required to get good results.
Based on my findings, I think that I need to work on my communication so others don’t feel that they have to hold back, and make sure everyone feels that their opinions were considered and addressed. To do that, a key element is to admit my mistakes and weaknesses and to ask others to help me improve. That’s it, sorry for the boring, non-technical post. I promise the next one will have at least one code sample.
How to – cross domain ajax to a Ruby app
Posted by Matt Aimonetti in Misc, ruby on September 14th, 2011
In some cases, you might have a bunch of apps running on different domains/subdomains and/or ports and you would like to make Ajax requests between these services. The problem is that browsers won’t let you make such requests because of the Same Origin Policy, which only allows them to make requests to resources within the same origin (same scheme, host and port).
However, most browsers (IE 8+, Firefox 3.5+, Safari 4+, Chrome) implement a simple way to allow cross-domain requests, as defined in this W3C document.
Of course, if your users have an old version of their browser, you might have to look into JSONP or something else, such as cheating by using iframes and setting document.domain. Let’s pretend for a minute that 100% of your users are on Chrome. The only thing you need to do is set a response header listing the accepted domains, or “*” for all. A simple Rack middleware to do that would look like this:
class XOriginEnabler
  ORIGIN_HEADER = "Access-Control-Allow-Origin"

  def initialize(app, accepted_domain="*")
    @app = app
    @accepted_domain = accepted_domain
  end

  def call(env)
    status, header, body = @app.call(env)
    header[ORIGIN_HEADER] = @accepted_domain
    [status, header, body]
  end
end
To use the middleware, add it to your Rack stack (in config.ru, for instance):
use XOriginEnabler
This enables requests from any origin. Alternatively, pass the whitelisted domain(s) as shown below:
use XOriginEnabler, "demo.mysite.com demo.mysite.fr demo.techcrunch.com"
For a full-featured middleware, see this project.
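One caveat worth checking against the spec: browsers generally accept either a single origin (scheme included, e.g. http://demo.mysite.com) or * in Access-Control-Allow-Origin, not a space-separated list. A common workaround is to compare the request’s Origin header against a whitelist and echo it back when it matches. The sketch below does that; WhitelistOriginEnabler is a hypothetical name, and it also adds Vary: Origin so that HTTP caches don’t hand one origin’s response to another.

```ruby
# Hypothetical whitelist variant of XOriginEnabler: it echoes the request's
# Origin back only when that origin is on the list.
class WhitelistOriginEnabler
  ORIGIN_HEADER = "Access-Control-Allow-Origin"

  # allowed_origins: full origins, scheme included, e.g. "http://demo.mysite.com"
  def initialize(app, allowed_origins)
    @app = app
    @allowed_origins = allowed_origins
  end

  def call(env)
    status, headers, body = @app.call(env)
    headers = headers.dup
    # The response now depends on the Origin request header, so tell caches.
    headers["Vary"] = [headers["Vary"], "Origin"].compact.join(", ")
    origin = env["HTTP_ORIGIN"]
    headers[ORIGIN_HEADER] = origin if @allowed_origins.include?(origin)
    [status, headers, body]
  end
end
```

In config.ru it would be enabled with something like use WhitelistOriginEnabler, ["http://demo.mysite.com", "http://demo.mysite.fr"].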
Ruby optimization example and explanation
Posted by Matt Aimonetti in Misc, ruby on September 5th, 2011
Recently I wrote a small DSL that allows the user to define some code that then gets executed later on and in different contexts. Imagine something like Sinatra, where each route action is defined in a block and then executed in the context of an incoming request.
The challenge is that blocks carry the context in which they were defined, so you can’t simply call a block and have it execute in the context of another object.
Here is a reduction of the challenge I was trying to solve:
class SolutionZero
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end

  def dispatch
    @block.call
  end
end

SolutionZero.new(42){ @origin + 1 }.dispatch
# undefined method `+' for nil:NilClass (NoMethodError)
The problem is that the block refers to the @origin instance variable which is not available in its context.
My first workaround was to use instance_eval:
class SolutionOne
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end

  def dispatch
    self.instance_eval(&@block)
  end
end

SolutionOne.new(40){ @origin + 2 }.dispatch
# 42
My workaround worked fine: since the block was evaluated in the context of the instance, the @origin ivar was available to the block. Technically, I was good to go, but I wasn’t really pleased with this solution. First, using instance_eval is often an indication that you are trying to take a shortcut. Then, having to convert my block, stored as a proc, back into a block on every single dispatch makes me sad. Finally, I think that this code is probably not performing as well as it could, mainly due to unnecessary object allocations and code evaluation.
I ran some benchmarks replacing instance_eval with instance_exec since, looking at the C code, instance_exec should be slightly faster. It turns out it is not, so I probably missed something when reading the implementation code.
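If you want to reproduce that comparison, a minimal benchmark could look like the following. EvalDispatcher and its two methods are illustrative names, not part of the original DSL, and the results depend heavily on the Ruby version, so don’t read too much into any single run.

```ruby
require 'benchmark'

# Illustrative stand-in for the DSL object: same block, two ways to run it.
class EvalDispatcher
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end

  # Evaluates the stored block with self set to this instance.
  def dispatch_eval
    instance_eval(&@block)
  end

  # Same thing via instance_exec, which also accepts explicit arguments.
  def dispatch_exec
    instance_exec(&@block)
  end
end

dispatcher = EvalDispatcher.new(40) { @origin + 2 }
iterations = 500_000

Benchmark.bm(14) do |x|
  x.report("instance_eval") { iterations.times { dispatcher.dispatch_eval } }
  x.report("instance_exec") { iterations.times { dispatcher.dispatch_exec } }
end
```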
I wrote some more benchmarks and profiled a loop of 2 million dispatches (only the #dispatch method call on the same object). The GC profiler report showed that the GC was invoked 287 times and each invocation blocked execution for about 0.15ms.
Using Ruby’s ObjectSpace and disabling the GC during the benchmark, I could see that each loop allocates an object of type T_NODE which is more than likely our @block ivar converted back into a block. This is quite a waste. Furthermore, having to evaluate our block in a different context every single call surely isn’t good for performance.
So instead of doing the work at run time, why not do it at load time? By that I mean that we can optimize the #dispatch method if we “precompile” the method body instead of “proxying” the dispatch to an instance_eval call. Here is the code:
class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    implementation(block)
  end

  private

  def implementation(block)
    mod = Module.new
    mod.send(:define_method, :dispatch, block)
    self.extend mod
  end
end

SolutionTwo.new(40){ @origin + 2 }.dispatch
# 42
This optimization is based on the fact that the benchmark (and the real-life usage) creates the instance once and then calls #dispatch many times. So by making the initialization of our instance a bit slower, we can drastically improve the performance of the method call. We also still need to execute our block in the right context. And finally, each instance might have a different way to dispatch since it is defined dynamically at initialization. To work around all these issues, we create a new module on which we define a new method called dispatch, whose body is the passed block. Then we simply extend our instance with our new module.
Now every time we call #dispatch, a real method is dispatched which is much faster than doing an eval and no objects are allocated. Running the profiler and the benchmarks script used earlier, we can confirm that the GC doesn’t run a single time and that the optimized code runs 2X faster!
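Here is a rough way to reproduce these measurements on a current Ruby. It relies on GC.stat(:total_allocated_objects) and GC.count (the former needs Ruby 2.2+, which postdates this post) instead of the GC profiler and ObjectSpace; the measure helper is hypothetical, and modern VMs may allocate differently than 1.9 did, so the numbers you get may not match the ones quoted above.

```ruby
class SolutionOne
  def initialize(origin, &block)
    @origin = origin
    @block = block
  end

  def dispatch
    instance_eval(&@block)
  end
end

class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    mod = Module.new
    mod.send(:define_method, :dispatch, block)
    extend mod
  end
end

# Hypothetical helper: elapsed time, allocated objects and GC runs for
# `iterations` calls of the given block.
def measure(iterations)
  GC.start
  allocated_before = GC.stat(:total_allocated_objects)
  gc_before = GC.count
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  {
    seconds: (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start).round(3),
    allocated: GC.stat(:total_allocated_objects) - allocated_before,
    gc_runs: GC.count - gc_before
  }
end

ITERATIONS = 2_000_000 # same loop size as in the post
one = SolutionOne.new(40) { @origin + 2 }
two = SolutionTwo.new(40) { @origin + 2 }

puts "instance_eval:    #{measure(ITERATIONS) { one.dispatch }}"
puts "extended module:  #{measure(ITERATIONS) { two.dispatch }}"
```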
Once again, this is yet another example showing that you should care about object allocation when dealing with code in the critical path. It also shows how to work around block bindings. Now, it doesn’t mean that you have to obsess about object allocation and performance: even if my last implementation is 2X faster than the previous one, we are only talking about a few microseconds per dispatch. That said, microseconds do add up, and creating too many objects will slow down even your fastest code since the GC will stop the world while it’s cleaning up your memory. In real life, you probably don’t have to worry too much about low-level details like that, unless you are working on a framework or sharing your code with others. But at least you can learn and understand why one approach is faster than the other. It might not be useful to you right away, but if you take programming as a craft, it’s good to understand how things work under the hood so you can make educated decisions.
Update:
@apeiros in the comments suggested a solution that works & performs the same as my solution, but is much cleaner:
class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    define_singleton_method(:dispatch, block) if block_given?
  end
end
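A quick check of what the cleaner version gives you (the second instance and its block are my own example values): each object gets its own singleton dispatch method, the class itself gains no public instance methods, and without a block no dispatch is defined at all.

```ruby
class SolutionTwo
  def initialize(origin, &block)
    @origin = origin
    define_singleton_method(:dispatch, block) if block_given?
  end
end

add = SolutionTwo.new(40) { @origin + 2 }
double = SolutionTwo.new(10) { @origin * 2 }

p add.dispatch                               # => 42
p double.dispatch                            # => 20
p add.singleton_methods                      # => [:dispatch]
p SolutionTwo.instance_methods(false)        # => []
p SolutionTwo.new(1).respond_to?(:dispatch)  # => false
```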
First step in scaling a web site: HTTP caching
Posted by Matt Aimonetti in JavaScript, Misc, Software Design on July 11th, 2011
Today my friend Patrick Crowley and I were talking about scaling his website, http://cinematreasures.org, since an article covering his work will soon be published in a very popular newspaper. Patrick’s site is hosted on Heroku, which comes by default with Varnish caching enabled.
The challenge is that a lot of people using the Rails framework are used to doing page caching instead of relying on HTTP caching, even though support for HTTP caching was added a long time ago. The major problem with page caching is that it doesn’t scale that well as soon as you run more than one server. Indeed, you would need to store the page content on a drive shared between your servers, or use memcached and do some work to avoid hitting your app every single time. On the other hand, HTTP caching is extremely easy to handle at the application level and it will dramatically reduce the number of requests hitting your app. Let me explain a little more about HTTP caching.
Ryan Tomayko wrote an excellent post about the details of caching; I strongly recommend you read it. In a nutshell, the HTTP caching layer (usually) sits in front of your application layer and allows you, the developer, to store some responses that can be sent back to the users based on optional conditions. That might still seem vague, so let’s take a concrete example. If you look at http://cinematreasures.org’s home page, you can see that it’s an agglomerate of various information:
And the bottom of the page contains even more dynamic data such as the popular movie theater photos, latest movie theater videos and latest tweets. One might look at that and say that this page can’t really be cached and that the caching should be done at the model layer (i.e. cache the data coming from the database). I would certainly agree that caching the data layer is probably a good idea, but you shouldn’t start with that. In fact, without caching, this page renders fast enough. The problem is when someone like Roger Ebert tweets about CinemaTreasures and the load on the app peaks significantly. At that point, the number of concurrent connections your app can handle is put to the test. Even though your page load is “fast enough”, requests will queue up and some will eventually time out. That’s actually a perfect case for HTTP caching.
What we want to do in that case is to cache a version of the home page in Varnish for 60 seconds. During that time, all requests coming to the site will be served by Varnish and will all get the same cached content. That frees our servers to handle the non-cached requests and therefore increases our throughput. What’s even better is that if a user refreshes the home page in his/her browser during the first 60 seconds, the requests won’t even make it all the way to our servers. All of that is possible thanks to conditions set on the response. The first user hitting the HTTP cache layer (Varnish in this case) won’t find a fresh cached response, so Varnish will forward the request to our application layer, which will send the home page back to Varnish and tell it that this content is good for a full minute, so please don’t ask for it again until a minute from now. Varnish serves this response to the user’s browser and lets the browser know that the server said the response was good for a minute, so it doesn’t need to ask for it again. Now, if another user comes in during these 60 seconds, he will hit Varnish, which has the cached response from the first user; because the cache is still fresh (it hasn’t been 60 seconds since the first request) and the cache is public, the same response will be sent to the second user.
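To make that flow concrete, here is a toy model of a shared cache in front of a Rack-style app. It is not how Varnish is implemented: it keys only on the path, only honors public responses with a max-age, and ignores everything else (Vary, ETags, private responses). The ToyHttpCache name and the injectable clock: parameter are my own, the latter added so expiry can be tested without sleeping.

```ruby
# Toy model of a shared HTTP cache sitting in front of a Rack-style app.
class ToyHttpCache
  def initialize(app, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @app = app
    @clock = clock
    @store = {} # path => [expires_at, response]
  end

  def call(env)
    key = env["PATH_INFO"]
    expires_at, response = @store[key]
    # Fresh hit: answer without touching the application at all.
    return response if response && @clock.call < expires_at

    # Miss or stale entry: forward to the application.
    response = @app.call(env)
    ttl = public_max_age(response[1]["Cache-Control"])
    @store[key] = [@clock.call + ttl, response] if ttl
    response
  end

  private

  # Seconds a shared cache may reuse the response, or nil if it must not.
  def public_max_age(cache_control)
    return nil unless cache_control && cache_control.include?("public")
    max_age = cache_control[/max-age=(\d+)/, 1]
    max_age && max_age.to_i
  end
end
```

The first request for "/" reaches the app; any request in the next 60 seconds is answered from the store, which is exactly the "good for a full minute" contract described above.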
As you can see, the real strength of HTTP caching is the fact that it’s conditional caching. It’s based on the request’s URL and some “flags” set in the request/response headers.
Setting these conditions in your app is actually very simple since you just need to set the response’s headers. If you are using a Ruby framework, you will more than likely have access to the response object via the “response” method, and you can set the headers directly like this: response.headers['Cache-Control'] = 'public, max-age=60'.
In Rails, you can actually use a helper method instead: expires_in 1.minute, :public => true.
You might have a case where you HAVE TO serve fresh content if available and can’t serve stale cached content even for a few seconds. In this case, you can rely on the ETag header value. The ETag is meant to validate the freshness of a cached response. Think of it as a signature (unique ID) that is set on the response and used by the client (or cache layer) to see whether the server response has changed. The way it works is that the client keeps track of the ETag received for each request (attached to the cached response) and then sends it back with subsequent requests (in the If-None-Match header). The HTTP layer or application sees the ETag in the request and can check whether it is still valid, i.e. whether the content changed. If it didn’t, an empty response can be sent with a special HTTP status code (304 Not Modified) to let the client know that the old cached value is still good to use. Rails has a helper called “stale?” that helps you do the ETag/Last-Modified check and lets you avoid fetching all the objects from the database by doing a cheap check on an attribute (for instance, you can check the updated_at value and use that as a condition to pull an object and its relationships).
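To make the round trip concrete, here is a framework-free sketch of the kind of check a helper like stale? does for you. ConditionalPage and its version/render callables are hypothetical; the ETag is derived from a cheap version value (think updated_at) so the expensive render only runs when the client’s copy is out of date. It compares If-None-Match with plain string equality, which ignores lists of ETags and weak validators, so treat it as an illustration rather than a complete implementation.

```ruby
require 'digest/md5'

# Hypothetical endpoint: the ETag comes from a cheap "version" value,
# so the expensive render only happens when the client's copy is stale.
class ConditionalPage
  def initialize(version, &render)
    @version = version # callable returning something like updated_at
    @render = render   # callable building the full body (the expensive part)
  end

  def call(env)
    etag = %("#{Digest::MD5.hexdigest(@version.call.to_s)}")
    # Caches must check back with us before reusing the response.
    headers = { "ETag" => etag, "Cache-Control" => "public, max-age=0, must-revalidate" }

    if env["HTTP_IF_NONE_MATCH"] == etag
      [304, headers, []] # client's copy is still valid: no body, no render
    else
      [200, headers.merge("Content-Type" => "text/html"), [@render.call]]
    end
  end
end
```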
When I explain HTTP caching, I often hear people telling me: “that’s great Matt, but you know what, that won’t work for us because we have custom content that we display specifically to our users”. In that case, you can always set the Cache-Control header to private, which will only cache the response in the client’s browser and not in the cache layer. That’s good to some extent, but it can definitely be improved by rethinking your view layer a bit. In most web apps, the page content is rendered by server-side code (Rails, Django, node.js, PHP...) and sent to the user all prepared. There are a few challenges with this approach. The biggest one is that the server has to wait until everything is ready (all data fetched, view rendered, etc.) before sending back a response and before the client’s browser can start rendering (there are ways to chunk the response, but that’s beyond the scope of this post). The other is that the same expensive content has to be calculated/rendered separately for two different users because you might, for instance, be inserting the username of the current user at the top of the page. A classic way to deal with that is fragment caching, where the expensive rendering is cached and reused by different requests. That’s good, but if the only reason to do it is that we are displaying some user-specific data, there is a simpler way: async page rendering. The concept is extremely simple: remove all user-specific content from the rendered page and then inject the user content in a second step once the page is displayed. The advantage is that now the full page can be cached in Varnish (or Squid or whatever you use for HTTP caching). To inject the user content, the easiest way is to use JavaScript.
Let’s stick with CinemaTreasures: when you’re logged in, the username is shown at the top of each page:

Once logged in, the username is displayed on all pages
The only things that differ between the page rendered when the user is logged out and when he is logged in are these 2 links and an avatar. So let’s write some code to inject them after rendering the page.
In Rails, in the sessions controller or whatever code logs you in, you need to create a new cookie containing the username:
cookies[:username] = {
  :value   => session[:username],
  :expires => 2.days.from_now,
  :domain  => ".cinematreasures.org"
}
As you can see, we don’t store the data in the session cookie, and the data won’t be encrypted. You need to be careful that someone who changes their cookie value can’t access data they shouldn’t. But that’s a different discussion. Now that the cookie is set, we can read it from JavaScript when the page is loaded.
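// Prototype.js: run once the DOM is ready
document.observe("dom:loaded", function() {
  displayLoggedinUserLinks();
});

// Returns the raw value of the named cookie, or null if it isn't set
function readCookie(name) {
  var nameEQ = name + "=";
  var ca = document.cookie.split(';');
  for (var i = 0; i < ca.length; i++) {
    var c = ca[i];
    while (c.charAt(0) == ' ') c = c.substring(1, c.length);
    if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length, c.length);
  }
  return null;
}

function displayLoggedinUserLinks() {
  var username = readCookie('username');
  var loginLink = $('login');
  var logout = $('logout');
  var userGreetings = $('user_greetings'); // hypothetical container id
  if (username == null) {
    loginLink.show();
    logout.hide();
  } else {
    // user is logged in and we have his/her username (Rails URL-encodes cookie values)
    username = decodeURIComponent(username);
    loginLink.hide();
    if (userGreetings) {
      // the cookie is user-controlled, so escape it before inserting it as HTML
      userGreetings.update("<span id='username'>" + username.escapeHTML() + "</span>");
    }
    logout.show();
    showAvatar(username); // defined elsewhere in the app
  }
  return true;
}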
The code above doesn’t do much: once the DOM is loaded, the displayLoggedinUserLinks() function gets triggered. This function reads the cookie via the readCookie() function and, if a username is found, hides the login link and displays the username, the logout link and the avatar. (You could also use a jQuery cookie plugin to handle the cookie, but this is an old example using Prototype; adapt the code accordingly.)
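When the user logs out, we just need to delete the username cookie and the cached page will be rendered properly. In Rails, you would delete the cookie like this: cookies.delete(:username, :domain => ".cinematreasures.org"). Since the cookie was set with a :domain option, the same domain has to be passed when deleting it.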
Quite often you might even want to make an Ajax call to get some information such as the number of user messages or notifications. Using jQuery or whatever JS framework you fancy, you can do that once the page is rendered. Here is an example: on this page, you can see the leaderboards for MLB The Show. The leaderboards don’t change that often, especially the overall leaderboards, so they can be cached for a little while; however, players’ presence can change at any time. The smart way to deal with that would be to cache the leaderboards for a few seconds/minutes and make an Ajax call to a presence service, passing it a list of user ids collected from the DOM. The service called via Ajax could also be cached depending on the requirements.
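As an illustration of the server side of that Ajax call, here is a minimal Rack-style presence endpoint. The /presence?ids=... URL shape, the PresenceService name and the 5-second max-age are all assumptions made up for this example; the point is that this small JSON response can carry its own, much shorter, cache lifetime than the leaderboard page it decorates.

```ruby
require 'json'
require 'uri'

# Hypothetical presence endpoint for the Ajax call: GET /presence?ids=12,57
# `online_ids` stands in for wherever presence really lives (redis, etc.).
class PresenceService
  def initialize(online_ids)
    @online_ids = online_ids
  end

  def call(env)
    params = URI.decode_www_form(env["QUERY_STRING"].to_s).to_h
    ids = params.fetch("ids", "").split(",").map(&:to_i)
    presence = ids.map { |id| [id, @online_ids.include?(id)] }.to_h
    headers = {
      "Content-Type"  => "application/json",
      # Presence changes often, so only let caches reuse it briefly.
      "Cache-Control" => "public, max-age=5"
    }
    [200, headers, [JSON.generate(presence)]]
  end
end
```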
Now there is one more problem that people using Rails might encounter: flash notices. For those of you not familiar with Rails, flash notices are messages set in the controller and passed to the view via the session (at least last time I checked). The problem happens if the home page isn’t cached anymore and I log in, which redirects me to the home page with a flash message like so:

The problem is that the message is part of the rendered page, so for the next 60 seconds everyone hitting the home page will get the same message. This is why you would want to write a helper that puts this message in a custom cookie that you read from JavaScript and then delete once it has been displayed. You could use a helper like this to set the cookie:
def flash_notice_cookie(msg, expiration = nil)
  cookies[:flash_notice] = {
    :value   => msg,
    :expires => expiration || 1.minute.from_now,
    :domain  => ".cinematreasures.org" # same domain as the username cookie
  }
end
Then add a function, called when the DOM is ready, which reads the message and injects it into the DOM. Once the cookie is read, delete it so the message isn’t displayed again.
So there you have it: if you follow these few steps, you should be able to easily handle 10x more traffic without adding hardware or making any kind of crazy code change. Before you start looking into memcached, redis, CDNs or whatever, consider HTTP caching and async DOM manipulation. Finally, note that if you can’t use Varnish or Squid, you can very easily set up Rack-Cache locally and share the cache via memcached. It’s also a great way to test locally.
Update: CinemaTreasures was updated to use HTTP caching as described above. The hosting cost is now half of what it used to be and the throughput is actually higher which offers a better protection against peak traffic.