Archive for July, 2009
Over the weekend, I spent some time working on a Ruby + Rack +CouchDB project. Three technologies that I know quite well but that I never put to work together at the same time, at least not directly. Let’s call this Part I.
Before we get started, let me introduce each component:
- Ruby : if you are reading this blog, you more than likely know at least a little bit about, what I consider, one of the most enjoyable programming language out there. It’s also a very flexible language that lets us do some interesting things. I could have chosen Python to do the same project but that’s a whole different topic. For this project we will do something Ruby excels at: reopening existing classes and injecting more code.
- Rack: a webserver interface written in Ruby and inspired by Python’s WSGI. Basically, it’s a defined API to interact between webservers and web frameworks. It’s used by most common Ruby web frameworks, from Sinatra to Rails (btw, Rails3 is going to be even more Rack-focused than it already is). So, very simply put, the webserver receives a request, passes it to Rack, that converts it, passes it to your web framework and the web framework sends a response in the expected format (more on Rack later).
- CouchDB: Apache’s document-oriented database. RESTful API, schema-less, written in Erlang with built-in support for map/reduce. For this project, I’m using CouchRest, a Ruby wrapper for Couch.
Goal: Log Couch requests and analyze data
Let’s say we have a Rails, Sinatra or Merb application and we are using CouchRest (maybe we are using CouchRest and ActiveRecord, but let’s ignore that for now).
Everything works fine but we would like to profile our app a little and maybe optimize the DB usage. The default framework loggers don’t support Couch. The easy way would be to tail the Couch logs or look at the logs in CouchDBX. Now, while that works, we can’t really see what DB calls are made per action, so it makes any optimization work a bit tedious. (Note that Rails3 will have some better conventions for logging, making things even easier)
So, let’s see how to fix that. Let’s start by looking at Rack.
Instead of hacking a web framework specific solution, let’s use Rack. Rack is dead simple, you just need to write a class that has a call method.
In our case, we don’t care about modifying the response, we just want to instrument our app. We just want our middleware to be transparent and let our webserver deal with it normally.
Here we go … that wasn’t hard, was it? We keep the application reference in the @app variable when a new instance of the middleware is created. Then when the middleware is called, we just call the rest of the chain and pretend nothing happened.
As you can see, we just added some logging info around the request. Let’s do one better and save the logs in CouchDB:
Again, nothing complicated. In our rackup file we defined which Couch database to use and we passed it to our middleware (we change our initialize method signature to take the DB).
Finally, instead of printing out the logs, we are saving them to the database.
W00t! At this point all our requests have been saved in the DB with all the data there, ready to be manipulated by some map/reduce views we will write. For the record, you might want to use the bulk_save approach in CouchDB which will wait for X amount of records to save them in the DB all at once. Couch also let’s you send new documents, but only save it to the DB every X documents or X seconds.
As you can see, our document contains the timestamps and the full environment as a hash.
All of that is nice, but even though we get a lot of information, we could not actually see any of the DB calls made in each request. Let’s fix that and inject our logger in CouchRest (you could apply the same approach to any adapter).
Let’s reopen the HTTP Abstraction layer class used by CouchRest and inject some instrumentation:
Again, nothing fancy, we are just opening the module, reopening the methods and wrapping our code around the super call (for those who don’t know, super calls the original method).
This is all for Part I. In Part II, we’ll see how to process the logs and make all that data useful.
By the way, if you make it to RailsSummit, I will be giving a talk on Rails3 and the new exciting stuff you will be able to do including Rack based stuff, CouchDB, MongoDB, new DataMapper etc..
CouchDB is an awesome technology. I’m lucky enough to work on quite a big project where we decided to switch from MySQL to Couch for various reasons.
One of the many things I like with Couch is that it handles attachments and can replicate them as well as serve them for you using the Erlang based builtin webserver. (you can load balance your dbs and do some other really cool stuff)
Let’s take a use case. Let’s imagine that you have a web app with logged in users. Every user can have their own avatar.
No big deal, you get the user to upload his/her avatar to your app and add it to the user document in the database. To serve it from the database, you just need to create a proxy in nginx/apache and redirect the virtual avatar url to the protected DB making sure the request is a GET request.
Now, the problem is when you want to serve authorized attachments. Let’s imagine that we want to let our users upload private files, files that should be accessible only by the owner or users designated by the owner.
In this case, a simple nginx rewrite wouldn’t work. We need to authorize attachment requests. Here is a cool way of doing that using nginx and merb’s router. (Expect Rails3 router to do the same).
Let’s start by setting up nginx and create a proxy for couchdb:
Now that this is done, we are going to use Merb’s awesome router to handle the incoming requests. The cool part of this is that we won’t be dispatching requests so, going through the router is almost free. (check on the Merb router benchmarks for more info). Let’s edit our router and set a special route for our assets.
We are using a deferred route which gets executed instead of dispatching the request.
If the attachment route is being matched then we are checking what environment we are currently running in. If we are in production or staging environment then we are sending back a rack response to the webserver. The response is just a forward to the proper couchdb document behind the proxy. Of course, before allowing that to happen, we could authenticate the logged in user, log the request and do a couple of other things. You have full access to your models from the router, so authenticating a session isn’t a big deal. You could even create temporarily urls like AWS s3 does.
If we are not in production or staging mode, then just redirect the request to couch since we assume you have access to the local db. This way, your asset urls will be working in production and dev. In real life, you’ll want to apply the authorization before choosing how to deliver the document/attachment tho as you want it work the same way in development and production.
To celebrate the amazing work being done by Laurent Sansonetti on MacRuby here is a hello world using the new LLVM based compiler.
$ echo "p ARGV.join(' ').upcase" > hello_world.rb
$ macrubyc hello_world.rb -o macruby_says
$ ./macruby_says hello world
Note that to achieve this result, you need to be using the experimental branch of MacRuby and have LLVM installed. (check the readme available in MacRuby’s repo).
Let’s quickly look at what we just did.
We created a single line ruby script that takes the passed arguments, joins them and print them uppercase.
Then, we compiled our script into a Mach-O object file and produce an executable.
Here is an extract from Laurent’s latest status report:
Produced executables embed all the compiled Ruby code as well as MacRuby, statically.
It can be distributed as is and does not depend on anything MacRuby or LLVM at runtime.
The Ruby source code is compiled into native machine code (same process as we do at runtime with the JIT compiler), so it’s also a good way to obfuscate the source code.
The final binary looks like an Objective-C binary (except that it’s larger)
Don’t expect to compile Rails just yet, it’s still in a preliminary stage.
The final release you should let you pick one of the 2 compilation modes:
* normal mode: full ruby specs, compile down to machine code and use LLVM at runtime. (recommended for desktop/server apps)
* full mode: no full ruby spec support, no runtime code generation, no LLVM. (“very light application and/or if the environment does not support runtime code generation” (hint-hint))
As you can see, MacRuby is moving forward and the experimental branch should soon move to trunk.