Posts Tagged couchdb

Ruby, Rack and CouchDB = lots of awesomeness

Over the weekend, I spent some time working on a Ruby + Rack + CouchDB project: three technologies that I know quite well but had never put to work together at the same time, at least not directly. Let's call this Part I.

Before we get started, let me introduce each component:

  • Ruby: if you are reading this blog, you more than likely know at least a little about what I consider one of the most enjoyable programming languages out there. It's also a very flexible language that lets us do some interesting things. I could have chosen Python for the same project, but that's a whole different topic. For this project we will do something Ruby excels at: reopening existing classes and injecting more code.
  • Rack: a webserver interface written in Ruby and inspired by Python's WSGI. Basically, it's a defined API that webservers and web frameworks use to talk to each other. It's used by most common Ruby web frameworks, from Sinatra to Rails (by the way, Rails 3 is going to be even more Rack-focused than it already is). So, very simply put, the webserver receives a request and passes it to Rack, which converts it and passes it to your web framework; the web framework then sends back a response in the expected format (more on Rack later).
  • CouchDB: Apache’s document-oriented database. RESTful API, schema-less, written in Erlang with built-in support for map/reduce. For this project, I’m using CouchRest, a Ruby wrapper for Couch.

Goal: Log Couch requests and analyze data

Let’s say we have a Rails, Sinatra or Merb application and we are using CouchRest (maybe we are using CouchRest and ActiveRecord, but let’s ignore that for now).

Everything works fine, but we would like to profile our app a little and maybe optimize the DB usage. The default framework loggers don't support Couch. The easy way would be to tail the Couch logs or look at the logs in CouchDBX. Now, while that works, we can't really see which DB calls are made per action, which makes any optimization work a bit tedious. (Note that Rails 3 will have better logging conventions, making things even easier.)

So, let’s see how to fix that. Let’s start by looking at Rack.

Rack Middleware

Instead of hacking together a framework-specific solution, let's use Rack. Rack is dead simple: you just need to write a class that has a call method.
In our case, we don't care about modifying the response; we just want to instrument our app. Our middleware should be transparent and let the webserver handle the response normally.
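A minimal sketch of such a pass-through middleware, assuming Ruby's standard Logger for output (the class name RequestLogger is hypothetical). It keeps the next app in @app, logs around the call, and returns the downstream response untouched:

```ruby
require 'logger'

# Pass-through Rack middleware: logs before and after the request,
# but never touches the response.
class RequestLogger
  def initialize(app, logger = Logger.new($stdout))
    @app = app        # the next app/middleware in the Rack chain
    @logger = logger
  end

  def call(env)
    started_at = Time.now
    @logger.info("Started #{env['REQUEST_METHOD']} #{env['PATH_INFO']}")
    status, headers, body = @app.call(env)   # let the rest of the chain do its job
    elapsed_ms = ((Time.now - started_at) * 1000).round(1)
    @logger.info("Completed #{status} in #{elapsed_ms}ms")
    [status, headers, body]                  # hand the response back unchanged
  end
end

# config.ru:
#   use RequestLogger
#   run MyApp
```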

Here we go … that wasn’t hard, was it? We keep the application reference in the @app variable when a new instance of the middleware is created. Then when the middleware is called, we just call the rest of the chain and pretend nothing happened.

As you can see, we just added some logging info around the request. Let’s do one better and save the logs in CouchDB:
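Here is a sketch of a CouchDB-backed version. It assumes the database object responds to save_doc(hash), as CouchRest::Database does; the class name, the document fields and the database URL in the rackup comment are hypothetical. One deliberate deviation: the Rack env contains IO objects (rack.input, rack.errors) that don't serialize to JSON in a useful way, so this sketch stores only the String values of the env rather than the whole hash.

```ruby
require 'time'

# Rack middleware that saves one CouchDB document per request.
class CouchRequestLogger
  def initialize(app, db)
    @app = app   # next app in the Rack chain
    @db  = db    # anything with #save_doc(hash), e.g. a CouchRest::Database
  end

  def call(env)
    started_at = Time.now
    status, headers, body = @app.call(env)
    finished_at = Time.now

    doc = {
      'started_at'  => started_at.utc.iso8601(3),
      'finished_at' => finished_at.utc.iso8601(3),
      'status'      => status,
      # Keep only String values; IO objects in the env don't serialize usefully.
      'env'         => env.select { |_key, value| value.is_a?(String) }
    }
    @db.save_doc(doc)
    [status, headers, body]
  end
end

# config.ru (assuming the couchrest gem; the DB name is hypothetical):
#   require 'couchrest'
#   use CouchRequestLogger, CouchRest.database!('http://127.0.0.1:5984/request-logs')
#   run MyApp
```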

Again, nothing complicated. In our rackup file we define which Couch database to use and pass it to our middleware (we changed our initialize method signature to take the DB).
Finally, instead of printing out the logs, we save them to the database.

W00t! At this point all our requests are saved in the DB with all their data, ready to be manipulated by some map/reduce views we will write. For the record, you might want to use the bulk_save approach, which waits until X records have accumulated and then saves them to the DB all at once. Couch also lets you send new documents but only save them to the DB every X documents or X seconds.
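A client-side take on batching, sketched under assumptions: the db object responds to bulk_save(docs) (CouchRest::Database has a method by that name), and the limit and max_age values are arbitrary. The class name is hypothetical, and it is not thread-safe as written. The clock is injectable so the time-based flush can be exercised without sleeping.

```ruby
# Buffers documents in memory and writes them in a single bulk request
# once `limit` docs are queued or the oldest one is `max_age` seconds old.
class BufferedDocWriter
  def initialize(db, limit: 50, max_age: 5.0, clock: -> { Time.now })
    @db = db            # anything with #bulk_save(array_of_docs)
    @limit = limit
    @max_age = max_age
    @clock = clock
    @buffer = []
    @oldest_at = nil    # when the oldest buffered doc arrived
  end

  def <<(doc)
    @oldest_at ||= @clock.call
    @buffer << doc
    flush if @buffer.size >= @limit || @clock.call - @oldest_at >= @max_age
    self
  end

  def flush
    return if @buffer.empty?
    @db.bulk_save(@buffer)
    @buffer = []        # start a fresh batch; the DB keeps the old array
    @oldest_at = nil
  end
end
```

Note that the age check only runs when a new document arrives, so an idle app would need an explicit flush (for example in an at_exit hook) to write out the last batch.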

As you can see, our document contains the timestamps and the full environment as a hash.

All of that is nice, but even though we get a lot of information, we still can't see any of the DB calls made during each request. Let's fix that and inject our logger into CouchRest (you could apply the same approach to any adapter).

Let's reopen the HTTP abstraction layer used by CouchRest and inject some instrumentation:

Again, nothing fancy: we are just opening the module, redefining the methods and wrapping our code around the super call (for those who don't know, super calls the original method).
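For super to reach the original method, the wrapper has to sit in a module that comes before the original in the method lookup chain. Here is a sketch of that idea with hypothetical stand-ins for CouchRest's adapter, so it runs on its own (the real module and method signatures depend on your CouchRest version). It uses Module#prepend, which in modern Ruby works whether the original methods are defined on the module itself or mixed in with extend.

```ruby
require 'logger'

# Hypothetical stand-in for the adapter that performs the HTTP calls.
module StubAdapterAPI
  def get(uri, headers = {});             { 'uri' => uri }; end  # real: HTTP GET
  def put(uri, payload, headers = {});    { 'ok' => true }; end  # real: HTTP PUT
  def post(uri, payload, headers = {});   { 'ok' => true }; end  # real: HTTP POST
  def delete(uri, headers = {});          { 'ok' => true }; end  # real: HTTP DELETE
end

# Stand-in for the HTTP abstraction module, getting its verbs via extend.
module HttpAbstraction
  extend StubAdapterAPI
end

# Instrumentation: time each call, delegate to the original through super.
module CouchCallLogging
  class << self
    attr_accessor :logger
  end
  self.logger = Logger.new($stdout)

  %w[get put post delete].each do |verb|
    define_method(verb) do |uri, *args|
      started_at = Time.now
      result = super(uri, *args)   # explicit args: required inside define_method
      elapsed_ms = ((Time.now - started_at) * 1000).round(2)
      CouchCallLogging.logger.info("Couch #{verb.upcase} #{uri} (#{elapsed_ms}ms)")
      result
    end
  end
end

# Prepending to the singleton class puts the wrappers in front of the originals.
HttpAbstraction.singleton_class.prepend(CouchCallLogging)
```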

This is all for Part I. In Part II, we’ll see how to process the logs and make all that data useful.

By the way, if you make it to RailsSummit, I will be giving a talk on Rails 3 and the exciting new things you will be able to do with it, including Rack-based stuff, CouchDB, MongoDB, the new DataMapper, etc.



Ruby authentication of CouchDB requests

CouchDB is an awesome technology. I’m lucky enough to work on quite a big project where we decided to switch from MySQL to Couch for various reasons.

One of the many things I like about Couch is that it handles attachments: it can replicate them and serve them for you using its built-in, Erlang-based webserver (you can load balance your DBs and do some other really cool stuff).

Let’s take a use case. Let’s imagine that you have a web app with logged in users. Every user can have their own avatar.

No big deal: you have the user upload his/her avatar to your app and add it to the user document in the database. To serve it from the database, you just need to create a proxy in nginx/Apache that maps the virtual avatar URL to the protected DB, making sure the request is a GET request.

Add to that a caching solution like Varnish or the memcached module for nginx, and all your DB goodies get cached and served by the cache (server/client) until they get modified.

Now, the problem is when you want to serve authorized attachments. Let’s imagine that we want to let our users upload private files, files that should be accessible only by the owner or users designated by the owner.

In this case, a simple nginx rewrite won't work. We need to authorize attachment requests. Here is a cool way of doing that using nginx and Merb's router (expect the Rails 3 router to be able to do the same).

Let's start by setting up nginx and creating a proxy for CouchDB:

Now that this is done, we are going to use Merb's awesome router to handle the incoming requests. The cool part is that we won't be dispatching requests, so going through the router is almost free (check the Merb router benchmarks for more info). Let's edit our router and set up a special route for our assets.

We are using a deferred route which gets executed instead of dispatching the request.

If the attachment route matches, we check which environment we are running in. If we are in production or staging, we send back a Rack response to the webserver. The response is just a forward to the proper CouchDB document behind the proxy. Of course, before allowing that to happen, we could authenticate the logged-in user, log the request and do a couple of other things. You have full access to your models from the router, so authenticating a session isn't a big deal. You could even create temporary URLs like Amazon S3 does.
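One way to build S3-style temporary URLs, sketched as an assumption rather than anything Merb or CouchDB provides: sign the path and an expiry timestamp with an HMAC using a server-side secret, then check both before serving. The module name, the query parameter names and the secret are hypothetical.

```ruby
require 'openssl'

# Expiring, tamper-proof attachment URLs via HMAC-SHA256.
module TemporaryUrl
  SECRET = 'change-me' # assumption: load a real secret from your config

  def self.sign(path, expires_at, secret: SECRET)
    expires = expires_at.to_i
    "#{path}?expires=#{expires}&signature=#{digest(path, expires, secret)}"
  end

  def self.valid?(path, expires, signature, now: Time.now, secret: SECRET)
    return false if now.to_i > expires.to_i           # link has expired
    secure_compare(digest(path, expires.to_i, secret), signature.to_s)
  end

  def self.digest(path, expires, secret)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), secret, "#{path}\n#{expires}")
  end

  # Constant-time comparison, so the check doesn't leak how many characters matched.
  def self.secure_compare(a, b)
    return false unless a.bytesize == b.bytesize
    diff = 0
    a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
    diff.zero?
  end
end
```

Usage: generate the link with TemporaryUrl.sign(path, Time.now + 3600), and have the attachment route call TemporaryUrl.valid? with the path and query parameters before forwarding to Couch.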

If we are not in production or staging mode, we just redirect the request to Couch, since we assume you have access to the local DB. This way, your asset URLs work in both production and development. In real life, though, you'll want to apply the authorization before choosing how to deliver the document/attachment, since you want it to work the same way in development and production.
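The same decision logic can be sketched as a plain Rack app, with authorization happening before the delivery choice. This is not the Merb deferred route itself, and several details are assumptions: URLs look like /assets/:db/:doc_id/:attachment, the authorized callable is something you supply, the production forward uses nginx's X-Accel-Redirect header with an internal location named /couchdb, and the development CouchDB lives at http://127.0.0.1:5984.

```ruby
# Attachment endpoint: authorize first, then forward (production/staging)
# or redirect (development) to the CouchDB attachment.
class AttachmentGate
  ROUTE = %r{\A/assets/([^/]+)/([^/]+)/([^/]+)\z}

  def initialize(authorized:, environment: ENV.fetch('RACK_ENV', 'development'),
                 couch_url: 'http://127.0.0.1:5984', internal_prefix: '/couchdb')
    @authorized = authorized            # callable: (env, db, doc_id) -> true/false
    @environment = environment
    @couch_url = couch_url
    @internal_prefix = internal_prefix  # hypothetical internal nginx location
  end

  def call(env)
    match = ROUTE.match(env['PATH_INFO'].to_s)
    return text(404, 'Not found') unless match
    return text(405, 'Method not allowed', 'allow' => 'GET') unless env['REQUEST_METHOD'] == 'GET'

    db, doc_id, attachment = match.captures
    # Authorize before choosing the delivery path, so every environment behaves the same.
    return text(403, 'Forbidden') unless @authorized.call(env, db, doc_id)

    couch_path = "/#{db}/#{doc_id}/#{attachment}"
    if %w[production staging].include?(@environment)
      # nginx replaces this empty response with the internal CouchDB response.
      [200, { 'x-accel-redirect' => "#{@internal_prefix}#{couch_path}" }, []]
    else
      # Development: send the browser straight to the local CouchDB.
      [302, { 'location' => "#{@couch_url}#{couch_path}" }, []]
    end
  end

  private

  def text(status, message, extra_headers = {})
    [status, { 'content-type' => 'text/plain' }.merge(extra_headers), [message]]
  end
end
```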



CouchDB with CouchRest in 5 minutes

The other night, during our monthly SDRuby meetup, lots of people were very interested in learning more about CouchDB and Ruby. I tried to show what Couch was all about, but I didn't have time to show how to use CouchDB with Ruby.
Here is my attempt to do that in 10 minutes or less. I'll assume you don't have CouchDB installed.

Install CouchDB. If you are on Mac OS X, you are in luck: download and unzip the standalone package called CouchDBX.
That's it, you have Couch ready to go: press play and explore the web interface.

Next, let's write a quick script. Let's say we want to write a script that manages our contacts.

First, let’s install CouchRest:

$ sudo gem install couchrest

Now, let’s open a new file and write our script.

In lines 4 and 5 we are just setting up the server (by default, localhost is used). If the database doesn't exist, it gets created.


DB = SERVER.database!('contact-manager')

Then, we define our 'model': we set the default database to use and define a list of properties. Properties are not required, but they generate getters and setters for you. They are also used to set default values and to validate your model. Line 11 shows how to use an alias, which provides a getter and a setter for both the property name and the alias name:

property :last_name, :alias => :family_name
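To make the property and alias behavior concrete without depending on CouchRest internals, here is a toy sketch in plain Ruby. TinyDocument and its property method are hypothetical and only mimic the idea (a document is a Hash, and each property gets a getter and setter, plus a second pair for its alias); CouchRest's real implementation also handles casting and validation. The country default is made up for illustration.

```ruby
# Toy document class: NOT CouchRest's implementation, just the idea behind it.
class TinyDocument < Hash
  def self.property(name, options = {})
    key = name.to_s
    defaults[key] = options[:default] if options.key?(:default)
    # Both the property name and its alias read and write the same Hash key.
    [name, options[:alias]].compact.each do |accessor|
      define_method(accessor) { self[key] }
      define_method("#{accessor}=") { |value| self[key] = value }
    end
  end

  def self.defaults
    @defaults ||= {}   # per-class defaults, stored on the class itself
  end

  def initialize(attributes = {})
    super()
    self.class.defaults.each { |key, value| self[key] = value.dup }  # dup: no shared mutable defaults
    attributes.each { |key, value| self[key.to_s] = value }
  end
end

class Contact < TinyDocument
  property :first_name
  property :last_name, :alias => :family_name
  property :country, :default => 'USA'
end
```

Because the alias only adds accessors, the document stored in Couch still has a single last_name field.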

Line 14 does something that might seem strange at first: we are casting the address property as an instance of the Address class. Here is what the implementation of the Address class could look like:

Address is just a subclass of Hash with some extra methods provided by the CouchRest::CastedModel module. (If you wonder why it's called CastedModel instead of the more grammatically correct CastModel, the answer is simple: I suck at English grammar :p)

So here is a quick example of how to use a ‘CastedModel’:
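The idea can also be approximated in plain Ruby, without CouchRest. In this sketch Address is a hypothetical stand-in (it does not include CouchRest::CastedModel): it is a Hash with accessors, so it nests inside the parent document, serializes to JSON as a plain object, and can be "cast" back into an Address when the document is loaded.

```ruby
require 'json'

# Toy casted model: a Hash subclass with accessors for its fields.
class Address < Hash
  %w[street city state zip].each do |field|
    define_method(field) { self[field] }
    define_method("#{field}=") { |value| self[field] = value }
  end

  # Cast a raw Hash (e.g. parsed from JSON) back into an Address.
  def self.from(hash)
    hash.each_with_object(new) { |(key, value), address| address[key.to_s] = value }
  end

  def one_line
    "#{street}, #{city}, #{state} #{zip}"
  end
end

address = Address.new
address.street = '123 Main St'
address.city = 'San Diego'
address.state = 'CA'
address.zip = '92101'

contact = { 'first_name' => 'Jane', 'address' => address }
json = JSON.generate(contact)                      # what would be stored in Couch
loaded = Address.from(JSON.parse(json)['address']) # cast back on the way out
puts loaded.one_line
```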

That's part of what's great about CouchDB: you don't need to worry too much about storage. Just define your properties, cast to models if needed and save everything as a document.

For more examples, check out the CouchRest spec fixtures and the examples.

To learn more about CouchDB, read the (free) online draft of the CouchDB book, and of course you should probably read the CouchRest source on GitHub.
