Friday, November 13, 2009

An Introduction to CouchDB

CouchDB is one of the most popular and mature document-oriented databases. Let’s have a look at the features that make it so popular, get it installed, and start putting it to use.

A couple weeks ago I wrote about NoSQL and provided a short overview of the landscape of non-relational databases.

One that has become increasingly popular is Apache CouchDB, so I’d like to spend a couple weeks digging into it a bit and talking about why it’s so interesting.

Before I do, it’s worth noting that Ubuntu 9.10 was just released and uses CouchDB under the hood. The Ubuntu One backup/synchronization service makes it easy to back up and sync Firefox bookmarks, Tomboy notes, files, contacts, and more.

As more users adopt 9.10 and Ubuntu One, CouchDB usage grows accordingly. If you read my previous NoSQL article and wondered which projects are ready for prime time, consider this a big vote of confidence for CouchDB.

Relax
You’ll often see the word “relax” associated with CouchDB. That’s because CouchDB tries to solve a lot of the “hard problems” associated with building a scalable distributed document-oriented database.

It does a lot of heavy lifting for you so that you can focus on building your application without worrying too much about administration or weird corner cases.

CouchDB also sports a very simple and easy to understand RESTful API. This should make for a very low barrier to entry and stress-free development.

As we progress through the process of using CouchDB, I think you’ll start to realize that this motto is not just “marketing speak.”

So let’s have a look at its main features.

Features
While we could dig deep into specific areas of CouchDB, it’s worth looking at the high-level features first. As you read this, you might start to think of it as a hybrid of a document database and a more traditional transaction-oriented relational database. CouchDB has an impressive feature set.

Document Storage: As previously discussed, CouchDB stores documents in their entirety. You can think of a document as one or more field/value pairs expressed as JSON.

Field values can be simple things like strings, numbers, or dates (dates are stored as strings, since JSON has no date type). But you can also use ordered lists (arrays) and associative maps (associative array, hash, whatever your language may call them).

Every document in a CouchDB database has a unique id and there is no required document schema.
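
To make that concrete, here is a minimal sketch of what such a document might look like, built in Python and serialized to JSON. Every field name and value except `_id` is made up for illustration; CouchDB does not require any of them.

```python
import json

# A hypothetical document. Only "_id" has special meaning to CouchDB;
# the other fields are invented for this example.
doc = {
    "_id": "post-2009-11-13",             # unique id within the database
    "title": "An Introduction to CouchDB",
    "published": "2009-11-13",            # a date, stored as a string
    "word_count": 1200,                   # a number
    "tags": ["couchdb", "nosql"],         # an ordered list (array)
    "author": {"name": "example author",  # an associative map
               "site": "example.org"},
}

encoded = json.dumps(doc, sort_keys=True)  # the JSON text a client would send
decoded = json.loads(encoded)              # and what it would get back
print(encoded)
```

Nothing here declares a schema. A second document in the same database could carry a completely different set of fields.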


ACID Semantics: Like many relational database engines, CouchDB provides ACID semantics. It does this by implementing a form of Multi-Version Concurrency Control (MVCC) not unlike InnoDB or Oracle.

That means CouchDB can handle a high volume of concurrent readers and writers without them blocking one another.
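
One way to picture the idea: readers never wait on a lock, and a writer has to name the revision its change was based on. The toy store below is only an in-memory model of that revision check, with made-up class and method names; it is not how CouchDB's storage engine is implemented.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class ToyStore:
    """In-memory stand-in for a document store with revision checks."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # hand back a copy of a complete revision

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:  # the writer saw an older revision
            raise ConflictError(
                "%s: expected rev %r, got %r" % (doc_id, current_rev, rev))
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = ToyStore()
store.put("note", {"text": "first draft"})  # creates revision 1
rev_a, _ = store.get("note")                # writer A reads revision 1
rev_b, _ = store.get("note")                # writer B reads revision 1 too

store.put("note", {"text": "A's edit"}, rev=rev_a)      # accepted as revision 2
try:
    store.put("note", {"text": "B's edit"}, rev=rev_b)  # based on stale revision 1
    b_result = "saved"
except ConflictError:
    b_result = "conflict"
print(b_result)
```

Writer B is not blocked. It is told right away that its copy is out of date, so it can re-read the document and try again.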


Map/Reduce Views and Indexes: To provide some structure to the data stored in CouchDB, you can develop views that are similar to their relational database counterparts.

In CouchDB, each view is constructed by a JavaScript function (yes, server-side JavaScript) that acts as the Map half of a MapReduce operation.

The function takes a document and emits zero or more key/value pairs, which become the rows of the view. The logic in your JavaScript functions can be arbitrarily complex.

Since computing a view over a large database can be an expensive operation, CouchDB can index views and keep those indexes updated as documents are added, removed, or updated.

This provides a very powerful indexing mechanism, and it gives you far more control over how your data is indexed than most databases do.
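
Real CouchDB views are written in JavaScript, but the shape of the map step is easy to sketch in Python. In this illustration, `map_by_tag` and `build_view` are made-up names, and the "index" is simply a list of rows sorted by key, rebuilt from scratch each time rather than updated incrementally.

```python
def map_by_tag(doc):
    """Map step: produce one (key, value) row per tag in the document."""
    for tag in doc.get("tags", []):
        yield tag, doc["title"]


def build_view(docs, map_fn):
    """Run map_fn over every document and return the rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    return sorted(rows)


docs = [
    {"_id": "a", "title": "Intro to CouchDB", "tags": ["couchdb", "nosql"]},
    {"_id": "b", "title": "Erlang notes", "tags": ["erlang"]},
    {"_id": "c", "title": "Untagged draft"},  # no tags, so it yields no rows
]

view = build_view(docs, map_by_tag)
for row in view:
    print(row)
```

A document can contribute several rows or none at all. That is what makes a view behave more like a custom index than a simple filter.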


Distributed Architecture with Replication: CouchDB was designed with bi-directional replication (or synchronization) and off-line operation in mind.

That means multiple replicas can have their own copies of the same data, modify it, and then sync those changes at a later time. The biggest gotcha typically associated with this level of flexibility is conflicts: two replicas can change the same document before they get a chance to sync.
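
Here is a rough sketch of how a sync step can tell a harmless update from a real conflict. It assumes each copy carries its own revision history. The revision ids, the `sync` function, and the tie-breaking rule are all invented for illustration and should not be read as CouchDB's actual replication algorithm.

```python
def sync(doc_a, doc_b):
    """Merge two replica copies of one document.

    Each copy is a dict with "history" (revision ids, oldest first) and
    "body". Returns (winner, conflict). conflict is the losing copy, or
    None when one copy simply extends the other.
    """
    ha, hb = doc_a["history"], doc_b["history"]
    if ha[:len(hb)] == hb:  # A already contains all of B's edits
        return doc_a, None
    if hb[:len(ha)] == ha:  # B already contains all of A's edits
        return doc_b, None
    # Both sides edited independently. Pick the winner by a rule that does
    # not depend on argument order, so every replica makes the same choice,
    # and hand back the loser instead of silently dropping it.
    ranked = sorted([doc_a, doc_b],
                    key=lambda d: (len(d["history"]), d["history"][-1]),
                    reverse=True)
    return ranked[0], ranked[1]


base = {"history": ["1-aaa"], "body": {"phone": "555-0100"}}
laptop = {"history": ["1-aaa", "2-bbb"], "body": {"phone": "555-0111"}}
desktop = {"history": ["1-aaa", "2-ccc"], "body": {"phone": "555-0122"}}

print(sync(base, laptop))     # laptop just extends base: no conflict
winner, conflict = sync(laptop, desktop)
print(winner["body"], conflict["body"])
```

The important property is that both replicas reach the same answer no matter which one starts the sync, and the losing edit stays available for the application to inspect.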


Erlang: Like many of this generation’s distributed databases, CouchDB is written in Erlang and uses the Erlang OTP platform.

Erlang is a language designed for use in fast and highly reliable telecommunications systems, but a whole new crop of developers has discovered it as a great tool for building reliable network services that scale well on multi-core systems.

Erlang emphasizes immutable data and lightweight processes that use message passing for communication. The OTP platform provides facilities for making distributed, real-time, and highly available systems.

As our expectations and demands for network services have risen to the point where failures are increasingly difficult to tolerate, Erlang/OTP starts to look more and more like the Right Way to build systems.

Couch Setup
Clearly CouchDB has an impressive feature set. Let’s look at getting it up and running and then write a bit of code to talk to it.

CouchDB has been around long enough that many Linux distributions have pre-built packages available. Installing in Ubuntu 9.04 is easy:
jzawodn@t61:~$ sudo apt-get install couchdb
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  erlang-base erlang-nox libmozjs0d
Suggested packages:
  erlang-x11 erlang erlang-manpages erlang-doc-html
The following NEW packages will be installed:
  couchdb erlang-base erlang-nox libmozjs0d
0 upgraded, 4 newly installed, 0 to remove and 31 not upgraded.
Need to get 28.4MB of archives.
After this operation, 46.7MB of additional disk space will be used.
...

After a few minutes spent downloading the packages, you’ve got a running instance of CouchDB. Easy.

If your distribution doesn’t have CouchDB or the version is fairly old, you can get the latest release from http://couchdb.apache.org/downloads.html and the Wiki contains installation instructions for most platforms. Having the latest version isn’t critical when you’re just getting your feet wet.

Once CouchDB is up and running, check that it responds using curl:
$ curl http://localhost:5984/
{"couchdb":"Welcome","version":"0.8.0-incubating"}

If that works, you’re good to go.
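
If you would rather script that check, the same request is easy to make from Python’s standard library. This is a minimal sketch: the function names are my own, and it assumes the server is listening on the default `localhost:5984` address used above.

```python
import json
from urllib.request import urlopen


def parse_welcome(body):
    """Extract the version string from CouchDB's root response."""
    info = json.loads(body)
    if info.get("couchdb") != "Welcome":
        raise ValueError("unexpected response: %r" % (info,))
    return info["version"]


def server_version(base_url="http://localhost:5984/", timeout=5):
    """Fetch the root URL and return the version the server reports."""
    with urlopen(base_url, timeout=timeout) as response:
        return parse_welcome(response.read().decode("utf-8"))


# Parsing the sample response shown above, without touching the network:
sample = '{"couchdb":"Welcome","version":"0.8.0-incubating"}'
print(parse_welcome(sample))
```

Calling `server_version()` against a live instance returns whatever version your package installed, which may well differ from the one shown here.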

Futon Web Interface
With CouchDB up and running, you can point your web browser at the built-in web interface by visiting http://localhost:5984/_utils.

This interface is known as “futon” and provides a handy way to check on your data and perform some basic administrative operations: create databases, remove databases, manage documents (records), and so on.

On a fresh install, it produces a fairly sparse interface with a Create Database button. You can select that and provide a name such as test01, and CouchDB will create an empty database for you.

You cannot add new records from within the Futon interface, so all you can do is select test01 and see what an empty database looks like.

Or you could delete it. But as we’ll see in the next installment, adding documents is fairly straightforward.
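
The Create Database button has a counterpart in the RESTful API: a database is created with an HTTP PUT to its name. The sketch below only builds that request, so it runs without a server. The helper name is made up, and the exact response body depends on your CouchDB version, so none is shown.

```python
from urllib.request import Request, urlopen


def create_database_request(name, base_url="http://localhost:5984/"):
    """Build (but do not send) the PUT request that creates a database."""
    return Request(base_url.rstrip("/") + "/" + name, method="PUT")


request = create_database_request("test01")
print(request.get_method(), request.full_url)

# To actually create the database, send the request to a running server:
# with urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the same request with `method="DELETE"` would remove the database again, so be careful where you point it.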

Coming Up
Hopefully this has given you a small taste of what CouchDB is about, at least at a theoretical level. Next week we’ll take a look at writing some Perl code to load documents into CouchDB as well as writing basic views to extract some useful information from the loaded data.

If you’ve already experimented with CouchDB or are using it in production, I’d love to hear about it. Leave a note in the comments below.
