The second paper we looked at in UU is Amazon’s 2007 paper onDynamo. Dynamo is an example of a new type of database dubbed
NoSQL and Riak is an
open-source implementation of the Dynamo architecture.
Studying Dynamo is worthwhile for a number of reasons:
-
It combines a lot of recent ideas in distributed
systems. These ideas are worth learning in their own
right to avoid mistakes likeReddit’s when building scalable systems.
-
Since Riak is basically Dynamo, knowledge of Dynamo is
directly applicable.
-
Understanding the design trade-offs in Dynamo provides a
way to understand the rest of the NoSQL space.
So, What is NoSQL?
In the old days everyone used relational databases and it
was good. Then along came the web, and with the web a
tidal wave of data, and things were not good. The
tradeoffs made by relational databases (maintaining the
famous ACID properties) made them unsuitable for tasks where
response time and availability were paramount. This is the
case for many web applications. For example, it doesn’t
really matter if my Facebook status updates aren’t
immediately visible to all my friends, but it does matter
if my browser hangs for a minute while the back-end tries
to get a write lock on the status table.
NoSQL databases make a different set of tradeoffs, and
achieve different performance characteristics as a result.
Typically, NoSQL databases focus on scalability, fast
response times, and availability, and give up atomicity
and consistency. This tradeoff is formalised via the CAP Theorem, which states that a distributed system cannot
provide consistency, availability, and partition tolerance
all at the same time (although two out of three of these
properties are achievable at once). Dynamo provides
availability and partition tolerance at the expense of
consistency. Other NoSQL databases may make different
tradeoffs. SQL databases typically provide consistency and
availability at the expense of partition tolerance.
Reading the Paper
The Dynamo paper can be difficult to read. The main issue
we had is that the authors don’t always motivate the
different components of the system. For example,
consistent hashing is one of the earlier concepts
introduced in the paper, but it is difficult to see why it
is used and how it contributes to increased availability
until later on. It is best to approach each section of the
article as a self-contained idea, and wait until the end
to see how they are combined. It took us two sessions to
get through the paper, so don’t be surprised if you find
it slow going.
Setting Out the Shop
The paper starts by laying out the properties required of
Dynamo. We’ve talked about the tradeoff between
consistency, availability, and partition tolerance above.
Some of the other properties are:
-
Cost-effectiveness. This is important but often
overlooked. You’ll sometimes see supporters of
relational databases arguing that if people got some real database hardware they’d never need NoSQL. The problem with real
hardware is it’s expensive. If my 20-CPU database server
is at full capacity I have to drop another $20’000 just
to handle another 5% increase in traffic. I probably
can’t get next day delivery on this type of server,
either. With a system like Dynamo I can just boot up
another $500/yr virtual machine.
-
Dynamo is a key-value store. This means that there are
no foreign keys and hence no joins: the application must
provide all of this, or more likely use a denormalised
data representation. Furthermore, Dynamo sees its data
as opaque binary blobs, so search is only possible using
primary keys. Other NoSQL databases make different
choices: MongoDB and CouchDB are
document-oriented stores, meaning that data is stored as
a JSON-like tree of keys and values; HBase and Cassandra store data as tuples, like a relational database, but
without foreign keys.
-
Low configuration, and fully distributed design. These
two go hand-in-hand. A fully distributed design means
all nodes are the same, and thus have the same
configuration. It also means there is no single point of
failure, another desirable feature. Again, different
systems take different approaches. For example,MongoDB and most relational databases have a master/slave
setup in which one machine has special “master”
significance. Obviously in this setup different machines
have different configurations.
Big Ideas
Dynamo is the fusion of a lot of ideas that are have
developed in the field of distributed systems. Rather than
duplicate the paper I want to discuss four points that I
found interesting:
- Consistent hashing
- Dynamo’s implementation
- Amazon’s quality metric
- Feedback control for balancing tasks
Consistent Hashing
If you take one point from Dynamo, let it be the
usefulness ofconsistent hashing. The basic idea of consistent hashing is to decouple the
value of a key from the machine it is stored on. If you do
this you can add and remove machines from your data store
without breaking anything. If you don’t, you’re in a world of pain.
Consistent hashing is best explained via an example of
doing it wrong. Say you have N
machines
serving as your data store. Given a key you want to work
out which machine stores the data. A simple way to do so
(which is what Reddit did) is to calculate key mod N
. Now suppose due to increased load you want to add a
machine in your data store. Now key mod (N+1)
won’t give the same result, so you can’t find your data
any more. To fix this you have to flush out the data and
reinsert it, which will take a long time. Or you can use
consistent hashing from the outset.
In consistent hashing you arrange the space of hash keys
into a ring. Each server inserts a token into the ring,
and is responsible for keys that lie in the range from
it’s token to the nearest preceding token. This is
illustrated in the image to the left. The small circles
indicate the tokens, and the colours the segments of the
hash ring allocated to each server.
Adding a new server only requires coordination with the
server that previously occupied that part of the hash
space. In the original consistent hashing paper tokens
were inserted at random. For Dynamo it was found that a
more structured system worked better. I’ll leave the
details of this and other issues (in particular, routing
and replication) to the paper.
Non-blocking IO
The section on Dynamo’s implementation will be interesting
to PL geeks. If you’ve ever rolled your eyes at the
manual continuation-passing style inflicted by Javascript then you might at least crack a
wry smile when you read about essentially the same
technique being used in Dynamo. There is an interesting
debate to be had on the virtues of non-blocking IO vs
thread-per-connection. At the moment my opinion is
non-blocking IO is a necessary evil given kernels written
in unsafe languages (and hence expensive context
switches). Erlang does a good job of presenting a simple
programming model with its light-weight threads, but
achieving decent SMP performance can be hard due to the
mismatch between application and OS threads. It’s my hope
that languages like Rust will give a pragmatic solution to this dilemma.
Amazon’s Quality Metric
Although it isn’t part of the main thrust of the paper, I
found it interesting that Amazon measure response time and
other variables at the 99.9% percentile. Amazon have a
very good reputation, and for other companies looking to
achieve the same stature it is good to know the goal to
aim for.
Feedback Control for Balancing Tasks
I’ve recently implemented feedback control (in particular,
proportional error control) for a database connection
pool. (I’ll blog about this in a bit.) It’s interesting
that Dynamo uses a similar method to balance tasks within
each node (Section 6.5). I think we’re going to see more
self-regulating systems in the future. The work atRADLab is a good example of what might make it into production
in a few years.
By scheduling tasks itself Dynamo is performing a task
typically handled by the operating system. I think in the
future this will be more commonplace, with the distinction
between operating system and application program becoming
increasingly blurred. TheManaged Runtime Initiative is one project that aims to do this.