We’ve been using the Lift web framework for a lot of web development work
recently, and we’re very impressed by some of its features.
Lift’s Comet support, in particular, is a blessing for the
kind of data-crunching back-end web sites we typically get
involved in.
Importing data from uploaded files, for example,
frequently causes trouble. An import can take from a few
seconds to a few minutes depending on the size of the file
and the complexity of the data processing and validation
involved. If the import takes more than a few seconds
there is an increasing risk that the web browser will time
out. If this happens we fail, because the user won’t know
whether the import succeeded or not. Lift’s Comet actors
provide a simple way around this problem. But before
describing how they work, let’s quickly go over Comet and actors.
Comet is a way of doing push notifications over HTTP,
which on the face of it appears to only support pull.
Without the jargon, this means a way of allowing the
server to send information to the web browser when that
information is ready, not when the web browser checks for
it. This gives us a better interface, as the UI can
instantly reflect new data, and better resource
consumption, as the client doesn’t have to continuously
poll the server.
There are two or three common ways of implementing Comet. Lift uses a mechanism called “long polling”, which
implements Comet using plain old AJAX. As soon as the web
page loads, the web browser sends an XMLHttpRequest to
the server. Instead of replying immediately the server
keeps the connection around until it has information to
push back. When information is available, the web server
responds to the HTTP request, and the browser processes
the response and immediately makes another request. In
other words, long polling uses HTTP’s pull mechanism to
simulate push communication. This is all well and good,
but it immediately raises two issues: how do we manage a
large number of open, but idle, connections without
swamping the server, and what programming model do we use
to manage the additional complexity of Comet applications?
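The request/response cycle described above can be sketched without any web server at all. The following is a minimal in-process illustration, not Lift’s implementation: `LongPollEndpoint`, `publish`, `poll` and `receive` are hypothetical names, a blocking queue stands in for the held-open HTTP connection, and the `Thread` lambdas assume Scala 2.12 or later.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Hypothetical in-process stand-in for a long-polling endpoint (not Lift's API).
final class LongPollEndpoint[A <: AnyRef] {
  private val pending = new LinkedBlockingQueue[A]()

  // Server side: called whenever new information becomes available.
  def publish(event: A): Unit = pending.put(event)

  // One "HTTP request": held open until an event arrives or the timeout passes.
  def poll(timeoutMillis: Long): Option[A] =
    Option(pending.poll(timeoutMillis, TimeUnit.MILLISECONDS))
}

object LongPollDemo {
  // Client side: process each response and immediately issue the next request.
  def receive[A <: AnyRef](endpoint: LongPollEndpoint[A], count: Int): List[A] = {
    val received = List.newBuilder[A]
    var remaining = count
    while (remaining > 0) {
      endpoint.poll(1000) match {
        case Some(event) =>
          received += event
          remaining -= 1
        case None =>
          () // timed out with nothing to report: simply ask again
      }
    }
    received.result()
  }

  def main(args: Array[String]): Unit = {
    val endpoint = new LongPollEndpoint[String]
    val server = new Thread(() => {
      List("10% done", "60% done", "finished").foreach { msg =>
        Thread.sleep(50) // the server is busy; the client's request stays open
        endpoint.publish(msg)
      }
    })
    server.start()
    println(receive(endpoint, 3))
    server.join()
  }
}
```

The client never sleeps between requests. All of the waiting happens inside `poll`, which is what makes the exchange look like push from the browser’s point of view.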
Handling many idle open connections is relatively simple.
The traditional model is to use one thread per request,
but this doesn’t scale when many requests are idle for
long periods. All modern operating systems provide a
scalable event notification system, such as epoll or kqueue, allowing a single thread to simultaneously monitor many
connections for data. The JVM provides access to these
systems via the Selector abstraction in the NIO package. All this is taken care of in the web framework,
so the application programmer does not need to be aware of
it. (Note that other languages present the same facilities
in different ways. Erlang, for example, presents all IO
operations as blocking, but the implementation uses the
same scalable non-blocking OS services as the JVM. Erlang
can do this because its lightweight processes use far fewer
resources than JVM threads. This is an appealing choice as it
provides a uniformity not found on the JVM, but impacts
how Erlang handles multicore.)
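To make the single-thread idea concrete, here is a rough sketch that uses the JDK’s `java.nio.channels.Selector` directly. It is not how Lift or its servlet container is actually written; `SelectorDemo` and `serve` are names we made up, and for brevity the sketch assumes each short message arrives in a single read.

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel, SocketChannel}
import java.nio.charset.StandardCharsets.UTF_8

object SelectorDemo {
  // Accepts connections and reads from them on the calling thread until
  // `expected` messages have arrived; returns them in arrival order.
  def serve(server: ServerSocketChannel, expected: Int): List[String] = {
    val selector = Selector.open()
    server.configureBlocking(false)
    server.register(selector, SelectionKey.OP_ACCEPT)
    val buffer = ByteBuffer.allocate(256)
    var accepted = List.empty[SocketChannel]
    var messages = List.empty[String]
    while (messages.size < expected) {
      selector.select() // blocks until at least one channel is ready
      val ready = selector.selectedKeys().iterator()
      while (ready.hasNext) {
        val key = ready.next()
        ready.remove()
        if (key.isAcceptable) {
          // Drain every pending connection; accept() returns null when none remain.
          var client = server.accept()
          while (client != null) {
            client.configureBlocking(false)
            client.register(selector, SelectionKey.OP_READ)
            accepted = client :: accepted
            client = server.accept()
          }
        } else if (key.isReadable) {
          val client = key.channel().asInstanceOf[SocketChannel]
          buffer.clear()
          val n = client.read(buffer)
          if (n < 0) client.close() // peer hung up
          else if (n > 0) {
            buffer.flip()
            messages = messages :+ UTF_8.decode(buffer).toString
          }
        }
      }
    }
    accepted.foreach(_.close())
    selector.close()
    messages
  }

  def main(args: Array[String]): Unit = {
    val server = ServerSocketChannel.open()
    server.bind(new InetSocketAddress("127.0.0.1", 0)) // port 0: any free port
    // Five open connections, four of which stay idle for the whole run.
    val clients = List.fill(5)(SocketChannel.open(server.getLocalAddress))
    clients.head.write(ByteBuffer.wrap("hello".getBytes(UTF_8)))
    println(serve(server, expected = 1))
    clients.foreach(_.close())
    server.close()
  }
}
```

The idle connections cost nothing but a registration with the selector. No thread is parked waiting on any of them.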
More relevant to the application programmer is the
programming model used for Comet, and this is where actors
come in. An actor is basically a thread with the important
restriction that it only communicates with the outside
world via messages. To ask an actor to do something, you
send it a message. This is rather like a method call,
except that the actor queues the message and processes it
asynchronously. When an actor wants to communicate with
another resource, it sends that resource a message. Since
actors never share state with each other, there is never a
need to lock resources to avoid concurrent access. This is
a great model because all the complexities of programming
with locks disappear. If you are interested in more
information on the actor model in Scala, see the original actor papers, the Akka framework, and the documentation on Lift’s actors.
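The mailbox idea is small enough to sketch by hand. The code below is a toy actor built from a queue and a thread, intended only to illustrate the model; it is neither Lift’s nor Akka’s actor API, and `CounterActor`, `Msg` and `countWith` are hypothetical names. It assumes Scala 2.12 or later for the `Thread` lambdas.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.Duration

// Messages understood by the hypothetical CounterActor below.
sealed trait Msg
case object Increment extends Msg
final case class Report(reply: Int => Unit) extends Msg
case object Stop extends Msg

// A hand-rolled actor: one mailbox, one thread, private state.
final class CounterActor {
  private val mailbox = new LinkedBlockingQueue[Msg]()
  private var count = 0 // only ever touched by the actor's own thread

  private val worker = new Thread(() => {
    var running = true
    while (running) mailbox.take() match {
      case Increment     => count += 1
      case Report(reply) => reply(count)
      case Stop          => running = false
    }
  })
  worker.start()

  // Sending returns at once; the message is queued and handled later.
  def !(msg: Msg): Unit = mailbox.put(msg)
  def join(): Unit = worker.join()
}

object ActorDemo {
  // Several threads send increments concurrently, with no locks in user code.
  def countWith(senders: Int, perSender: Int): Int = {
    val actor = new CounterActor
    val threads = List.fill(senders)(new Thread(() => {
      for (_ <- 1 to perSender) actor ! Increment
    }))
    threads.foreach(_.start())
    threads.foreach(_.join())
    val answer = Promise[Int]()
    actor ! Report(n => { answer.success(n); () })
    val total = Await.result(answer.future, Duration.Inf)
    actor ! Stop
    actor.join()
    total
  }

  def main(args: Array[String]): Unit =
    println(countWith(senders = 4, perSender = 1000)) // prints 4000
}
```

The counter is an ordinary unsynchronised `var`, yet no increments are lost, because only the actor’s thread ever reads or writes it. The queue is the single point of coordination.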
Actors are a natural fit for Comet. On the server each
Comet connection is handled by a Comet actor, whose job it is to manage communication with a connected
browser. Each actor is bound to a single user’s session,
but actors persist across web requests. We can
asynchronously send an actor messages (whether the user is
looking at the web page or not), and have the actor buffer
them for transmission to the browser. This means we’ve got
almost all of our file upload functionality straight out
of the box, without having to do any particularly tricky
development.
We put a proof-of-concept of the file uploader on GitHub. The basic structure of the code is:
- When a file is uploaded it is handed off to a thread for
  processing, and a Comet actor is started to communicate
  with the client.
- The processing thread periodically sends messages to the
  actor, informing it of progress on the file upload.
- The Comet actor in turn communicates progress to the
  client.
The great thing about this arrangement is that the user
can navigate away from the page without aborting the file
upload, and if they later return to the page they will get
a progress update. It makes for a very pleasant UI.
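The arrangement can be approximated in plain Scala to show how the pieces fit together. This is not the code from our proof-of-concept and it does not use Lift’s Comet classes. `ProgressActor`, `Attach`, `Detach` and the other names are invented for the sketch, and a callback stands in for the browser connection.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ListBuffer

sealed trait UploadMsg
final case class Progress(percent: Int) extends UploadMsg
final case class Attach(send: String => Unit) extends UploadMsg // browser (re)connects
case object Detach extends UploadMsg                            // user navigates away
case object Shutdown extends UploadMsg

// Hypothetical stand-in for a Comet actor: buffers updates until a browser is attached.
final class ProgressActor {
  private val mailbox = new LinkedBlockingQueue[UploadMsg]()
  private val buffered = ListBuffer.empty[String] // updates nobody has seen yet
  private var browser: Option[String => Unit] = None

  // Deliver everything buffered so far, if a browser is currently attached.
  private def flush(): Unit = browser.foreach { send =>
    buffered.foreach(send)
    buffered.clear()
  }

  private val worker = new Thread(() => {
    var running = true
    while (running) mailbox.take() match {
      case Progress(p) =>
        buffered += s"$p% imported"
        flush()
      case Attach(send) =>
        browser = Some(send)
        flush()
      case Detach   => browser = None
      case Shutdown => running = false
    }
  })
  worker.start()

  def !(msg: UploadMsg): Unit = mailbox.put(msg)
  def join(): Unit = worker.join()
}

object UploadDemo {
  // The import runs to completion while the user is away from the page.
  def run(): List[String] = {
    val actor = new ProgressActor
    val seen = ListBuffer.empty[String]
    val importer = new Thread(() => {
      for (p <- List(25, 50, 75, 100)) actor ! Progress(p) // real work would go here
    })
    importer.start()
    importer.join()
    actor ! Attach(msg => { seen += msg; () }) // the user returns to the page
    actor ! Shutdown
    actor.join()
    seen.toList
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Because the processing thread only ever talks to the actor, it neither knows nor cares whether a browser is listening. That decoupling is what lets the import survive the user navigating away.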