Posts in the ‘Code’ category

1 Dec 2010

by Dave

File upload using Comet Actors

We’ve been using the Lift web framework for a lot of web development work recently, and we’re very impressed some of its features. Lift’s Comet support, in particular, is a blessing for the kind of data-crunching back-end web sites we typically get involved in.

Importing data from uploaded files, for example, frequently causes trouble. An import can take from a few seconds to a few minutes depending on the size of the file and the complexity of the data processing and validation involved. If the import takes more than a few seconds there is an increasing risk that the web browser will time out. If this happens we fail, because the user won’t know whether the import succeeded or not. Lift’s Comet actors provide a simple way around this problem. But before describing how they work, let’s quickly go over Comet and actors.

Comet is a way of doing push notifications over HTTP, which on the face of it appears to only support pull. Without the jargon, this means a way of allowing the server to send information to the web browser when that information is ready, not when the web browser checks for it. This gives us a better interface, as the UI can instantly reflect new data, and better resource consumption, as the client doesn’t have to continuously poll the server.

There are two or three common ways of implementing Comet. Lift uses a mechanism called “long polling”, which implements Comet using plain old AJAX. As soon as the web page loads, the web browser sends an XMLHTTP request to the server. Instead of replying immediately the server keeps the connection around until it has information to push back. When information is available, the web server responds to the HTTP request, and the browser processes the response and immediately makes another request.  In other words, long polling uses HTTP’s pull mechanism to simulate push communication. This is all well and good, but it immediately raises two issues: how do we manage a large number of open, but idle, connections without swamping the server, and what programming model do we use to manage the additional complexity of Comet applications.

Handing many idle open connections is relatively simple. The traditional model is to use one thread per request, but this doesn’t scale when many requests are idle for long periods. All modern operating systems provide a scalable event notification system, such as epoll or kqueue, allowing a single thread to simultaneously monitor many connections for data. The JVM provides access to these systems via the Selector abstraction in the NIO package. All this is taken care of in the web framework, so the application programmer does not need to be aware of it. (Note that other languages present the same facilities in different ways. Erlang, for example, presents all IO operations as blocking, but the implementation uses the same scalable non-blocking OS services as the JVM. Erlang can do this as it doesn’t use as many resources per thread as the JVM does. This is an appealing choice as it provides a uniformity not found on the JVM, but impacts how Erlang handles multicore.)

More relevant to the application programmer is the programming model used for Comet, and this is where actors come in. An actor is basically a thread with the important restriction that it only communicates with the outside world via messages. To ask an actor to do something, you send it a message. This is rather like a method call, except that the actor queues the message and processes it asynchronously. When an actor wants to communicate with another resource, it sends that resource a message. Since actors never share state with each other, there is never a need to lock resources to avoid concurrent access. This is a great model because all the complexities of programming with locks disappear. If you are interested in more information on the actor model in Scala try here for the original papers, here for the Akka framework and here for a bit on Lift’s actors.

Actors are a natural fit for Comet. On the server each Comet connection is handled by a Comet actor, whose job it is to manage communication with a connected browser. Each actor is bound to a single user’s session, but actors persist across web requests. We can asynchronously send an actor messages (whether the user is looking at the web page or not), and have the actor buffer them for transmission to the browser. This means we’ve got almost all of our file upload functionality straight out of the box, without having to do any particularly tricky development.

We put a proof-of-concept of the file uploader on Github. The basic structure of the code is:

  • When a file is uploaded it is handed off to a thread for processing, and a Comet actor is started to communicate with the client.
  • The processing thread periodically sends messages to the actor, informing it of progress on the file upload.
  • The Comet actor in turn communicates progress to the client.

The great thing about this arrangement is that the user can navigate away from the page without aborting the file upload, and if they later return to the page they will get a progress update. It makes for a very pleasant UI.

Posted in Code, Functional Programming, Scala, Web development | Comments Off on File upload using Comet Actors

26 Nov 2010

by Noel

Interested in a Book on Racket?

If you are interested in a book on Racket (formerly PLT Scheme) I’d appreciate it if you could fill out this survey. Thanks!

Posted in Code, Racket | Comments Off on Interested in a Book on Racket?

19 Nov 2010

by Noel

A Reflection on Working in a Team

Being in the office more frequently has given me more opportunity to work in a team; for better or worse (and mostly worse, in my opinion) the PhD students in my lab tended to work alone. An interesting example of the benefits of team working arose a few days ago. We are in the process of shifting Kahu from its current colo server to a VM. As part of this move we want to benchmark the two servers, to make sure the new VM isn’t going to give us an unexpected performance hit. To do the benchmarking we needed to log more performance statistics, and adding logging to our code was the task Dave and I were working on. I started with a grand plan: we would write three libraries, one for logging, one for bechmarking, and one for logging benchmark data. I remember feeling frustrated as we discussed it as the plan was clear in my mind and I thought I could easily get on and code it alone. However, as we talked more it became apparent the existing tools we had were good enough for most of what we wanted. In the end we wrote a single macro and were done. From three libraries down to four lines of code is a pretty good reduction in effort.

Posted in Code | Comments Off on A Reflection on Working in a Team

18 Jul 2010

by Noel

Open source libraries now on GitHub

We are excited to announce that we’ve moved all of our open source code to Github!

If you want to use our libraries in your application, we recommend getting hold of them via PLaneT as usual. If you want to live on the edge or hack on things, however, you’ll find the libraries here and build instructions here.

We’re still working on the best way of organising and documenting everything. If you have any advice, please get in touch or leave us a note in the comments.

Posted in Code | Comments Off on Open source libraries now on GitHub

10 Jun 2010

by Noel

Selenium client for Racket

Acceptance testing is a must for any developer of complex web applications. Selenium is a suite of tools to help automate acceptance by recording user actions, turning them into code, and playing them back in a remote controlled web browser.

Now, thanks to a lazy Saturday afternoon and a rather nice bottle of ginger beer, the joys of Selenium are available to the Racketcommunity by way of our new Selenium PLT library. Check it out on our Github page and let us know how you get on!

Those of you familiar with our open source libraries may know aboutDelirium, our re-implementation of Selenium using the Racket HTTP Server. Delirium was a great project that made for an elegant demonstration of the power of continuations in web development.

You may fairly ask the question: why, if Delirium is so good, are we releasing bindings for Selenium? There are several answers to this. The main reason is time: maintaining compatibility across all the major browsers is a difficult process, and that’s time we could be devoting to our other pet projects. Second, we’re doing a lot of web development in other languages these days; for example, we’re currently working on a project using Scala and Lift. There are already bindings for Selenium in most of these languages, and it makes sense to use the same tools across the board.

Current Delirium users shouldn’t worry – we will continue to develop all of our libraries to maintain compatibility with new versions of Racket. Our plans internally are to switch to Selenium for new projects, and to keep using Delirium for our existing code. If the time does come to switch away from Delirium, you should find translation to our Selenium bindings to be quite straightforward.

Posted in Code, Racket, Web development | Comments Off on Selenium client for Racket

20 Apr 2010

by Noel

Formalising Bonds with the Informal

There is an interesting move underway by the US Securities and Exchange Commission (SEC) to more precisely define the meaning of certain asset backed securities (like the now infamous mortgage-backed securities that were central to the recent crash). The NY times has covered the story from a high level, but what of particular interest to me is the proposal to specify the meaning of the bonds in Python. This is a step is the right direction but Python is not the answer.

The core problem here is to give a clear and unambiguous meaning to a bond. This requires the language in which it is written is precisely defined. Python is not precisely defined. There is only a prose definition of the language, which is inadequate in the same way that the prose definitions of bonds are inadequate, and of course there are differences between various versions and implementation of Python. Since Python is not precisely defined the only meaning one can give to a program in Python is whatever the particular implementation one uses does with that program.

In contrast there are languages that are formally defined, suchStandard ML and Scheme. These would make a sound basis for the formal definition of bonds. In turns out that functional languages also make a good (meaning expressive and convenient) basis for the formal definition of bonds. There is a great paper on expressing contracts in Haskell and at least one company has implemented this idea in a commercial system (in O’Caml, I believe). So my advice to the SEC: use an appropriate subset of Scheme or Standard ML, or hire someone to create a formally defined DSL, but don’t use a language without a formal definition if precision is your goal.

Posted in Business, Code, General | Comments Off on Formalising Bonds with the Informal

5 Jun 2009

by Noel

Libraries update

We produce a lot of open source code at Untyped, all of which is available from our Subversion repository. You can check out a read-only copy of our repository at any time. If you want to work on a branch without getting commit access to our repository, you might find git-svn or Mercurial’s Subversion integration useful. If you’d like to collaborate on development (and we’re open to all kinds of collaboration, including student/academic work) it is probably simplest to drop us an email (noel or dave at untyped dot com) to arrange things. There is a lot of stuff in there, so here’s a quick summary of the most interesting things we are currently working on.

Snooze is our flagship database abstraction layer, comparable to Hibernate or ActiveRecord. Snooze 2, which is out on PLaneT now, contains a robust query language and support for whole-model validation. Development on the version 3 of the library is underway, with emphasis on caching and inter-struct relationships.

Mirrors is our library for programmatic generation of XHTML and Javascript. It allows you to build blocks of code using a syntax similar to Scheme’s quoted lists. Rendering is done at compile-time as far as possible, so you get the convenience and compositional properties of quoted lists with the speed of PHP-style mechanisms. We intend to add support for CSS in a future release.

Delirium is our web UI testing library, similar to Selenium but with the expressive power of SchemeUnit. Versions 2 and 3, both on PLaneT, have equivalent feature sets: version 2 supports PLT 4.1.3 and earlier, while version 3 supports PLT 4.1.3 and upwards.

Smoke is our UI creation library (the partner library to Mirrors, pun definitely intended). While we have deployed this in a number of production applications, the interface is subject to constant tweaking so we haven’t published it to PLaneT. You can still get hold of the code from SVN and play with it, though!

Dispatch, our controller-to-URL mapping library, has been partly subsumed as its core features have been rolled into PLT’s web-server/dispatch library. There is still room for both systems, though, as the PLT library is built with simplicity in mind, whereas Dispatch was built to simplify web development. We plan on “rebooting” the Dispatch franchise with a version that wraps web-server/dispatch with some new features.

SchemeUnit, Noel’s excellent unit testing package, is as strong as ever. A version recently got rolled into PLT core, but you can still get the PLaneT package for the latest updates from Noel. Note that SchemeUnit’s code is hosted at Schematics.

Other points of interest:

  • Unlib is still going strong, with new shorthand require/providesyntaxes that can fetch stuff directly from SVN, a more humane version of keyword-apply, and some utilities to support dotted identifiers in our other libraries.
  • Excel, as its name suggests, is a library for creating Excel files in functional drawing style. It supports all the basics: formulae, inter-cell references, number formats, fonts, borders and fills, conditional formatting, and cell validation.
  • Autoplanet is a tool for deploying applications without having to worry about changing dependencies. It creates an application-local PLaneT cache and can be configured to download and install packages from PLaneT, SVN, or the local filesystem.

Posted in Code | Comments Off on Libraries update

27 Jan 2009

by Noel

Untyped open source repository: open for business

It’s been a long time coming, but we are proud to announce the launch of the Untyped open source repository!

This public Subversion repository houses the source code for our open source projects, including popular PLT Scheme packages such as Dispatch, Snooze and Mirrors as well as a few things you won’t find on PLaneT.  It’s all free and open source, but please make sure you agree to our terms of use before you get stuck in.

You can check out any or all the projects using command line SVN. For example:

  svn co http://svn.untyped.com/mirrors/trunk mirrors-trunk
It’s no-frills at the moment – SVN and web browser support only – but we plan to prettify things in the future. See the README file for more information.

Posted in Code | Comments Off on Untyped open source repository: open for business

5 Nov 2008

by Noel

Tests as todos

Like most people I have a few projects on the go at once. To efficiently switch between them I must be able to quickly pick up where I left off. In my programming projects I’ve been using failing tests as reminders to myself. This fits in nicely with my programming workflow, and enables me to make progress before I’ve recalled all the details of the project I’m working on. Here’s how it works:

In my programming workflow I cycle between writing tests, writing code, and running tests (this is just test driven development). When I’m about to stop working on a project I write some failing tests, which act as a specification for what I should do next. At this point in time I’ve been working on the project for a while so I have recalled its structure and I’m in a good position to make this decision.

When I pick up a project after a break I enter straight into my normal workflow and run my tests. I inspect the failing tests and start implementing the functionality they specify. At this point in time I don’t even have to remember why I’m implementing this; the tests provide enough detail that I can just start coding. As I do so I invariably recall more details of the project. By the time I’ve finished the feature I’m ready to go at full speed.

This technique allows me to “hide” the time it takes to recall the project details; I still get useful work done in this period. It’s quite a simple idea and no doubt some of you are already using it, but if you haven’t tried it, give it a shot.

Posted in Code | Comments Off on Tests as todos

27 Sep 2008

by Noel

Commercial use articles in the Journal of Functional Programming

Further evidence of the increasing commercial relevance of functional programming (and just as important, the desire of the academic community for said relevance) is the announcement that the Journal of Functional Programming is now accepting Commercial Use articles.

The software engineering (SE) community and the programming language theory (PLT) community have traditionally been quite separate. This has led to much duplication of work. For example, it is well known that patterns are language features, indicating that the two communities have essentially been solving the same problems from different angles. The SE field assumes the language is fixed, and so develops abstractions (i.e. patterns) that exist above the level of code. The PLT researcher changes the language to directly express the abstraction. The SE approach is pragmatic but inelegant. The promise of the PLT approach has been retaining elegance (with associated reduction in development and maintenance cost), but the cost of adopting a new language is often seen as too high. (Incidentally, solving this problem is why you want a language with macros. I.e. Scheme.)

That the JFP is looking to publish essentially SE articles is definitely a good thing. It will, however, be interesting to see how the community adapts to members with quite different aims and values to the typical PLT researcher. For example, the PLT researcher is very focused on formality, and in particular giving precise semantics to language constructs. This leads to a style that deals with concepts in a very abstract mathematical manner. While powerful this is certainly not the best presentation for a working programmer to learn from. Consider, for example, Comprehending monads, which aims to be an easy introduction to monads. Given that this paper was published in 1992, we can see its ineffectiveness as a tutorial by considering the zillions of monad tutorials that have been written since. The problem is the abstract style of presentation. While presenting in an abstract style allows you to generalise to many different situations, people work best going from the specific to the general. Better tutorials start with a concrete example, and abstract from there. The question for the PLT community is how they will accommodate the desire for less formality and abstraction from the working programmer when one of the key differentiating factors between SE and PLT is the use of formalisms. If the right balance can be found this will be a great thing for both researchers and practitioners.

Posted in Code | Comments Off on Commercial use articles in the Journal of Functional Programming