A little while ago we<a
href=”http://www.untyped.com/untyping/archives/2007/06/selenium_code_r.html”>released
an interface to Selenium, a web testing framework.
Since then we’ve learned that Selenium is simply too slow to
use in our work-flow. Hence we started on a faster
reimplementation of the Selenium Remote Control.
A key part of this system is a proxy server, which is
necessary to get around the same origin security restriction in Javascript. I’ve just finished a
large refactoring of the proxy code, and I think the
experience is interesting enough to warrant a blog post.
While there is a large literature on refactoring object-oriented programs, there is rather
less on refactoring functional programs, and what there is tends to concentrate on program
transformation tools (long a FP strength) at the expense
of collecting useful FP refactorings. This post is a small
contribution to redressing the balance.
The code I spent most time on was the HTTP parser. It is
structured as a state machine, so the initial version used
the classic FP pattern ofmutually tail recursive functions. The code for parsing an HTTP request looked something
like this:
(define (parse-request)
(define request-line #f)
(define headers #f)
(define (parse-request-line)
(set! request-line (read-request-line))
(parse-headers))
(define (parse-headers)
(let ([line (read-line)])
(if (end-of-input line)
(begin (set! headers (reverse headers))
(do-something))
(begin (set! headers (cons line headers))
(parse-headers)))))
(parse-request-line))
The code for parsing an HTTP response was very similar:
(define (parse-response)
(define response-line #f)
(define headers #f)
(define (parse-response-line)
(set! response-line (read-response-line))
(parse-headers))
(define (parse-headers)
(let ([line (read-line)])
(if (end-of-input line)
(begin (set! headers (reverse headers))
(do-something-different))
(begin (set! headers (cons line headers))
(parse-headers)))))
(parse-response-line))
The real code was several screens long. I wanted to make
it
simpler by changing to a functional style, and reusing
common code between the request and response parsing
functions. Converting to functional style is simple:
(define (parse-request)
(define (parse-request-line)
(define request-line (read-request-line))
(parse-headers request-line))
(define (parse-headers request-line)
(define headers
(let loop ([line (read-line)])
(if (end-of-input line)
null
(cons line (loop (read-line))))))
(do-something request-line headers))
(parse-request-line))
Reusing common code is not simple. The finite state
machine
pattern doesn’t abstract the next state. For example
parse-headers
in parse-request
always calls do-something
whereas the
otherwise
identical version in parse-response
calls
do-something-different
.
I solved this by refactoring the code into<a
href=”http://library.readscheme.org/page6.html”>continuation-passing
style, leading to code that looks like the following:
;; shared between parse-request and parse-response
(define (parse-headers request-line k)
(define headers
(let loop ([line (read-line)])
(if (end-of-input line)
null
(cons line (loop (read-line))))))
(k request-line headers))
(define (parse-request)
(define (parse-request-line k)
(define request-line (read-request-line))
(k request-line))
(parse-request-line
(cut parse-headers <> do-something)))
Note that I’ve used <a
href=”http://srfi.schemers.org/srfi-26/srfi-26.html”>cut
as a short-cut for lambda
.
I’ve got code reuse but the code itself isn’t nice. The
arguments lists were quite a bit longer in the real code
and
most of the time arguments are just passed from function
to
function without being used (I’ve seen these sort of
arguments called “tramp data”). I also find
that CPSed code can be quite difficult to read — you
have to construct the control flow graph in your head and
then look at the application site to fill in all the
continuations. Ugh.
One way to get rid of tramp data is to use parameters
, and this something we talk about in our experience report. However that solution isn’t appropriate here. It
forces me to stick with CPS so I can set the parameters
in the dynamic extent of the succeeding code, and it
extends the lifetime of the values beyond what is
strictly necessary. This could be an issue if storing,
say, a large request body in a parameter.
Notice that the different versions of
parse-request
only differed in which
function
they called with the value they computed. If I separate out
the computation of that value, and the decision of which
function to call I can get code reuse without CPS, and I
don’t have long argument lists! This is what my final solution
looks like:
;; shared between parse-request and parse-response
(define (parse-headers request-line)
(let loop ([line (read-line)])
(if (end-of-input line)
null
(cons line (loop (read-line))))))
(define (parse-request)
(define (parse-request-line)
(read-request-line))
(do-something (parse-request-line)
(parse-headers)))
(define (parse-response)
(define (parse-response-line)
(read-response-line))
(do-something-different (parse-response-line)
(parse-headers)))
It’s short and sweet, and easy to understand.
So let’s recap what I did:
-
I started with the mutually tail-recursive FSM pattern
(that’s a mouthful!)
- I refactored into continuation-passing style.
-
I separated computation and control, and refactored
back to direct style.
So three refactoring (direct style to CPS, separating
computation and control, and CPS to direct style), two of
which are particular to functional languages, and one
pattern. I could do with a better name than
“separating computation and control”. If you’re
aware of some prior work or can think of a better name do
let me know.
Although they use different terminology, the programming
language theory and software engineering communities have
explored a lot of the same ground from different
perspectives. Program transformations are pretty much the
same thing as refactorings, though the former are often
presented in the context of compiler optimisations.<a
href=”http://www.haskell.org/haskellwiki/Research_papers/Functional_pearls”>Functional
Pearls are very similar to design patterns.
If you’re a student of software engineering in
functional
languages it is necessary to familiarise yourself with this
literature. This can be difficult. There are no books
summarising this literature, as you’ll find for OO
languages, and the papers are often terse and are not
focused on software engineering. This means they can be
difficult to read if you don’t have a background in
programming languages, and you have to read between the
lines a bit.