This is the blog post that led to Myna. Sign up now and help us beta test the world’s fastest A/B testing product!
 
Were I a betting man, I would wager this: the supermarket nearest to you is laid out with fresh fruit and vegetables near the entrance, and dairy and bread towards the back of the shop. I’m quite certain I’d win this bet enough times to make it worthwhile. This layout is, of course, no accident. By placing essentials in the corners, the store forces shoppers to traverse the entire floor to get their weekly shop. This increases the chance of an impulse purchase and hence the store’s revenue.
 
I don’t know who developed this layout, but at some point someone must have tested it and it obviously worked. The same idea applies online, where it is incredibly easy to change the “layout” of a store. Where the supermarket might shuffle around displays or change the lighting, the online retailer might change the navigational structure or wording of their landing page. I call this process content optimisation.
 
Any prospective change should be tested to ensure it has a positive effect on revenue (or some other measure, such as clickthroughs). The industry standard method for doing this is A/B testing. However, it is well known in the academic community that A/B testing is significantly suboptimal. In this post I’m going to explain why, and how you can do better.
There are several problems with A/B testing:

- A/B testing is suboptimal. It simply doesn’t increase revenue as much as better methods.
- A/B testing is inflexible. You can’t, for example, add a new choice to an already running test.
- A/B testing has a tedious workflow. To do it correctly, you have to make lots of seemingly arbitrary choices (p-value, experiment size) to run an experiment.
 
The methods I’m going to describe, which are known as bandit algorithms, solve all these problems. But first, let’s look at the problems of A/B testing in more detail.
 Suboptimal Performance
 
Explaining the suboptimal performance of A/B testing is tricky without getting into a bit of statistics. Instead of doing that, I’m going to describe the essence of the problem in a (hopefully) intuitive way. Let’s start by outlining the basic A/B testing scenario, so there is no confusion. In the simplest situation there are two choices, A and B, under test. Normally one of them is already running on the site (let’s call that one A), and the other (B) is what we’re considering replacing A with. We run an experiment and then look for a significant difference, where I mean significance in the statistical sense. If B is significantly better we replace A with B; otherwise we keep A on the site.
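To make that concrete, here is a minimal sketch of the significance test this usually boils down to: a two-proportion z-test on conversion counts. The visitor and conversion numbers are invented purely for illustration.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical results: conversions out of visitors for each choice.
conv_a, n_a = 120, 4000   # choice A (currently on the site)
conv_b, n_b = 145, 4000   # choice B (the proposed change)

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se                                # test statistic
p_value = 2 * (1 - norm.cdf(abs(z)))                # two-sided p-value

if p_value < 0.05:
    print(f"Significant (p = {p_value:.3f}): replace A with B")
else:
    print(f"Not significant (p = {p_value:.3f}): keep A")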
 
The key problem with A/B testing is that it doesn’t respect what the significance test is actually saying. When a test shows B is significantly better than A, it is right to throw out A. However, when there is no significant difference the test is not saying that B is no better than A, but rather that the data does not support any conclusion. A might be better than B, B might be better than A, or they might be the same. We just can’t tell with the data that is available*. It might seem we could just run the test until a significant result appears, but that runs into the problem of repeated significance testing errors. Oh dear! Whatever we do, if we stick exclusively with A/B testing we’re going to make mistakes, and probably more than we realise.
 
A/B testing is also suboptimal in another way: it doesn’t take advantage of information gained during the trial. Every time you display a choice you get information, such as a click, a purchase, or an indifferent user leaving your site. This information is really valuable, and you could make use of it in your test, but A/B testing simply discards it. There are good statistical reasons not to use information gained during a trial within the A/B testing framework, but if we step outside that framework we can.
 
* Technically, the reason for this is that the probability of a type II error increases as the probability of a type I error decreases. We control the probability of a type I error with the p-value, and this is typically set low. So if we drop option B when the test is not significant we have a high probability of making a type II error.
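Here is a rough sketch of that trade-off. The experiment size, the true conversion rates (which the experimenter never knows), and the significance levels are all made-up assumptions for illustration: for a fixed experiment, tightening the type I error rate (smaller alpha) pushes up the type II error rate (beta).

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical setup: A truly converts at 3.0%, B at 3.5%,
# and we run 4000 visitors per arm.
p_a, p_b, n = 0.030, 0.035, 4000

se = sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)

for alpha in (0.10, 0.05, 0.01):
    z_crit = norm.ppf(1 - alpha / 2)                      # two-sided critical value
    power = 1 - norm.cdf(z_crit - (p_b - p_a) / se)       # approximate power
    beta = 1 - power                                      # P(type II error)
    print(f"alpha = {alpha:.2f}  ->  beta ~ {beta:.2f}")
```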
 Inflexible
 
The A/B testing setup is very rigid. You can’t add new choices to the test, so you can’t, say, test the best news item to display on the front page of a site. You can’t dynamically adjust what you display based on information you have about the user — say, what they purchased last time they visited. You also can’t easily test more than two choices.
 Workflow
 
To set up an A/B experiment you need to choose the significance level and the number of trials. These choices are often arbitrary, but they can have a major impact on results. You then need to monitor the experiment and, when it concludes, implement the results. There are a lot of manual steps in this workflow.
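To give a feel for how those up-front choices drive the experiment size, here is a sketch of a standard sample-size calculation for comparing two conversion rates. The baseline rate, minimum uplift worth detecting, significance level, and power below are assumptions picked for the example, not recommendations.

```python
from math import ceil, sqrt
from scipy.stats import norm

# Made-up inputs: all of these must be chosen before the experiment starts.
p_a = 0.030       # baseline conversion rate of choice A
p_b = 0.035       # smallest uplift for B we care about detecting
alpha = 0.05      # significance level (type I error rate)
power = 0.80      # desired power (1 - type II error rate)

z_alpha = norm.ppf(1 - alpha / 2)   # two-sided critical value
z_beta = norm.ppf(power)

# Standard approximation for the visitors needed *per arm*.
n = ((z_alpha + z_beta) ** 2
     * (p_a * (1 - p_a) + p_b * (1 - p_b))
     / (p_b - p_a) ** 2)

print(f"Roughly {ceil(n)} visitors per arm")   # ~20,000 with these numbers
```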
 Make out like a Bandit
 
Algorithms for solving the so-called bandit problem address all the problems with A/B testing. To summarise, they give optimal results (to within constant factors), they are very flexible, and they have a fire-and-forget workflow.
 
So, what is the bandit problem? You have a set of choices you can make. On the web these could be different images to display, or different wordings for a button, and so on. Each time you make a choice you get a reward. For example, you might get a reward of 1 if a button is clicked, and a reward of 0 otherwise. Your goal is to maximise your total reward over time. This clearly fits the content optimisation problem.
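As a concrete framing, here is a minimal sketch of that setup in code. The button wordings and click-through rates are invented for the example; the point is just the shape of the problem: repeatedly pick a choice, observe a 0/1 reward, and try to make the running total as large as possible.

```python
import random

# Hypothetical choices and their click-through rates (unknown to the algorithm).
CLICK_RATES = {"Buy now": 0.04, "Add to basket": 0.05, "Get started": 0.06}

def show(choice: str) -> int:
    """Display a choice to a visitor and return a reward: 1 = click, 0 = no click."""
    return 1 if random.random() < CLICK_RATES[choice] else 0

# A bandit algorithm's job is to decide which choice to show each visitor
# so that this total ends up as large as possible.
total_reward = 0
for _ in range(10_000):
    choice = random.choice(list(CLICK_RATES))   # placeholder policy: pick at random
    total_reward += show(choice)

print("Total clicks:", total_reward)
```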
 
The bandit problem has been studied for over 50 years, but only in the last ten years have practical algorithms been developed. We studied one such paper in UU. The particular details of the algorithm we studied are not important (if you are interested, read the paper – it’s very simple); here I want to focus on the general principles of bandit algorithms.
 
The first point is that the bandit problem explicitly includes the idea that we make use of information as it arrives. This leads to what is called the exploration-exploitation dilemma: do we try many different choices to gain a better estimate of their reward (exploration) or try the choices that have worked well in the past (exploitation)?
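One of the simplest ways to see the dilemma in code is the epsilon-greedy strategy (a common textbook illustration, not necessarily the algorithm from the paper mentioned above): with a small probability we explore a random choice, otherwise we exploit the choice with the best observed click rate so far. The choices and click rates are again made up.

```python
import random

CLICK_RATES = {"Buy now": 0.04, "Add to basket": 0.05, "Get started": 0.06}
EPSILON = 0.1   # fraction of visitors used for exploration

shows = {c: 0 for c in CLICK_RATES}    # times each choice was displayed
clicks = {c: 0 for c in CLICK_RATES}   # clicks each choice received

def estimated_rate(c: str) -> float:
    return clicks[c] / shows[c] if shows[c] else 0.0

total_reward = 0
for _ in range(10_000):
    if random.random() < EPSILON:
        choice = random.choice(list(CLICK_RATES))       # explore
    else:
        choice = max(CLICK_RATES, key=estimated_rate)   # exploit
    reward = 1 if random.random() < CLICK_RATES[choice] else 0
    shows[choice] += 1
    clicks[choice] += reward
    total_reward += reward

print("Total clicks:", total_reward)
print({c: round(estimated_rate(c), 3) for c in CLICK_RATES})
```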
 
The performance of an algorithm is typically measured by its regret, which is the average difference between its actual performance and the best possible performance. It has been shown that the best possible regret increases logarithmically with the number of choices made, and modern bandit algorithms are optimal (see the UU paper, for instance).
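To make regret concrete, here is a sketch that estimates it for UCB1, one well-known bandit algorithm with a logarithmic regret guarantee (again, not necessarily the algorithm from the paper above). Regret is estimated as the expected clicks the best choice would have earned minus the clicks the algorithm actually earned, using the same invented click rates as before.

```python
import math
import random

CLICK_RATES = {"Buy now": 0.04, "Add to basket": 0.05, "Get started": 0.06}
choices = list(CLICK_RATES)

shows = {c: 0 for c in choices}
clicks = {c: 0 for c in choices}

def ucb1_score(total_shows: int, c: str) -> float:
    # Mean reward plus an exploration bonus that shrinks as a choice is shown more.
    if shows[c] == 0:
        return float("inf")                    # show every choice at least once
    mean = clicks[c] / shows[c]
    return mean + math.sqrt(2 * math.log(total_shows) / shows[c])

total_reward = 0
n_visitors = 10_000
for t in range(1, n_visitors + 1):
    choice = max(choices, key=lambda c: ucb1_score(t, c))
    reward = 1 if random.random() < CLICK_RATES[choice] else 0
    shows[choice] += 1
    clicks[choice] += reward
    total_reward += reward

# Estimated regret: expected clicks of the best choice minus clicks actually earned.
best_possible = n_visitors * max(CLICK_RATES.values())
print("Estimated regret:", best_possible - total_reward)
```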
 
Bandit algorithms are very flexible. They can deal with as many choices as necessary. Variants of the basic algorithms can handle addition and removal of choices, selection of the best k choices, and exploitation of information known about the visitor.
 
Bandits are also simple to use. Many of the algorithms have no parameters to set, and unlike A/B testing there is no need to monitor them; they will continue working indefinitely.
 
Finally, we know bandits work on the web, as much of the current research on them is coming out of Google, Microsoft, Yahoo!, and other big Internet companies.
 
So there you have it. Stop wasting time on A/B testing and make out like a bandit!
 Join Our Merry Band
 
Finally, you probably won’t be surprised to hear we are developing a content optimisation system based on bandit algorithms. I am giving a talk on this at the Multipack Show and Tell in Birmingham this Saturday.
 
We are currently building a prototype, and are looking for people to help us evaluate it. If you want more information, or would like to get involved, get in touch and we’ll let you know when we’re ready to go.
Update: In case you missed it at the top, Myna is our content optimisation system based on bandit algorithms and we’re accepting beta users right now!