A/B testing is all the rage in certain web development circles. Naturally, when something becomes popular, the criticism starts. I’ve read some unconvincing attacks on A/B testing recently, as well as some good ones, so I want to lay down my thoughts on what A/B testing is and what it isn’t.
The general method of A/B testing on the web is as follows:
- Decide on a change to make to the site. This could be as small as the wording of a title or as large as the entire navigational structure of the site.
- Decide what outcome you want to measure. Typical examples are purchases, time spent on the site, and number of repeat visits.
- Randomly assign each visitor one of the two (or more) versions of the site.
- Measure how the different versions stack up against the outcome of interest.
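To make the mechanics concrete, here is a minimal sketch in Python of the assignment and measurement steps. The variant names, the purchase outcome, and the `assign_variant` and `record_visit` helpers are illustrative assumptions rather than a prescribed implementation; in practice this usually lives in the web framework or a dedicated testing tool.

```python
import hashlib

VARIANTS = ["A", "B"]  # the versions of the site under test (hypothetical)

def assign_variant(visitor_id: str) -> str:
    """Deterministically map a visitor to a variant.

    Hashing the visitor ID keeps the assignment stable across visits,
    which is one common way to realise the random split.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

# Tallies for the outcome of interest -- here, purchases per variant.
outcomes = {v: {"visitors": 0, "purchases": 0} for v in VARIANTS}

def record_visit(visitor_id: str, purchased: bool) -> None:
    """Attribute a visit (and any purchase) to the visitor's variant."""
    variant = assign_variant(visitor_id)
    outcomes[variant]["visitors"] += 1
    if purchased:
        outcomes[variant]["purchases"] += 1
```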
This is a fairly simple procedure. Critics of A/B testing usually claim that it is only good for small changes. It cannot, they claim, be used for business-changing disruptive innovation. The critics are wrong. They are confusing the principles underlying A/B testing with the common implementations of the idea.
How We Acquire Knowledge
There are basically three means by which we come to acquire knowledge:
- By appealing to authority.
- By constructing statements consistent with assumed first principles.
- By making observations on the effects of actions.
The third method has proven to be vastly superior when studying the natural world, and is the basis of the method known as science. If you are reading this, you are validating the efficacy of this method: the computer you are using is the result of a few hundred years of scientific development.
The primary mechanism of science is the experiment. An experiment involves performing some action in the world and measuring its effect. If different actions lead to different outcomes, one typically performs some statistical analysis on the results to determine whether one is justified in believing the differences are real or merely the result of chance.
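As an illustration of that statistical step, here is a small sketch in Python using a chi-squared test of independence on made-up purchase counts. The figures are purely hypothetical, and other tests (a two-proportion z-test, say) would do the same job.

```python
from scipy.stats import chi2_contingency

# Hypothetical results: purchases vs. non-purchases for each version.
observed = [
    [120, 1880],  # version A: 120 purchases out of 2000 visitors
    [150, 1850],  # version B: 150 purchases out of 2000 visitors
]

chi2, p_value, dof, expected = chi2_contingency(observed)

# A small p-value suggests the difference in purchase rates is unlikely
# to be chance alone; a large one means chance cannot be ruled out.
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.3f}")
```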
A/B Is Science
A/B testing is science. A/B testing is about taking an action and measuring its effects. That is, doing an experiment. One can experiment with small things, like the colour of a button on a web site. One can also experiment with large things, like business models, new technology, and other disruptive changes.
The critics see the small experiments used to market A/B testing to internet businesses and mistake them for the totality of the method. They are right that companies usually don’t A/B test large changes. It is unusual to run two or more different business models, for example. That doesn’t mean these experiments aren’t done, but they are typically done at the level of the market rather than the individual company. Different companies, called competitors, each experiment with a particular combination of strategy, model, and implementation, and the market measures their effect. Sometimes big companies will run these experiments internally. Google, for example, is currently experimenting with both Android and Chrome OS in more or less the same space. Complex experiments like this are neither controllable nor repeatable, so the methods of social science are preferred over those of the hard sciences, but they still fall within the scientific paradigm.
A/B Testing Isn’t All That
I’ve said A/B testing is science, and science is great. However, I do think the current implementation of A/B testing, as used by web companies, is flawed. The reason is that we’re usually interested in decision making, not hypothesis testing, and decision making calls for a different setup than the one currently used. Exploring this is for another post.