A/B testing is all the rage in
certain web development circles. Naturally, when something
becomes popular the criticism starts. I’ve read some
unconvincing attacks on A/B testing recently, as well as
some good ones, so I want to lay down my thoughts on what
A/B testing is and what it isn’t.
The general method of A/B testing on the web is as
follows:
- Decide on a change to make to the site. This could be as small as the wording of a title or as large as the entire navigational structure of the site.
- Decide what outcome you want to measure. Typical examples are purchases, time spent on the site, and number of repeat visits.
- Randomly assign each visitor one of the two (or more) versions of the site.
- Measure how the different versions stack up against the outcome of interest (a sketch of this workflow in code follows the list).
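To make the method concrete, here is a minimal sketch of that workflow in Python. The variant names, visitor counts, and conversion rates are invented for illustration; the post doesn't prescribe any particular tooling.

```python
import hashlib
import random
from collections import defaultdict

VARIANTS = ["A", "B"]

def assign_variant(visitor_id: str) -> str:
    """Deterministically hash the visitor id to a variant, so the same
    visitor always sees the same version of the site."""
    digest = hashlib.md5(visitor_id.encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

# Simulated traffic: hypothetical underlying conversion rates per variant.
true_rates = {"A": 0.10, "B": 0.12}
visitors = defaultdict(int)
conversions = defaultdict(int)

for i in range(10_000):
    variant = assign_variant(f"visitor-{i}")
    visitors[variant] += 1
    if random.random() < true_rates[variant]:  # did this visit convert?
        conversions[variant] += 1

for v in VARIANTS:
    rate = conversions[v] / visitors[v]
    print(f"{v}: {conversions[v]}/{visitors[v]} conversions ({rate:.1%})")
```

Hashing the visitor id, rather than flipping a coin on every request, keeps each visitor in the same bucket for the life of the experiment.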
This is a fairly simple thing. Critics of A/B testing
usually claim that it is only good for small changes. It
cannot, they claim, be used for business-changing disruptive innovation. The critics are wrong. They are confusing the
principles underlying A/B testing with the common
implementations of the idea.
How We Acquire Knowledge
There are basically three means by which we come to
acquire knowledge:
- By appealing to authority.
- By constructing statements consistent with assumed first principles.
- By making observations on the effects of actions.
The third method has proven to be vastly superior when
studying the natural world, and is the basis of the method
known as science. If you are reading this then you are
validating the efficacy of this method, as the computer
you are using is the result of a few hundred years of
scientific developments.
The primary mechanism of science is the experiment. An
experiment involves performing some action in the world
and measuring its effect. If different actions lead to
different outcomes, one typically does some statistical
analysis on the results to determine whether one is
justified in believing the differences are real or just
the result of chance.
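As an illustration of that last step, here is one common such analysis: a two-sided two-proportion z-test on conversion counts. The counts are made up, and this is only one of many tests one might use.

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns the z statistic and the p-value: the probability of seeing a
    gap at least this large if both variants really convert at the same
    underlying rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))           # two-sided normal tail area
    return z, p_value

# Hypothetical results: 480/5000 conversions on A, 535/5000 on B.
z, p = two_proportion_z_test(480, 5000, 535, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")
# A small p-value (conventionally below 0.05) is taken as justification
# for believing the difference is real rather than chance.
```

Note that this relies on the normal approximation, which is only reasonable with fairly large samples.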
A/B Is Science
A/B testing is science. A/B testing is about taking an
action and measuring its effects. That is, doing an
experiment. One can experiment with small things, like the
colour of a button on a web site. One can also experiment
with large things, like business models, new technology,
and other disruptive changes.
The critics see the small experiments used to market A/B
testing to internet businesses and think it is the
totality of the method. They are right that companies
usually don’t A/B test large changes. It is unusual to run
two or more different business models, for example. That
doesn’t mean these experiments aren’t done, but they are
typically done at the level of the market rather than the
individual company. Different companies, called
competitors, experiment with a particular combination of
strategy, model, and implementation, and the market
measures their effect. Sometimes big companies will run
these experiments internally. Google, for example, is
currently experimenting with both Android and Chrome OS in
more or less the same space. Complex experiments like this
are neither controllable nor repeatable, so the
methods of social science are preferred over those of the
hard sciences, but they still fall within the scientific
paradigm.
A/B Testing Isn’t All That
I’ve said A/B testing is science, and science is great.
However, I do think the current implementation of A/B
testing, as used by web companies, is flawed. The reason
is that we’re usually interested in decision making, not
hypothesis testing, and decision making calls for a
different setup than is currently used. Exploring this is
a topic for another post.