Using Scientific Method to Craft Online Tests

February 26, 2014

Online testing is all the rage these days. Everywhere you look in the tech space there are articles and blog posts written about the need to test our online work. The value of testing is unquestionable, but with all this hype, it could be easy to think that you should test everything.

To understand what to test, you first need to know how to test. Scientific testing (think back to high school science) follows a specific method.

Every test you run needs four basic elements: observations, hypotheses, predictions and experiments. Not only are these elements important, but the order in which they occur is critical.

Make Observations

Observation of existing phenomena is the starting point of any test. You might compare similar sites to see how their design elements are alike or different, review analytics to see how users behave in an existing environment, or look for published research that points you in the right direction.

If you don't have a sense of what is already out there, how can you possibly predict what to change to make it better? Observation gives us the basis to begin generating ideas on why a phenomenon exists. You can start to think about what might be an improvement over what you've seen, or you could decide that the way things are seems logical.

Here's a fictional web observation. You observe that call-to-action (CTA) buttons often use a bright colour. Specifically, you have reviewed 100 relevant sites and found that 70% of them used a red or orange CTA button.

Write Your Hypothesis

After you've observed a phenomenon, you can take your ideas on the subject and craft your hypothesis. This is a statement that presents your explanation for the phenomenon you've observed. Keep your hypothesis concise and clear, because the next steps of the process depend on it.

You've observed the use of red and orange in a significant number of CTA buttons. Does this mean that a red or orange button is better for conversion?

Without more data, the information you have is not enough to be sure. You'll need to develop a hypothesis that you can test against. Here is a possible hypothesis you may develop based on your observation.

"If you use red for a CTA button, you will achieve a higher number of clicks on that button than if using green."

Note that you want to make your hypothesis as specific as possible. When you get to designing the test or experiment, you need to know exactly how to prove or disprove your hypothesis.

Make a Prediction

Having your hypothesis is not enough to jump straight into testing. You want to be sure that the results of your test clearly show whether you were correct or not. Your hypothesis statement should have enough detail to allow you to make a specific prediction.

For your CTA hypothesis, you predict that the red button will receive 50% more clicks than a green button.

You should also decide what you are going to do with your results before you start your experiment. If you are right, and the results for red are 75% higher than for green, the decision is easy — use red. What will you do if red only performs 25% better or 10% better? Will you refine your hypothesis or predictions and run further tests? Is there a minimum threshold of performance that you'll accept as enough?
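Writing the decision rule down before the test starts makes it harder to rationalise the result afterwards. Here is a minimal sketch in Python, assuming "lift" means red's click rate divided by green's, minus one. The 50% figure comes from the prediction above; the 10% minimum threshold and the wording of each action are placeholders you would set yourself.

```python
def decide(lift, predicted=0.50, minimum=0.10):
    """Map an observed relative lift to an action agreed on in advance.

    lift      -- red's click rate relative to green's, e.g. 0.25 for 25% higher
    predicted -- the lift named in the prediction (50% in this example)
    minimum   -- hypothetical smallest lift still worth following up on
    """
    if lift >= predicted:
        return "use red"
    if lift >= minimum:
        return "refine the prediction and run another test"
    return "do not switch on this evidence"


# The three scenarios discussed above, plus a result below the minimum.
for observed in (0.75, 0.25, 0.10, 0.05):
    print(f"{observed:.0%} lift: {decide(observed)}")
```

The function is deliberately trivial. The value is in agreeing on the two thresholds before any data comes in.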

The Experiment

When it comes to testing your hypothesis, there are likely to be several options. Because you have made specific predictions, the elements of your test should be well defined.

You need a green button, a red button and a means to measure how many times each button is clicked. Okay, it's a little more complicated than that, but not much more.

In this particular case, your best testing option is likely an A/B test with a large enough sample size or traffic set. Let's lay out your experiment.
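How large is "large enough"? One rough way to estimate it is the standard normal-approximation formula for comparing two proportions. The sketch below is not part of the original plan: the 2% baseline click rate for the green button is an invented figure, and the 5% significance level and 80% power are conventional defaults rather than requirements.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Estimate visitors needed on each page to detect the predicted lift.

    baseline_rate -- assumed click rate of the green button (hypothetical)
    relative_lift -- predicted improvement, e.g. 0.50 for 50% more clicks
    alpha, power  -- conventional significance level and statistical power
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Assumed 2% baseline, predicted 50% lift.
print(visitors_per_variant(0.02, 0.50))
```

Smaller predicted lifts need far more traffic to detect, which is one more reason to make the prediction specific before you start.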

Traffic to a specific URL will be directed to one of two page options. All other elements on the page being equal, page A will use a red button for the primary CTA, while page B will use a green button. Over a period of 10 days, the number of clicks on the CTA on each of these pages will be counted. At the end of 10 days, you compare the clicks on each CTA as a percentage of traffic to its page. That comparison will show whether the red button's click rate was at least 50% higher than the green button's.
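Once the 10 days are up, the comparison might look something like the sketch below. It computes each page's click rate, the relative lift of red over green, and a two-proportion z-test to check that the gap is unlikely to be chance. The z-test is one common choice, not something this experiment prescribes, and the visit and click counts are made up purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def compare_variants(clicks_red, visits_red, clicks_green, visits_green):
    """Compare click rates for page A (red) and page B (green).

    Assumes both pages received visits and at least one click overall.
    """
    rate_red = clicks_red / visits_red
    rate_green = clicks_green / visits_green

    # Pooled rate under the assumption that colour makes no difference.
    pooled = (clicks_red + clicks_green) / (visits_red + visits_green)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visits_red + 1 / visits_green))
    z = (rate_red - rate_green) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    return {
        "rate_red": rate_red,
        "rate_green": rate_green,
        "lift": rate_red / rate_green - 1,  # 0.5 means 50% more clicks
        "p_value": p_value,
    }


# Invented counts: 4,000 visits per page, 120 red clicks, 80 green clicks.
result = compare_variants(120, 4000, 80, 4000)
print(f"lift: {result['lift']:.0%}, p-value: {result['p_value']:.4f}")
```

A small p-value only says the two rates differ. Whether the prediction held is a separate question, answered by comparing the observed lift with the 50% you predicted.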

When designing your experiment, it is critical to ensure that you only change the elements that you are testing for. Don't introduce factors that will muddy the results. Keep it simple and your results will be much less open to interpretation.
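One factor that is easy to introduce by accident is inconsistent assignment: a returning visitor who sees red on Monday and green on Tuesday counts toward both pages. A possible way around this, sketched below, is to derive the page from a hash of a stable visitor identifier. The `visitor_id` value and the `cta-colour` experiment name are hypothetical, and how you obtain a stable identifier (a cookie, for instance) is up to your setup.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "cta-colour") -> str:
    """Return "A" (red button) or "B" (green button) for a visitor.

    The same visitor_id always maps to the same page, and the split
    across many visitors is close to 50/50.
    """
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return "A" if digest[0] % 2 == 0 else "B"


# Repeated visits from the same (hypothetical) visitor land on the same page.
print(assign_variant("visitor-123"), assign_variant("visitor-123"))
```

Including the experiment name in the hash means a later test, say red versus blue, reshuffles visitors instead of reusing the same two groups.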

Keep Testing

One test will not be enough. You might find that you were right and that red is better than green. Is red also better than blue? Is colour even the issue, or is it contrast? Each experiment should prove or disprove the hypothesis for which it was created, but it won't unlock the secrets of the universe. New questions will arise, you'll observe other related phenomena, and different perspectives will emerge. Stay curious and keep testing.

Want to keep reading? Check out these posts:

71 Things to A/B Test from Robin Johnson at

Why Testing Everything Doesn’t Work from Teresa Torres

If you’d like to get in touch, please email Thorren and let him know how we can help with your project or question.

Thorren Koopmans
Thorren has over a decade of experience managing web and marketing projects. His background in retail, finance and technology—and a long history with computers, web and social media—provide a well-rounded and innovative perspective on strategy development and implementation.