Weird Science

Simply put, we argue a lot. Live data tests are a great way to collect quantitative data about how users interact with a website or product. They help us move forward incrementally and objectively, so we can argue less.

Testing can be a great way to collect data and insight from real users. The process varies widely, but for the most part there are two categories: qualitative and quantitative. Qualitative testing usually involves discussing designs or prototypes with a small group of users in order to get a wide range of information about how the users understand the design and how it will likely be used. Quantitative, or live data, testing involves putting designs or prototypes in front of many users to collect data about how a sample large enough to be statistically meaningful actually interacts with the design. That's the subject of this article. There are many factors in creating designs and collecting data for live data tests. Here are five key steps.

1. Set an objective

First things first: establish a goal. This seems obvious but it's not. Most of the time we drive initiatives forward based on complex plans to achieve a variety of goals. Sometimes (eh, maybe a lot of the time), those goals might even compete with each other. So when you're trying to drive change on a particular initiative, it's important to be super clear about what it is you're trying to achieve.

To establish an objective, you first need to understand how progress is tracked. Each initiative starts with a high-level goal, which trickles down to specific event metrics that are collected and tracked in the wild. While a goal is overarching, an objective is specific. For example, if the goal is to get more customers, the objective might be to increase daily users of the store site by 8% by Q2. To measure store users, a key performance indicator (or KPI) is established: in this case, daily shopping cart entries. That KPI is measured by a metric, the number of users who successfully add an item to the cart, which is in turn counted from a set of events such as clicks on the "add to cart" button or visits to a "step 2" checkout page.
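To make that chain concrete, here is a minimal Python sketch of how a goal might break down into an objective, a KPI, a metric, and events. The class, the event names, and the baseline number are all made up for illustration; they aren't part of any particular analytics tool.

```python
from dataclasses import dataclass, field


@dataclass
class Objective:
    """One measurable objective hanging off a broader goal."""
    goal: str        # overarching, e.g. "get more customers"
    objective: str   # specific and time-boxed
    kpi: str         # the indicator you watch
    metric: str      # how the KPI is counted
    events: list = field(default_factory=list)  # raw tracked events

    def target(self, baseline: float, lift: float) -> float:
        """Return the value the metric must reach for a relative lift."""
        return baseline * (1 + lift)


store = Objective(
    goal="Get more customers",
    objective="Increase daily users of the store site by 8% by Q2",
    kpi="Daily shopping cart entries",
    metric="Users who successfully add an item to the cart",
    events=["click_add_to_cart", "view_checkout_step_2"],  # hypothetical names
)

# Hypothetical baseline of 2,500 daily cart entries; an 8% lift is about 2,700.
daily_target = store.target(baseline=2500, lift=0.08)
```

If you can't fill in every field for an initiative, that's usually the sign that the objective isn't specific enough yet.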

Though it may seem like a pain to think this through at the beginning of the design process, it's a good idea. If you can't break the initiative down into some kind of quantifiable goal at the start, you might want to question the initiative. It's surprising how often projects move ahead without one, so make sure to take the time to establish the metric that you will use to determine whether something is a success or failure.

2. Design a test

When dealing with live data tests, you'll want to develop multiple concurrent designs to test against one another, so you can track their performance and ultimately determine which designs yield the best results. You'll use web-based tools to route a small percentage of users into test buckets, each of which sees an alternate design iteration. The performance of each design iteration is tracked and compared, which determines winners and losers.
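The testing tools handle the routing for you, but it helps to picture what "routing a small percentage of users into buckets" means. Below is a rough Python sketch that assumes each user has a stable id and that you want a 90/5/5 split. The split, the bucket names, and the hashing approach are my own assumptions for illustration, not a description of how any specific tool works.

```python
import hashlib

# Hypothetical split: 90% stay on production, 5% go to each test design.
BUCKETS = [("production", 90), ("design_a", 5), ("design_b", 5)]


def assign_bucket(user_id: str, experiment: str, buckets=BUCKETS) -> str:
    """Deterministically map a user to a bucket by hashing their id."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest, 16) % 100  # stable number in 0..99 for this user
    upper = 0
    for name, share in buckets:
        upper += share
        if point < upper:
            return name
    raise ValueError("bucket shares must add up to 100")


bucket = assign_bucket("user-42", "exp07_signup")
```

Hashing the user id together with the experiment name means a returning user keeps seeing the same design, while a different experiment shuffles users independently.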

Test versions should be decently fleshed out, but keep in mind that they're still just prototypes that will only be exposed to a small group of users, so don't go overboard. The main purpose of the test is to try out different ideas that might drive conversions for the objectives you've established (ahem, see the previous step...). Once again, it's important to be super specific. What do you want the user to click? What does success look like? Is it an email? Is it a download? Is it a location you want the user to reach? How can you give the user cues to complete the target action? What different options can you run against each other to test a theory about how something might perform? Different hypotheses might require different types of tests, so keep that in mind when you're figuring out what you want to learn.

There are a few different test types to choose from when using the commonly available web-based testing tools. An A/B split test (usually just called an A/B test) is when you test one design against another, such as the same layout with a different button color. It's best to keep variations to a minimum with A/B splits in order to home in on what works and what doesn't. Another type of test is called a multivariate test. Here you test multiple variables, which the tool puts together automatically in different combinations to let you know which combination wins. That helps if you have several interchangeable variables that just need to be set up in the right combination. The final type is a multi-page test, which is when you test one set of pages against another. That means you can test funnel vs. funnel, each with its own unique flow, tracking common goals across multiple pages.
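One thing to watch with multivariate tests is how quickly the combinations pile up. Here is a small Python sketch that lists every combination for a made-up set of variables. The variable names and options are hypothetical, and the testing tool would normally generate these for you.

```python
from itertools import product

# Hypothetical variables for one page; each list holds the options to try.
variables = {
    "headline": ["Save time", "Save money"],
    "button_color": ["green", "orange"],
    "hero_image": ["photo", "illustration"],
}


def all_variations(variables: dict) -> list:
    """Return one dict per combination of the given options."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


variations = all_variations(variables)
# 2 x 2 x 2 options yield 8 variations, each needing its own share of traffic.
```

Every extra variable multiplies the number of buckets, so each one gets fewer users. That's worth weighing before you add "just one more" option.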

3. Organize your data

It's super important that before you jump in and build your prototypes, you've got a clear and streamlined event tracking plan. You'll probably want to use tools such as Google Analytics or Optimizely. Those tools are great, and do a fine job with event tracking, but don't underestimate the power of a simple, well-organized spreadsheet that can be used to map and track variations and the data that goes along with each one. It's very unglamorous, but a spreadsheet allows you to maintain a decent grip on the entire process, and comes in handy if you have rogue stats such as emails collected in your MailChimp account, or something like that. It also helps you share the information with other members of the team, especially the Data Science team if you have one.

Your prototype is basically going to be an event tracking machine. So make sure you have an established schema. Depending on which type of event tracking you are using, it usually involves a tag or string of text used to mark specific events. If you have clicks on a particular link or button that you want to track, you can establish a specific string of text that will be unique to that particular event. Every time that link or button is clicked, the analytics software will add it to a tally associated with that particular identification tag. Shorthand codes can be established for bits of information such as Page Name, Content Area, and Action Type, which can be strung together in each custom event tracking tag. That way, event information is condensed into a single tag, and other tag information such as the category and label can be used to help catalog and archive multiple experiments. Most event tracking tools, like Google Analytics, are designed for production environments, so you might have to use some of the features differently to track and catalog a series of tests. It's ok to get creative and figure out what works best for you.
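Here is one way a tagging schema like that could look, sketched in Python. The shorthand codes, the experiment name, and the choice to put the condensed tag in an "action" field are all assumptions on my part; adapt the fields to whatever your tracking tool actually accepts.

```python
# Hypothetical shorthand codes; pick whatever your team can read at a glance.
PAGE = {"Home": "hm", "Pricing": "pr", "Checkout Step 2": "c2"}
AREA = {"Header": "hd", "Hero": "hr", "Footer": "ft"}
ACTION = {"Click": "clk", "Submit": "sub", "View": "vw"}


def event_tag(experiment: str, variation: str,
              page: str, area: str, action: str) -> dict:
    """Build one tracking event from Page Name, Content Area, and Action Type."""
    condensed = f"{PAGE[page]}_{AREA[area]}_{ACTION[action]}"
    return {
        "category": experiment,  # groups events by experiment for archiving
        "action": condensed,     # the single condensed tag
        "label": variation,      # which design variation fired the event
    }


tag = event_tag("exp07_signup", "B", "Pricing", "Hero", "Click")
# tag["action"] is "pr_hr_clk"
```

Looking codes up in a fixed table means a typo in a page or area name raises a KeyError right away instead of quietly creating a new, unmatched tag in your reports.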

4. Build and run the test

When you've figured out what it is you want to test and how you plan to track the data, you'll need to decide what tool you will use to run the test. Tools such as Optimizely will actually facilitate your tests: they re-route traffic to the different design options and provide real-time statistics about the performance of each one. In some cases, you can use the tool's built-in WYSIWYG editor to generate your design variations. It's also possible to host your design variations on your own web servers and use the test tool simply to route the traffic.

My favorite combo is to host static design variations on a web server, use Optimizely to facilitate the test, and also collect a whole range of information using Google Analytics. I'll usually use Optimizely to route the traffic from production to the test buckets and to track all the primary conversion events, and at the same time, use Google Analytics to track and archive all clicks, visits, and conversions that occur. That way, I have one tool facilitating the test, and another tool for keeping a record of the test. Since the tools collect data in slightly different ways, it's handy to have multiple sources of information. They seldom add up exactly the same, so it's good to compare and factor in any overlap or discrepancies. Also, since the iterative testing process can be both complicated and fast-paced, a little redundancy can be a good thing in case you end up with a mistake, a glitch, or missing information. As I mentioned before, a well-organized spreadsheet can be invaluable when you're tracking multiple metrics with a variety of tools.
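Comparing the two sources doesn't have to be fancy. The Python sketch below assumes you've exported event counts from each tool into plain dictionaries and flags any event where the counts differ by more than 5%. The numbers, the event names, and the 5% tolerance are invented for illustration.

```python
def reconcile(counts_a: dict, counts_b: dict, tolerance: float = 0.05) -> list:
    """Return events whose counts differ by more than `tolerance` (relative)."""
    flagged = []
    for event in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(event, 0)  # an event missing from a tool counts as 0
        b = counts_b.get(event, 0)
        largest = max(a, b)
        gap = abs(a - b) / largest if largest else 0.0
        if gap > tolerance:
            flagged.append((event, a, b, round(gap, 3)))
    return flagged


# Made-up numbers for illustration only.
optimizely = {"pr_hr_clk": 412, "c2_ft_sub": 96}
analytics = {"pr_hr_clk": 405, "c2_ft_sub": 80, "hm_hd_clk": 1200}
issues = reconcile(optimizely, analytics)
```

In this example the small gap on "pr_hr_clk" passes, while "c2_ft_sub" and the event that only one tool recorded get flagged for a closer look.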

While your test prototype may look polished, its main purpose is not to be perfect, but to track events. Like I said, your prototype is an event tracking machine. That means it should be built in a way that makes it easy to spin off new versions and new variations. It should also be embedded with custom event tracking scripts that will provide specific data about each of the events. Modular grid systems and UI components are useful for designing prototypes for rapid iteration. Also, it might help to use custom variables in your event tracking scripts so that you can re-use code snippets and not have to update each one individually.

While the user might not notice it's a prototype, this is something that you should avoid releasing to 100% of production traffic. It should be something you use, temporarily, to collect data about what works and what doesn't. When it comes time to build the final production version, much more attention should be given to performance and production scalability.

5. Tell the truth

Once you've run your test and have data in hand, you'll want to review it carefully. If you were able to collect event information with more than one tool, you can compare the redundant information to make sure everything is consistent. Next, you'll want to make sure the data is organized coherently in a format that can be shared. Statistical analysis is great if it's an option for you or your team. Basically, testing is a risk management practice, so you want to make sure that any results you share have been vetted by appropriate individuals using appropriate methods. Statistical pitfalls and common mistakes include too many variations, timing and duration of tests, scaling to production, and test/production data mismatches. The list goes on. Just do your best and recognize that it's not an exact science.
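If you don't have a statistician handy, a basic check can still give you a feel for whether a difference is more than noise. The Python sketch below runs a two-proportion z-test on conversion counts from two buckets. It's one common method among several, it's my choice rather than something the tools above prescribe, and the counts are made up.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two_sided_p) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # nobody (or everybody) converted, so there is nothing to compare
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function, then a two-sided p-value.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Made-up counts: 120 of 2,400 users converted on A, 156 of 2,400 on B.
z, p = two_proportion_z_test(120, 2400, 156, 2400)
```

With these invented numbers the p-value lands under the conventional 0.05 cutoff, but treat a result like that as directional. Peeking at results early or running many variations at once makes it less trustworthy, so have someone who knows the methods vet anything you plan to share.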

Remember, information collected in live data tests is directional. It helps provide insight about how things might perform in production. Live data tests are best thought of in terms of risk management. They are not perfect at pinpointing exactly what will happen in production, but are sometimes better at determining what options you should rule out. Each live data test should be based on a clear objective, aligned with an overarching goal. Hopefully, live data tests will help you settle debates, understand your users, and get a sense of what works and what doesn't.