Marketing is a science, an iterative process built on testing, and one of the most common testing practices is the A/B test. Unfortunately, over 90% of A/B tests are conducted with the right intention but the wrong method.
Typically this comes down to Type 1 and Type 2 errors, with Type 1 being the more common. In the case of a Type 1 error (a false positive), we declare that the variant has won the test with better performance when in reality it has not. This can have grave consequences, as decisions based on a Type 1 error can lead the company in the wrong direction while it assumes it is on the right track.
With a Type 2 error (a false negative), we fail to reject the null hypothesis that there is no difference between the A and B variants when in reality there is a significant difference.
Modus operandi
The basis of any A/B test is a hypothesis that we test against. A hypothesis, in simple terms, is an assumption that we challenge with the data coming out of the test.
Every test has two hypotheses: a null hypothesis, which assumes there is no significant difference between the A and B variants, and an alternative hypothesis, which challenges the null hypothesis and postulates that there is indeed a significant difference between the two versions.
There are 3 reasons why you might be running erroneous A/B tests:
Sample Size
Significance level and statistical power
Business cycles
Sample Size:
The most common mistake when conducting an A/B test is failing to account for the sample size. This is the minimum number of samples required per variant to be able to detect any significant difference in performance between the variants. It is part of the pre-test analysis we do before starting the A/B test.
There are plenty of calculators (including https://cxl.com/ab-test-calculator/) that can help you determine the ideal sample size per variant and plan the number of days the A/B test will run, depending on your website traffic and the Minimum Detectable Effect (MDE) you want for the test (usually 10%).
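If you prefer to do the pre-test calculation yourself, here is a minimal sketch using the statsmodels library. The 5% baseline conversion rate and the 10% relative MDE are assumptions chosen purely for illustration; plug in your own numbers.

```python
# Minimum sample size per variant for a two-proportion A/B test.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.05                       # assumed conversion rate of the control
mde = 0.10                                 # minimum detectable effect (relative lift)
expected_rate = baseline_rate * (1 + mde)  # conversion rate if the variant truly wins

effect_size = proportion_effectsize(baseline_rate, expected_rate)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,           # 95% significance (confidence) level
    power=0.80,           # 80% statistical power
    ratio=1.0,            # equal traffic split between A and B
    alternative="two-sided",
)
print(f"Minimum samples per variant: {round(n_per_variant)}")
```

A dedicated calculator should give you numbers in the same ballpark; the point is that the sample size follows directly from the baseline rate, the MDE, and the significance and power settings discussed next.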
Significance Level and Statistical Power:
Once you know the minimum samples required per variant, the significance level (usually a 95% confidence level, i.e. α = 0.05) and the statistical power (usually set at 80%) limit the probability of Type 1 and Type 2 errors respectively.
The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis (i.e. there is no significant difference between variant and control) when it is true.
In simple terms, when we set the significance level at 95% we are making sure that 95% of the time, when we choose a winner between A and B, it actually is the winner and not a result of chance. If the p-value of your test is more than 0.05, you fail to reject the null hypothesis that there is no significant difference between the two variants, and vice versa.
Statistical power, on the other hand, is the probability of correctly rejecting the null hypothesis when it is false. When we set it at 80%, we are accepting a 20% chance that the test misses a real winner and reports it as not being a winner.
If we go beyond 95% significance and 80% power, we will require far more samples for the test, so we keep these numbers at their generally accepted values.
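To see how quickly stricter thresholds inflate the required sample size, here is a short sketch that reuses the illustrative 5% baseline / 10% MDE effect size from the earlier snippet and compares the standard 95%/80% setting with a stricter 99%/90% one:

```python
# How the required sample size grows as alpha shrinks and power rises.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.05, 0.055)  # assumed 5% baseline, 10% relative MDE
analysis = NormalIndPower()

for alpha, power in [(0.05, 0.80), (0.01, 0.90)]:
    n = analysis.solve_power(effect_size=effect_size, alpha=alpha,
                             power=power, ratio=1.0, alternative="two-sided")
    print(f"alpha={alpha}, power={power} -> {round(n)} samples per variant")
```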
Business cycles:
Last but certainly not least is the most overlooked component of A/B tests. If you have more sales on weekdays than on weekends (or vice versa), your business has a micro business cycle of a week. Whenever you run an A/B test, it is best to run it for at least 2 micro business cycles (typically one week each), because a test run for just a single cycle can be skewed by external factors (e.g. a special offer from a competitor).
It is also advisable to keep in mind the average time a user takes to transact on your website and run the test for at least twice that period. This reduces the chances of the results getting skewed.
E.g.: if a typical user takes 3 days to transact and you run the test for 4 days, a user who landed on a variant on day three might convert after your test has ended, and you might analyze your results incorrectly.
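A minimal sketch of the duration math follows. The 2,000 daily visitors, the placeholder sample size, the one-week cycle, and the 3-day time to transact are all assumptions for illustration; substitute the numbers from your own pre-test analysis.

```python
import math

# Assumed inputs (illustrative only).
daily_visitors = 2000            # total traffic entering the test per day, split evenly
n_per_variant = 16000            # placeholder: plug in the sample-size result from above
cycle_days = 7                   # one micro business cycle
avg_days_to_transact = 3         # average time a user takes to convert

days_for_samples = math.ceil(2 * n_per_variant / daily_visitors)
min_days = max(days_for_samples, 2 * cycle_days, 2 * avg_days_to_transact)
# Round up to whole business cycles so weekday/weekend effects balance out.
test_days = math.ceil(min_days / cycle_days) * cycle_days
print(f"Run the test for at least {test_days} days")
```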
Once you have set the hypothesis and taken care of the 3 factors mentioned above, you can run your A/B test and see which variant performs best using the test analysis at https://cxl.com/ab-test-calculator/.
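If you want to verify the calculator's verdict yourself, a two-proportion z-test is one straightforward way to do the post-test analysis. The visitor and conversion counts below are made up for illustration:

```python
# Two-proportion z-test on illustrative A/B results.
from statsmodels.stats.proportion import proportions_ztest

conversions = [520, 590]     # conversions for control (A) and variant (B)
visitors = [10000, 10000]    # visitors per variant

z_stat, p_value = proportions_ztest(conversions, visitors, alternative="two-sided")
print(f"p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Significant difference: the better-performing variant wins.")
else:
    print("Fail to reject the null hypothesis: no significant difference.")
```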
If you use Google Optimize to distribute the traffic and Google Analytics to track website activity, you can create a new segment in Google Analytics using the experiment ID and the variant ID. This lets you divide your website traffic into control and test variant segments.
You can access the experiment id along with the variant id from Google Optimize under the details of an experiment.
It is also preferred to use the anti-flicker snippet to avoid the page flickering before the right variant is loaded.
Once you have created the segments inside Google Analytics, you can use them to track different cohorts and measure your A/B test results against them. The cohorts can relate to many different dimensions, but the most common are devices, channels, and geographies. Cutting your A/B test results by these cohorts gives you better insight and makes the final decision more informed.
Often, when the results are inconclusive (p-value above 0.05), segmenting your A/B test data by different cohorts can reveal hidden information that influences your decision on the test.
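As a sketch of that cohort cut, suppose you have exported session-level data from your analytics tool into a CSV; the file name and the columns `variant`, `device`, and `converted` are hypothetical and only stand in for whatever your export actually contains:

```python
# Cut A/B results by device cohort (file and column names are hypothetical).
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

df = pd.read_csv("ab_test_sessions.csv")   # one row per session: variant, device, converted (0/1)

for device, group in df.groupby("device"):
    counts = group.groupby("variant")["converted"].agg(["sum", "count"])
    if len(counts) < 2:
        continue  # skip cohorts that only saw one variant
    _, p_value = proportions_ztest(counts["sum"].values, counts["count"].values)
    rates = (counts["sum"] / counts["count"]).round(4).to_dict()
    print(f"{device}: conversion rates {rates}, p-value {p_value:.4f}")
```

A cohort where the p-value is low even though the overall test was inconclusive is worth a closer look before you make the final call.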
As you build the habit of testing, you can take a major business hypothesis (e.g. will an increase in brand trust increase conversions?) and convert it into a series of tests, such as adding social proof to the landing pages or showing user ratings, to validate the initial business hypothesis.
Before you start your A/B testing, it is always advisable to run an A/A test, where there is no difference between the variant and the control, just to see whether at different stages of the test you could have declared a winner. The p-value of an A/A test should stay above 0.05 in the long run. If that is not the case, you should dig deeper into your analytics tool and your A/B testing methodology to uncover where the problem lies. Also, relying too heavily on any A/B testing tool is not advisable, as those tools are configured to give you the sense that your tests are producing winners when in reality that might not be the case. Always trust your business instincts more than any tool in the market.
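To get a feel for how an A/A test should behave, here is a small simulation sketch: both "variants" draw from the same assumed 5% conversion rate and traffic level, and the running p-value is recomputed as data accumulates. It should wander but not keep flagging a winner.

```python
# Simulated A/A test: both arms share the same true conversion rate.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
true_rate = 0.05                 # assumed identical conversion rate for both arms
daily_visitors_per_arm = 500     # assumed traffic per arm per day

conv = np.zeros(2, dtype=int)
seen = np.zeros(2, dtype=int)
for day in range(1, 29):
    conv += rng.binomial(daily_visitors_per_arm, true_rate, size=2)
    seen += daily_visitors_per_arm
    _, p_value = proportions_ztest(conv, seen)
    if day % 7 == 0:
        print(f"day {day:2d}: p-value = {p_value:.3f}")
# A healthy setup should not repeatedly report a "winner" (p < 0.05) here.
```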
Lastly, if an A/B test results in no significant difference between the A and B variants (failing to reject the null hypothesis), you can go ahead with whichever variant you prefer: statistically there is no difference, so you might as well satisfy your liking. If the new version of the home page looks aesthetically better but does not convert significantly better, you can still implement the new version.