MAB (Multi-Armed Bandit) Test


What is MAB (Multi-Armed Bandit)?

A multi-armed bandit (MAB) is a method for finding the best option among multiple test groups when resources such as time and traffic are limited. The name comes from a gambler facing several slot machines and deciding which one to bet on. A MAB algorithm observes the reward from each option and uses that information to adjust, over time, how often each option is selected, so that selection shifts toward the best-performing one.

In this way the algorithm gradually converges on better choices and maximizes total reward. MAB is used in areas such as online advertising, click-through rate optimization, and recommendation algorithm optimization because it directly optimizes a success metric and reaches decisions faster than a traditional A/B Test.

How is MAB different from an A/B Test?

  1. An A/B Test collects multiple goal metrics and their statistics (p-value or Bayesian probability) over a period of time, and decisions are made by analyzing and interpreting those results. MAB, by contrast, focuses on maximizing a single success metric (conversion rate, CTR, etc.) and automatically adjusts traffic to do so, so no separate analysis step is involved. The main question MAB answers is "Which test group yields the greatest reward, that is, the best value of the success metric?"

  2. MAB does not require a control group. As a result, it does not report comparison statistics such as a p-value or the probability of outperforming Group A.

  3. MAB is suitable for maximizing conversions in short-lived, temporary experiences where the change is not permanent (e.g., promotional offers, headline tests, webinar registration pages). It is also recommended when continuous optimization is needed, such as when testing search or recommendation logic.

What algorithm does Hackle's MAB use?

Hackle's MAB uses the Thompson Sampling (Bayesian) method. For n hours after MAB starts, traffic is distributed evenly across all test groups. After that, Thompson Sampling is run every hour to estimate the probability that each test group is the best, and traffic is allocated in proportion to those probabilities.
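The two phases described above can be sketched in Python. This is only an illustration, not Hackle's actual implementation: the function names, the `warmup_hours` parameter (standing in for the unspecified n), the uniform Beta(1, 1) prior, and the Monte Carlo draw count are all assumptions made for the example.

```python
import random


def best_group_probabilities(stats, draws=10_000, seed=0):
    """Monte Carlo estimate of the probability that each group is the best.

    stats maps a group name to (conversions, exposures).
    Assumes a Beta(1, 1) prior on each group's conversion rate.
    """
    rng = random.Random(seed)
    wins = {group: 0 for group in stats}
    for _ in range(draws):
        # Draw one plausible conversion rate per group from its Beta posterior.
        sampled = {
            group: rng.betavariate(1 + conversions, 1 + exposures - conversions)
            for group, (conversions, exposures) in stats.items()
        }
        # The group with the highest sampled rate "wins" this draw.
        wins[max(sampled, key=sampled.get)] += 1
    return {group: count / draws for group, count in wins.items()}


def traffic_weights(stats, hours_since_start, warmup_hours):
    """Even split during the warm-up period, Thompson-style weights afterwards."""
    if hours_since_start < warmup_hours:
        return {group: 1 / len(stats) for group in stats}
    return best_group_probabilities(stats)


# Hypothetical observations: (conversions, exposures) per test group.
stats = {"A": (120, 1000), "B": (100, 1000), "C": (60, 1000)}
during_warmup = traffic_weights(stats, hours_since_start=1, warmup_hours=3)
after_warmup = traffic_weights(stats, hours_since_start=5, warmup_hours=3)
```

In this sketch the weights would be recomputed once per hour from the latest counts, so a group that keeps converting well receives a growing share of traffic.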

For example, if there are 3 test groups A, B, and C, and the probabilities that each group is the best are 70%, 20%, and 10%, respectively, then 70% of traffic is allocated to A, 20% to B, and 10% to C.
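As a rough illustration of proportional allocation, the following Python sketch assigns simulated visitors to groups using the 70/20/10 weights from the example. The `assign_group` helper and the random per-visitor assignment are assumptions for the example; the document does not describe how Hackle actually assigns individual users.

```python
import random
from collections import Counter


def assign_group(weights, rng):
    """Pick one group at random, with probability proportional to its weight."""
    groups = list(weights)
    return rng.choices(groups, weights=[weights[g] for g in groups], k=1)[0]


# Probabilities of being the best, taken from the example above.
weights = {"A": 0.70, "B": 0.20, "C": 0.10}

rng = random.Random(42)  # fixed seed so the simulation is repeatable
visitors = 10_000
counts = Counter(assign_group(weights, rng) for _ in range(visitors))

# Observed share of simulated traffic per group; close to the weights above.
shares = {group: counts[group] / visitors for group in weights}
```

With enough visitors, each group's observed share approaches its weight, which is what "allocated in those proportions" means in practice.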
