English: Visualization of Thompson sampling in a simplified simulated setting. The goal is to evaluate the efficacies of several treatments (our unknowns) efficiently. This is a basic multi-armed bandit problem. Each outcome is simplified to either success or failure, and each treatment has its own true probability of success, unknown to us (indicated by rotated squares). At each step, a patient comes in, and Thompson sampling is applied to choose which treatment to give: 1) for each treatment, a random number is drawn from our current Bayesian belief about that treatment's actual probability of success; 2) the treatment with the largest of these random numbers is chosen (argmax) and applied; 3) once we get the result (success or failure), our belief is updated accordingly, and we move on to the next step.
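The three steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the animation. It assumes a uniform Beta(1, 1) prior for each treatment, which is the usual conjugate choice for success/failure outcomes, although the description does not say which prior was used. The function name `thompson_sampling` and its parameters are hypothetical.

```python
import random


def thompson_sampling(true_probs, n_patients, seed=0):
    """Simulate Thompson sampling on success/failure treatments.

    true_probs: the real success probabilities (hidden from the algorithm).
    Returns per-treatment lists of observed successes and failures.
    Assumption: each belief starts as a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_patients):
        # 1) Draw one random number per treatment from its current belief,
        #    here the Beta(successes + 1, failures + 1) posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # 2) Give the treatment whose draw is the largest (argmax).
        chosen = max(range(n_arms), key=samples.__getitem__)
        # 3) Observe success or failure and update the belief.
        if rng.random() < true_probs[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1

    return successes, failures


if __name__ == "__main__":
    s, f = thompson_sampling([0.2, 0.5, 0.8], n_patients=2000, seed=1)
    for arm, (si, fi) in enumerate(zip(s, f)):
        print(f"treatment {arm}: {si + fi} patients, {si} successes")
```

With clearly separated success probabilities such as these, most patients should end up on the best treatment, since the weaker treatments' draws win the argmax less and less often as evidence accumulates.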
The number below each treatment's rotated square is the number of patients who have received this treatment so far. The more often a treatment is applied, the less uncertainty we have about its efficacy (the distribution becomes "thinner").
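The narrowing can be quantified as follows, again assuming (this is not stated in the description) that each belief is a Beta distribution starting from a uniform Beta(1, 1) prior. The standard deviation of Beta(a, b) is sqrt(ab / ((a + b)^2 (a + b + 1))), so it shrinks as observations accumulate. The helper name `beta_posterior_std` is hypothetical.

```python
import math


def beta_posterior_std(successes, failures):
    """Standard deviation of the Beta(successes + 1, failures + 1) belief.

    Assumption: the belief started as a uniform Beta(1, 1) prior.
    """
    a = successes + 1
    b = failures + 1
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return math.sqrt(variance)


if __name__ == "__main__":
    # Same observed success rate (60%), increasing number of patients.
    for successes, failures in [(6, 4), (60, 40), (600, 400)]:
        n = successes + failures
        print(n, "patients -> std", beta_posterior_std(successes, failures))
```

At the same observed success rate, the standard deviation falls roughly in proportion to one over the square root of the number of patients treated.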
We can see that here, Thompson sampling rapidly abandons the "bad" treatments and favors the good ones.
to share – to copy, distribute and transmit the work
to remix – to adapt the work
Under the following conditions:
attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
CC BY 4.0 (Creative Commons Attribution 4.0): https://creativecommons.org/licenses/by/4.0
Captions
Concrete example of Thompson sampling applied to simulate treatment efficacy evaluation.