Playing with Toys - pyhf Tutorial

As of v0.6.0, pyhf now supports toys! A lot of kinks have been discovered and worked out and we’re grateful to our ATLAS colleagues for beta-testing this in the meantime. We don’t believe that there may not be any more bugs, but we feel confident that we can release the current implementation.

Simple Model (again)¶

import numpy as np
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0, 10.0], bkg=[50.0, 60.0], bkg_uncertainty=[5.0, 12.0]
)

Hypothesis Testing (revisited)¶

And like in the first exercise we did, let’s refresh what the hypothesis test looked like using $\tilde{q}_\mu$ :

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

So the question is, does the asymptotic approximation hold in this example? The standard assumption is that you have enough “statistics” (meaning enough events) to use the large-N approximation. So let’s use the toy-based calculator instead and compute the same values as above and see if they match in the asymptotic case (we certainly hope they mostly do here!)

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",
    ntoys=1_000,
    track_progress=False,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

You’ll notice that this time, a progress bar pops up! This is running the fits for each of these toys. We’re hard at work to find new ways to improve the performance of this, but evaluating 50+ toys/s is not very bad for the initial implementation.

Overall, this is not so bad given that we’re only running 1000 toys. What does this look like in a case where we have lower statistics?

Hypothesis Testing (low stats)¶

model_low = pyhf.simplemodels.uncorrelated_background(
    signal=[0.5, 1.0], bkg=[5.0, 6.0], bkg_uncertainty=[0.5, 1.2]
)

In the asymptotics case:

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [5.0, 7.0] + model_low.config.auxdata,
    model_low,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="asymptotics",
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

And now throwing 5000 toys:

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [5.0, 7.0] + model_low.config.auxdata,
    model_low,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",
    ntoys=5_000,
    track_progress=False,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

And as you can see in the case of lower statistics, the asymptotic approximation starts failing!