Playing with Toys#

As of v0.6.0, pyhf now supports toys! A lot of kinks have been discovered and worked out and we’re grateful to our ATLAS colleagues for beta-testing this in the meantime. We don’t believe that there may not be any more bugs, but we feel confident that we can release the current implementation.

Simple Model (again)#

import numpy as np
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0, 10.0], bkg=[50.0, 60.0], bkg_uncertainty=[5.0, 12.0]
)

Hypothesis Testing (revisited)#

And like in the first exercise we did, let’s refresh what the hypothesis test looked like using \(\tilde{q}_\mu\):

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")
      Observed CLs: 0.4957
Expected CLs(-2 σ): 0.0832
Expected CLs(-1 σ): 0.1828
Expected CLs( 0 σ): 0.3705
Expected CLs( 1 σ): 0.6437
Expected CLs( 2 σ): 0.8854

So the question is, does the asymptotic approximation hold in this example? The standard assumption is that you have enough “statistics” (meaning enough events) to use the large-N approximation. So let’s use the toy-based calculator instead and compute the same values as above and see if they match in the asymptotic case (we certainly hope they mostly do here!)

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",
    ntoys=1_000,
    track_progress=False,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")
      Observed CLs: 0.4957
Expected CLs(-2 σ): 0.0924
Expected CLs(-1 σ): 0.1821
Expected CLs( 0 σ): 0.3458
Expected CLs( 1 σ): 1.0000
Expected CLs( 2 σ): 1.0000

You’ll notice that this time, a progress bar pops up! This is running the fits for each of these toys. We’re hard at work to find new ways to improve the performance of this, but evaluating 50+ toys/s is not very bad for the initial implementation.

Overall, this is not so bad given that we’re only running 1000 toys. What does this look like in a case where we have lower statistics?

Hypothesis Testing (low stats)#

model_low = pyhf.simplemodels.uncorrelated_background(
    signal=[0.5, 1.0], bkg=[5.0, 6.0], bkg_uncertainty=[0.5, 1.2]
)

In the asymptotics case:

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [5.0, 7.0] + model_low.config.auxdata,
    model_low,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="asymptotics",
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")
      Observed CLs: 0.7392
Expected CLs(-2 σ): 0.3520
Expected CLs(-1 σ): 0.5010
Expected CLs( 0 σ): 0.6829
Expected CLs( 1 σ): 0.8592
Expected CLs( 2 σ): 0.9662

And now throwing 5000 toys:

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [5.0, 7.0] + model_low.config.auxdata,
    model_low,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",
    ntoys=5_000,
    track_progress=False,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")
      Observed CLs: 0.7391
Expected CLs(-2 σ): 0.4150
Expected CLs(-1 σ): 0.5072
Expected CLs( 0 σ): 0.6649
Expected CLs( 1 σ): 1.0000
Expected CLs( 2 σ): 1.0000

And as you can see in the case of lower statistics, the asymptotic approximation starts failing!