Playing with Toys

As of v0.6.0, pyhf supports toys! Many kinks have been discovered and worked out, and we’re grateful to our ATLAS colleagues for beta-testing this in the meantime. We don’t claim that there are no more bugs, but we feel confident enough to release the current implementation.

Simple Model (again)

import numpy as np
import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0, 10.0], bkg=[50.0, 60.0], bkg_uncertainty=[5.0, 12.0]
)

Hypothesis Testing (revisited)

As in the first exercise, let’s refresh what the hypothesis test looks like using \(\tilde{q}_\mu\):

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

      Observed CLs: 0.4957
Expected CLs(-2 σ): 0.0832
Expected CLs(-1 σ): 0.1828
Expected CLs( 0 σ): 0.3705
Expected CLs( 1 σ): 0.6437
Expected CLs( 2 σ): 0.8854

So the question is: does the asymptotic approximation hold in this example? The standard assumption is that you have enough “statistics” (meaning enough events) to use the large-N approximation. So let’s use the toy-based calculator instead, compute the same values as above, and see whether they match the asymptotic results (we certainly hope they mostly do here!).
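As a rough picture of what a toy-based calculation does, the sketch below estimates CLs for a single-bin counting experiment by throwing pseudo-experiments ("toys") and counting how often they look at least as background-like as the data. This is an illustration only, not pyhf's implementation: it uses the raw event count as the test statistic instead of the profile likelihood ratio \(\tilde{q}_\mu\), has no nuisance parameters, and the function names (`poisson_draw`, `toy_cls`, `exact_cls`) and the numbers are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def toy_cls(n_obs, signal, background, n_toys=20000, seed=0):
    """Estimate CLs = CLs+b / CLb from toys, using the event count as test statistic."""
    rng = random.Random(seed)
    # Fraction of signal+background toys with n <= n_obs
    cl_sb = sum(poisson_draw(rng, signal + background) <= n_obs for _ in range(n_toys)) / n_toys
    # Fraction of background-only toys with n <= n_obs
    cl_b = sum(poisson_draw(rng, background) <= n_obs for _ in range(n_toys)) / n_toys
    return cl_sb / cl_b


def exact_cls(n_obs, signal, background):
    """Same quantity from the exact Poisson cumulative distribution."""

    def cdf(n, lam):
        return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n + 1))

    return cdf(n_obs, signal + background) / cdf(n_obs, background)


print(f"toy CLs:   {toy_cls(7, 1.0, 6.0):.3f}")
print(f"exact CLs: {exact_cls(7, 1.0, 6.0):.3f}")
```

In this toy setting the exact answer is available, so the two numbers can be compared directly. For a full model with nuisance parameters there is no such closed form, which is why each toy requires its own fits.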

Note

For readability in the Jupyter Book, the hypotest call has track_progress=False. If you’re running this notebook yourself, you might want to set track_progress=True to enable the progress bar.

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [53.0, 65.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",
    ntoys=1000,
    track_progress=False,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

If you run this with track_progress=True, you’ll notice that a progress bar pops up while the fits for each of these toys are run. We’re hard at work on finding new ways to improve the performance, but evaluating 50+ toys/s is not bad for an initial implementation.
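The number of toys trades run time against statistical precision. As a back-of-the-envelope sketch (an assumption of this note, not something pyhf reports), a tail probability estimated as a fraction of independent toys has a binomial standard error of \(\sqrt{p(1-p)/N}\). The helper name `toy_pvalue_std_error` is hypothetical.

```python
import math


def toy_pvalue_std_error(p_value, n_toys):
    """Binomial standard error of a tail fraction estimated from n_toys independent toys."""
    return math.sqrt(p_value * (1.0 - p_value) / n_toys)


for n_toys in (1000, 2000, 10000):
    error = toy_pvalue_std_error(0.05, n_toys)
    print(f"ntoys={n_toys:6d}: p = 0.05 +/- {error:.4f}")
```

Since CLs is a ratio of two such fractions, its uncertainty is somewhat larger than this single-fraction estimate. The error shrinks only as \(1/\sqrt{N}\), so halving it costs four times as many toys.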

Overall, the agreement with the asymptotic results is not bad, given that we’re only running 1000 toys. What does this look like in a case where we have lower statistics?

Hypothesis Testing (low stats)

model_low = pyhf.simplemodels.uncorrelated_background(
    signal=[0.5, 1.0], bkg=[5.0, 6.0], bkg_uncertainty=[0.5, 1.2]
)

In the asymptotics case:

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [5.0, 7.0] + model_low.config.auxdata,
    model_low,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="asymptotics",
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

And now throwing 1000 toys:

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [5.0, 7.0] + model_low.config.auxdata,
    model_low,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",
    ntoys=1000,
    track_progress=False,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

Comparing the two sets of results, you can see that in the case of lower statistics the asymptotic approximation starts to fail!
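A simpler analogy shows why low event counts are a problem for large-N approximations. The sketch below compares an exact Poisson tail probability with its Gaussian approximation for a roughly 2σ upward fluctuation at different expected counts. This is not what pyhf's asymptotic calculator computes (that uses asymptotic formulae for the profile likelihood ratio); it only illustrates, under simplified assumptions, how such approximations degrade as counts get small. The function names are hypothetical.

```python
import math


def poisson_tail(n_obs, lam):
    """Exact P(n >= n_obs) for n ~ Poisson(lam), summed in log space to avoid overflow."""
    below = sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(n_obs))
    return 1.0 - below


def gaussian_tail(n_obs, lam):
    """Large-N approximation: treat n as Normal(mean=lam, sigma=sqrt(lam))."""
    z = (n_obs - lam) / math.sqrt(lam)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for lam in (5.0, 50.0, 500.0):
    n_obs = math.ceil(lam + 2.0 * math.sqrt(lam))  # roughly a 2 sigma upward fluctuation
    exact = poisson_tail(n_obs, lam)
    approx = gaussian_tail(n_obs, lam)
    print(f"lam={lam:6.1f} n_obs={n_obs:4d} exact={exact:.4f} gaussian={approx:.4f} ratio={approx / exact:.2f}")
```

The ratio column moves toward 1 as the expected count grows, which is the regime where asymptotic results can be trusted without cross-checking against toys.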
