import json
import tarfile

import matplotlib.pyplot as plt
import numpy as np
import pyhf
import pyhf.contrib.utils
import scipy.stats
from pyhf.contrib.viz import brazil

pyhf.set_backend("numpy", "minuit")

from rich import box
from rich.console import Console
from rich.table import Table

Model-Independent Upper Limits#

This section covers some common concepts about (model-independent) upper limits. While not necessarily exhaustive, it aims to summarize the current working knowledge of how High Energy Physics experiments such as ATLAS produce tables such as Table 20 from the Supersymmetry Electroweak 2-lepton, 2-jet analysis (SUSY-2018-05):

Table 20: Model-independent upper limits on the observed visible cross-section in the five electroweak search discovery regions, derived using pseudo-experiments. Left to right: background-only model post-fit total expected background, with the combined statistical and systematic uncertainties; observed data; 95% CL upper limits on the visible cross-section ($\langle A\varepsilon\sigma\rangle_\mathrm{obs}^{95}$) and on the number of signal events ($\mathrm{S}_\mathrm{obs}^{95}$). The sixth column ($\mathrm{S}_\mathrm{exp}^{95}$) shows the expected 95% CL upper limit on the number of signal events, given the expected number (and $\pm1\sigma$ excursions of the expectation) of background events. The last two columns indicate the confidence level of the background-only hypothesis ($\mathrm{CL}_\mathrm{b}$) and discovery $p$-value with the corresponding Gaussian significance ($Z(s=0)$). $\mathrm{CL}_\mathrm{b}$ provides a measure of compatibility of the observed data with the signal strength hypothesis at the 95% CL limit relative to fluctuations of the background, and $p(s=0)$ measures compatibility of the observed data with the background-only hypothesis relative to fluctuations of the background. The $p$-value is capped at 0.5.

Here, the ATLAS collaboration provided some text describing each of the individual columns. How does that translate to statistical fits and pyhf? Let’s explore below. We will use the public probability models from the analysis published on its HEPData entry.

Downloading the public probability models#

First, we’ll use pyhf.contrib to download the models from HEPData:

pyhf.contrib.utils.download(
    "https://doi.org/10.17182/hepdata.116034.v1/r34", "workspace_2L2J"
)
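
If you want to check what was fetched before extracting, a quick sketch with pathlib (the directory name matches the download target above) lists the archive files:

from pathlib import Path

# list the files downloaded into the target directory
print(sorted(path.name for path in Path("workspace_2L2J").iterdir()))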

Then use Python’s tarfile (part of the standard library) to extract the JSONs:

with tarfile.open("workspace_2L2J/ewk_discovery_bkgonly.json.tgz") as fp:
    bkgonly = pyhf.Workspace(json.load(fp.extractfile("ewk_discovery_bkgonly.json")))

with tarfile.open("workspace_2L2J/ewk_discovery_patchset.json.tgz") as fp:
    patchset = pyhf.PatchSet(json.load(fp.extractfile("ewk_discovery_patchset.json")))

and then build our workspace for a particular signal region DR-Int-EWK:

ws = patchset.apply(bkgonly, "DR_Int_EWK")

model = ws.model()
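
As a quick sanity check (a sketch using standard model.config attributes), we can inspect the structure of the model we just built:

# inspect the model structure: channels, parameter of interest, parameter count
print(f"Channels: {model.config.channels}")
print(f"POI: {model.config.poi_name}")
print(f"Number of parameters: {model.config.npars}")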

Understanding expected and observed events#

The background estimate(s) in the signal region(s) are obtained from the background-only fit to control region(s) extrapolated to the signal region(s).

This number, combined with the number of observed events in the signal region, allows us to redo the limit-setting or \(p\)-value determination without the public probability models.

The uncertainties on the background estimate(s) in the signal region(s) are symmetric and are calculated through the standard error propagation described below.

Observed Events#

The observed events are just ws.data(model), which we will call data. These are ordered in the same way as the channels in the model (model.config.channels):

data = ws.data(model)
print(list(zip(model.config.channels, data)))
[('CRDY_cuts', 90.0), ('CRZZ_cuts', 194.0), ('CRZ_cuts', 159.0), ('CRtt_cuts', 424.0), ('DRInt_cuts', 38.0)]
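
Note that data contains more than the five channel counts shown above (the zip simply stops at the shorter list): pyhf appends the auxiliary data for the constraint terms after the main observations. A short sketch to separate the two:

# main observations only, without the appended auxiliary data
main_data = ws.data(model, include_auxdata=False)
print(f"{len(main_data)} main + {len(model.config.auxdata)} auxiliary = {len(data)} total")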

Expected Events#

The expected events are determined from a background-only fit. This is done by fixing the parameter of interest (i.e., for a search, the non-SM signal strength) to zero. This fit gives us the fitted parameters \(\hat{\theta}\). From these, we can print out the expected background events in each channel after the fit:

pars_uncrt_bonly, corr_bonly = pyhf.infer.mle.fixed_poi_fit(
    0.0, data, model, return_uncertainties=True, return_correlations=True
)
pars_bonly, uncrt_bonly = pars_uncrt_bonly.T

table = Table(title="Per-region yields from background-only fit", box=box.HORIZONTALS)
table.add_column("Region", justify="left", style="cyan", no_wrap=True)
table.add_column("Total Bkg.", justify="center")

for region, count in zip(model.config.channels, model.expected_actualdata(pars_bonly)):
    table.add_row(region, f"{count:0.2f}")

console = Console()
console.print(table)
  Per-region yields from   
    background-only fit    
 ───────────────────────── 
  Region       Total Bkg.  
 ───────────────────────── 
  CRDY_cuts      90.03     
  CRZZ_cuts      196.13    
  CRZ_cuts       159.00    
  CRtt_cuts      424.33    
  DRInt_cuts     35.54     
 ───────────────────────── 

The uncertainty on the total background (expected event rate) is obtained by linearly propagating the errors on the parameters using the correlation matrix and parameter uncertainties from the fit result. The error is calculated in the same way that ROOT’s RooAbsReal::getPropagatedError() calculates it. It is summarized as follows:

\[ \mathrm{error}^2(x) = F_\mathbf{a}(x) \cdot \mathrm{Corr}(\mathbf{a},\mathbf{a}') \cdot F_{\mathbf{a}'}^{\mathrm{T}}(x) \]

where

\[ F_\mathbf{a}(x) = \frac{ f(x, \mathbf{a} + \mathrm{d}\mathbf{a}) - f(x, \mathbf{a} - \mathrm{d}\mathbf{a}) }{2}, \]

with \(f(x)\) the probability model and \(\mathrm{d}\mathbf{a}\) the vector of one-sigma uncertainties of all fit parameters taken from the fit result and \(\mathrm{Corr}(\mathbf{a},\mathbf{a}')\) the correlation matrix from the fit result.


Note: be aware that some definitions of error propagation use covariance and some use correlation. You can translate between the two: the correlation matrix is the covariance matrix with each entry divided by the product of the corresponding one-sigma uncertainties, \(\mathrm{Corr}(a, a') = \mathrm{Cov}(a, a') / (\mathrm{d}a\,\mathrm{d}a')\). The finite differences \(F_\mathbf{a}(x)\) above already absorb the one-sigma shifts; if one uses covariance instead, you’ll need the actual derivatives (e.g. through auto-differentiation).
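
As a toy illustration (numbers invented) of translating covariance into correlation with numpy:

# toy 2x2 covariance matrix (invented numbers for illustration)
cov = np.array([[4.0, 1.2], [1.2, 1.0]])
sigmas = np.sqrt(np.diag(cov))  # one-sigma uncertainties
corr = cov / np.outer(sigmas, sigmas)  # correlation matrix with unit diagonal
print(corr)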

The error propagation above could be done with a for-loop like so:

up_yields = []
dn_yields = []
for i, variation in enumerate(uncrt_bonly):
    varied_up_pars = pars_bonly.copy()  # slicing a numpy array gives a view, so copy explicitly
    varied_up_pars[i] += variation

    varied_dn_pars = pars_bonly.copy()
    varied_dn_pars[i] -= variation

    up_yields.append(model.expected_actualdata(varied_up_pars))
    dn_yields.append(model.expected_actualdata(varied_dn_pars))

which gives us the up/down variations on the yields for each fit parameter. However, we’ll batch this calculation to compute all yield variations simultaneously. To do this, we set up our model to allow for batching:

npars = model.config.npars  # total number of parameter slots (vector parameters count element-wise)
model_batch = ws.model(batch_size=npars)
pars_batch = np.tile(pars_bonly, (npars, 1))

Then we calculate the up_yields and dn_yields which are used to calculate the \(F_\mathbf{a}(x)\) term above, denoted as variations:

up_yields = model_batch.expected_actualdata(pars_batch + np.diag(uncrt_bonly))
dn_yields = model_batch.expected_actualdata(pars_batch - np.diag(uncrt_bonly))
variations = (up_yields - dn_yields) / 2

Now we can calculate \(\mathrm{error}^2(x)\):

error_sq = np.einsum("il,ik,kl->l", variations, corr_bonly, variations)
error_sq
array([ 89.50743818, 188.14236109, 158.73206103, 423.8485969 ,
        16.9340858 ])
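
As a cross-check (a sketch), the einsum contraction is just the quadratic form \(v \cdot \mathrm{Corr} \cdot v\) evaluated per bin, with \(v\) the column of variations for that bin:

# per-bin quadratic form, equivalent to the einsum above
error_sq_check = np.array([v @ corr_bonly @ v for v in variations.T])
print(np.allclose(error_sq, error_sq_check))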

Taking the square root will give us the propagated uncertainty on the yields per-region for the background-only fit. We can update our table:

table = Table(title="Per-region yields from background-only fit", box=box.HORIZONTALS)
table.add_column("Region", justify="left", style="cyan", no_wrap=True)
table.add_column("Total Bkg.", justify="center")

for region, count, uncrt in zip(
    model.config.channels, model.expected_actualdata(pars_bonly), np.sqrt(error_sq)
):
    table.add_row(region, f"{count:0.2f} ± {uncrt:0.2f}")

console = Console()
console.print(table)
    Per-region yields from     
      background-only fit      
 ───────────────────────────── 
  Region         Total Bkg.    
 ───────────────────────────── 
  CRDY_cuts     90.03 ± 9.46   
  CRZZ_cuts    196.13 ± 13.72  
  CRZ_cuts     159.00 ± 12.60  
  CRtt_cuts    424.33 ± 20.59  
  DRInt_cuts    35.54 ± 4.12   
 ───────────────────────────── 

Understanding visible cross-section#

See ATL-PHYS-PUB-2022-017 for descriptions of detector efficiency and selection acceptance.

Production cross-section upper limits are another way of saying model-dependent limits. Without the detector efficiency (but keeping selection acceptance), the term visible cross-section upper limits is used.

For upper limits on numbers of events (and the corresponding cross-section), after detector efficiency and selection acceptance effects, the term model-independent upper limits is used. These upper limits are typically free from assumptions about the acceptance and efficiency of any model (and are model-independent in that sense), but they do make the assumption that signal contamination in the control regions is negligible.

Model-independent Upper Limits#

Model-independent upper limits (UL) at the 95% confidence level (CL) on the number of beyond the SM (BSM) events for each signal region are derived using the \(\mathrm{CL}_\mathrm{s}\) prescription (DOI: 10.1088/0954-3899/28/10/313) and neglecting any possible contamination in the control regions.

To obtain the 95% CL model-independent UL / discovery \(p_0\)-value, the fit in the signal region proceeds in the same way as the background-only fit, except that an additional parameter for the non-SM signal strength, constrained to be non-negative, is included in the fit.

pyhf.set_backend("numpy", "scipy")

mu_tests = np.linspace(15, 25, 10)
(
    obs_limit,
    exp_limit_and_bands,
    (poi_tests, tests),
) = pyhf.infer.intervals.upper_limits.upper_limit(
    data, model, scan=mu_tests, level=0.05, return_results=True
)
print(f"Observed UL: {obs_limit:0.2f}")
print(
    f"Expected UL: {exp_limit_and_bands[2]:0.2f} [{exp_limit_and_bands[1]-exp_limit_and_bands[2]:0.2f}, +{exp_limit_and_bands[3]-exp_limit_and_bands[2]:0.2f}]"
)
Observed UL: 20.15
Expected UL: 17.00 [-2.00, +6.33]
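
Since the scan results are in hand (and matplotlib and the brazil module were imported at the top), we can visualize the \(\mathrm{CL}_\mathrm{s}\) values with the standard Brazil band plot; a sketch:

fig, ax = plt.subplots()
ax.set_xlabel(r"$\mu$ (number of BSM events)")
ax.set_ylabel(r"$\mathrm{CL}_\mathrm{s}$")
# observed and expected CLs curves with the one- and two-sigma expected bands
brazil.plot_results(poi_tests, tests, test_size=0.05, ax=ax)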

When calculating \(S^{95}_\mathrm{obs}\) (and similarly for the expected limit) it is typical to inject a test signal with all expected event rates equal to unity, which means for this signal we’re assuming \(A\times\varepsilon\times\sigma\times\mathcal{L}=1\). As \(S^{95}_\mathrm{obs}\) is an upper limit on \(\mu\) for this test signal, normalizing it by the integrated luminosity of the data sample allows us to interpret it as an upper limit on the visible (BSM) cross-section, \(\langle A\varepsilon\sigma\rangle_\mathrm{obs}^{95}\), which is defined as the product of acceptance, reconstruction efficiency and production cross-section.

Remember that we often call this the visible cross-section: the cross-section is \(\sigma\) and the visible cross-section is \(A\times\varepsilon\times\sigma\). This is called a model-independent limit as we do not have a specific model in mind (hence the test signal). We don’t have specific values for the selection acceptance \(A\), detector efficiency \(\varepsilon\), or signal model cross-section \(\sigma\), so we can only constrain the product. If a specific signal model has a visible cross-section \(A\varepsilon\sigma\) higher than the limit on the visible cross-section, we should be able to see that signal in the analysis.

print(f"Limit on visible cross-section: {obs_limit / 140:0.2f} ifb")  # 140 ifb
Limit on visible cross-section: 0.14 ifb
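
To make the comparison concrete, consider a hypothetical signal model (all numbers invented for illustration) with acceptance \(A = 0.4\), efficiency \(\varepsilon = 0.75\), and production cross-section \(\sigma = 0.5\) fb:

# hypothetical model (invented numbers): visible cross-section is A * eps * sigma
A, eps, sigma = 0.4, 0.75, 0.5  # sigma in fb
visible_xsec = A * eps * sigma  # 0.15 fb
print(f"Model visible cross-section: {visible_xsec:0.2f} fb")
print(f"Above the observed limit? {visible_xsec > obs_limit / 140}")

Since 0.15 fb exceeds the 0.14 fb limit, we should be able to see such a signal in this analysis.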

Discovery \(p\)-values#

The significance of an excess can be quantified by the probability (\(p_0\)) that a background-only experiment is more signal-like than observed. The last two columns indicate the \(\mathrm{CL}_\mathrm{b}\) value and the discovery \(p\)-value (\(p(s=0)\)) with the corresponding Gaussian significance (\(Z\)).

These two numbers provide information about the compatibility of the observed data with a background-only hypothesis, albeit with different test statistics \(q\). The \(q_\mu\) (or \(\tilde{q}_\mu\)) test statistic is optimized for hypothesis-testing the signal+background hypothesis. The \(q_0\) test statistic is optimized for hypothesis-testing the background-only hypothesis.
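
Both observed test statistics can be evaluated directly; a sketch using pyhf.infer.test_statistics with the model’s suggested initialization and bounds:

init = model.config.suggested_init()
bounds = model.config.suggested_bounds()
fixed = model.config.suggested_fixed()

# q0 at mu=0: optimized for rejecting the background-only hypothesis
q0_obs = pyhf.infer.test_statistics.q0(0.0, data, model, init, bounds, fixed)
# qmu_tilde at a test mu (e.g. mu=1): optimized for setting upper limits
qmu_obs = pyhf.infer.test_statistics.qmu_tilde(1.0, data, model, init, bounds, fixed)
print(f"q0_obs: {q0_obs:0.4f}, qmu_tilde_obs(mu=1): {qmu_obs:0.4f}")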

From Equation 52 in arXiv:1503.07622:

  • for \(p_\mu\): the null hypothesis is signal at a rate \(\mu' < \mu\) and the alternative hypothesis is signal at a rate of \(\mu\)

  • for \(p_0\): the null hypothesis is background-only (b) and alternative hypothesis is signal+background (s+b)

For these hypotheses, we find that the \(p\)-values are

\[ 1 - p_\mathrm{b} = P(q \geq q_\mathrm{obs} | \mathrm{b}) = \int_{q_\mathrm{obs}}^\infty f(q | \mathrm{b} )\ \mathrm{d}q \]

and

\[ p_\mathrm{s+b} = P(q \geq q_\mathrm{obs} | \mathrm{s+b}) = \int_{q_\mathrm{obs}}^\infty f(q | \mathrm{s+b} )\ \mathrm{d}q \]

Note that the same \(q_\mathrm{obs}\) value is used to determine \(p_\mu\) and \(p_b\).

\(p_\mu\) (and \(\mathrm{CL}_\mathrm{b}\))#

\(\mathrm{CL}_\mathrm{b}\) provides a measure of compatibility of the observed data with the 95% CL signal strength hypothesis relative to fluctuations of the background. \(\mathrm{CL}_\mathrm{b}\) is evaluating \(p_\mu\) in the case where \(\mu'=0\) and \(\mu=\mu_{95}\) (the fitted \(\mu\) at the upper limit \(S^{95}_\mathrm{obs}\)). This is saying that the tested \(\mu\) is the 95% CL signal strength (\(\mu_{95}\)) but the assumed true \(\mu\) is the background-only hypothesis (\(\mu'=0\)):

\[ \mathrm{CL}_\mathrm{b} \equiv 1 - p_b = \int_{q_{\mu_{95}}^{\ \mathrm{obs}}}^\infty f(q_\mu | \mu' = 0)\ \mathrm{d} q_\mu \]

Note that this calculation depends on the value of \(\mu\) at the fitted upper limit \(S^{95}_\mathrm{obs}\). You get a small \(\mathrm{CL}_\mathrm{b}\) if there is a downward fluctuation in the observed data relative to your signal+background hypothesis. Since \(\mathrm{CL}_\mathrm{s}\) divides by \(\mathrm{CL}_\mathrm{b}\), a small value prevents you from erroneously excluding a signal you “shouldn’t be able to” exclude. See the paragraph above the plot on page two of this document.

\(p_0\)#

\(p(s=0)\) measures compatibility of the observed data with the background-only (zero signal strength) hypothesis relative to fluctuations of the background. Larger values indicate greater relative compatibility. \(p(s=0)\) is not calculated in signal regions with a deficit with respect to the nominal background prediction.

This is saying that the tested \(\mu\) is the background-only hypothesis (\(\mu=\mu'=0\)) and the assumed true \(\mu\) is the background-only hypothesis:

\[ p_0 = P(q_0 \geq q_0^\mathrm{obs}) = \int_{q_0^\mathrm{obs}}^\infty f(q_0 | \mu'=0)\ \mathrm{d} q_0 \]

Relationship to \(\mathrm{CL}_\mathrm{s}\)#

This will be covered through a learn notebook, which will be added as part of scikit-hep/pyhf#2274.
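
Until then, one relation worth knowing (and easy to verify numerically): for the tail probabilities returned by pyhf.infer.hypotest, \(\mathrm{CL}_\mathrm{s} = \mathrm{CL}_\mathrm{s+b} / \mathrm{CL}_\mathrm{b}\). A minimal sketch:

# the CLs value is the ratio of the two tail probabilities
CLs, (CLsb, CLb_tail) = pyhf.infer.hypotest(
    obs_limit, data, model, return_tail_probs=True
)
print(np.isclose(CLs, CLsb / CLb_tail))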

Calculating the \(p\)-values#

Armed with all of this knowledge, we can first calculate the discovery \(p\)-value evaluated at \(\mu=\mu'=0\), as well as convert this to a Z-score assuming a “standard” Normal distribution:

p0 = pyhf.infer.hypotest(0.0, data, model, test_stat="q0")
p0_Z = scipy.stats.norm.isf(p0)
print(f"p(s=0) (Z): {p0:0.2f} ({p0_Z:0.2f})")
p(s=0) (Z): 0.24 (0.72)

And similarly, we can extract \(\mathrm{CL}_\mathrm{b}\) by evaluating at \(\mu=\mu_{95}^\mathrm{obs}\):

_, (_, CLb) = pyhf.infer.hypotest(obs_limit, data, model, return_tail_probs=True)
print(f"CLb: {CLb:0.2f}")
CLb: 0.70

Making the table#

Now that we have all of our numbers, we can make the table for the discovery region DRInt_cuts. We’ll get the corresponding index of the channel so we can extract the total background and uncertainty estimates from above.

region_mask = model.config.channels.index("DRInt_cuts")
bkg_count = model.expected_actualdata(pars_bonly)[region_mask]
bkg_uncrt = np.sqrt(error_sq)[region_mask]
table = Table(
    title="Model-independent upper limits on the observed visible cross-section in the five electroweak search discovery regions; derived using asymptotics, pyhf, and published likelihoods.",
    box=box.HORIZONTALS,
)
table.add_column("Signal Region", justify="left", style="cyan", no_wrap=True)
table.add_column("Total Bkg.", justify="center")
table.add_column("Data")
table.add_column(r"XSec \[fb]")
table.add_column("S 95 Obs")
table.add_column("S 95 Exp")
table.add_column("CLb(S 95 Obs)")
table.add_column("p(s=0) (Z)")

table.add_row(
    model.config.channels[region_mask],
    f"{bkg_count:0.0f} ± {bkg_uncrt:0.0f}",
    f"{data[region_mask]:0.0f}",
    f"{obs_limit / 140:0.2f}",
    f"{obs_limit:0.1f}",
    f"{exp_limit_and_bands[2]:0.1f} + {exp_limit_and_bands[3] - exp_limit_and_bands[2]:0.1f} - {exp_limit_and_bands[2] - exp_limit_and_bands[1]:0.1f}",
    f"{CLb:0.2f}",
    f"{p0:0.2f} ({p0_Z:0.1f})",
)

console = Console()
console.print(table)
    Model-independent upper limits on the observed visible cross-section in the five electroweak search     
               discovery regions; derived using asymptotics, pyhf, and published likelihoods.               
 ────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  Signal Region   Total Bkg.   Data   XSec [fb]   S 95 Obs   S 95 Exp           CLb(S 95 Obs)   p(s=0) (Z)  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────── 
  DRInt_cuts        36 ± 4     38     0.14        20.2       17.0 + 6.3 - 2.0   0.70            0.24 (0.7)  
 ────────────────────────────────────────────────────────────────────────────────────────────────────────── 
Compare with the corresponding row of the published Table 20 (derived using pseudo-experiments), whose caption is quoted at the top of this section.

Acknowledgements#

We acknowledge the help of ATLAS Statistics Forum including:

  • Will Buttinger

  • Jonathan Long

  • Zach Marshall

  • Alex Read

  • Sarah Williams