HistFactory

%%html
<p align="center">
<iframe src="https://matthewfeickert.github.io/talk-SciPy-2020/index.html#2" width="1200" height="675"></iframe>
</p>

%%html
<p align="center">
<iframe src="https://matthewfeickert.github.io/talk-SciPy-2020/index.html#6" width="1200" height="675"></iframe>
</p>

%%html
<p align="center">
<iframe src="https://matthewfeickert.github.io/talk-SciPy-2020/index.html#7" width="1200" height="675"></iframe>
</p>

%%html
<p align="center">
<iframe src="https://matthewfeickert.github.io/talk-SciPy-2020/index.html#13" width="1200" height="675"></iframe>
</p>

JSON Spec

Let’s take a closer look at this spec

{
    "channels": [
        { "name": "singlechannel",
          "samples": [
            { "name": "signal",
              "data": [5.0, 10.0],
              "modifiers": [ { "name": "mu", "type": "normfactor", "data": null} ]
            },
            { "name": "background",
              "data": [50.0, 60.0],
              "modifiers": [ {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [5.0, 12.0]} ]
            }
          ]
        }
    ],
    "observations": [
        { "name": "singlechannel", "data": [50.0, 60.0] }
    ],
    "measurements": [
        { "name": "Measurement", "config": {"poi": "mu", "parameters": []} }
    ],
    "version": "1.0.0"
}

which demonstrates a simple measurement of a single two-bin channel with two samples: a signal sample and a background sample. The signal sample has an unconstrained normalisation factor \(\mu\), while the background sample carries an uncorrelated shape systematic controlled by parameters \(\gamma_{1}\) and \(\gamma_{2}\). The background uncertainty for the bins is \(10\%\) and \(20\%\) respectively.
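
To make the roles of these parameters concrete, here is a minimal pure-Python sketch of the expected event rate in each bin, \(\nu_b = \mu\, s_b + \gamma_b\, b_b\), where \(s_b\) and \(b_b\) are the nominal signal and background yields from the spec. The helper `expected_yields` is a hypothetical name used only for illustration; it is not part of pyhf.

```python
# Nominal yields copied from the spec above
signal = [5.0, 10.0]
background = [50.0, 60.0]


def expected_yields(mu, gammas):
    """Hypothetical helper: return mu * s_b + gamma_b * b_b for each bin b."""
    return [mu * s + g * b for s, g, b in zip(signal, gammas, background)]


# Nominal signal strength with the background left at its nominal value
nominal = expected_yields(mu=1.0, gammas=[1.0, 1.0])
# Background-only hypothesis: the signal is switched off
background_only = expected_yields(mu=0.0, gammas=[1.0, 1.0])

print(nominal)
print(background_only)
```

Setting \(\mu = 0\) recovers the background-only expectation, while each \(\gamma_b\) rescales the background in its own bin independently of the other.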

import pyhf
import json

Let’s open up the spec and then load it into a workspace

with open("data/2-bin_1-channel.json") as serialized:
    spec = json.load(serialized)

workspace = pyhf.Workspace(spec)

and then create a statistical model from it

model = workspace.model(measurement_name="Measurement")
# Take a quick look at the model spec
model.spec
{'channels': [{'name': 'singlechannel',
   'samples': [{'name': 'signal',
     'data': [5.0, 10.0],
     'modifiers': [{'name': 'mu', 'type': 'normfactor', 'data': None}]},
    {'name': 'background',
     'data': [50.0, 60.0],
     'modifiers': [{'name': 'uncorr_bkguncrt',
       'type': 'shapesys',
       'data': [5.0, 12.0]}]}]}],
 'parameters': []}

Let’s clean that up a bit to make it more readable

def pretty_json(jsonlike, indent=None):
    # Default to a four-space indent when none is given
    if indent is None:
        indent = 4
    print(json.dumps(jsonlike, indent=indent))

pretty_json(model.spec)
{
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [
                        5.0,
                        10.0
                    ],
                    "modifiers": [
                        {
                            "name": "mu",
                            "type": "normfactor",
                            "data": null
                        }
                    ]
                },
                {
                    "name": "background",
                    "data": [
                        50.0,
                        60.0
                    ],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "shapesys",
                            "data": [
                                5.0,
                                12.0
                            ]
                        }
                    ]
                }
            ]
        }
    ],
    "parameters": []
}

Now let’s break the description of the spec down piece by piece.

Single Channel

The spec describes a simple measurement of a single two-bin channel

print(f"Channels in model: {model.config.channels}\n")

single_channel = model.spec["channels"][0]

print(f"Number of bins in channel: {model.config.channel_nbins}\n")
Channels in model: ['singlechannel']

Number of bins in channel: {'singlechannel': 2}
single_channel
{'name': 'singlechannel',
 'samples': [{'name': 'signal',
   'data': [5.0, 10.0],
   'modifiers': [{'name': 'mu', 'type': 'normfactor', 'data': None}]},
  {'name': 'background',
   'data': [50.0, 60.0],
   'modifiers': [{'name': 'uncorr_bkguncrt',
     'type': 'shapesys',
     'data': [5.0, 12.0]}]}]}

Samples

The channel contains two samples

model.config.samples
['background', 'signal']
pretty_json(single_channel["samples"])
[
    {
        "name": "signal",
        "data": [
            5.0,
            10.0
        ],
        "modifiers": [
            {
                "name": "mu",
                "type": "normfactor",
                "data": null
            }
        ]
    },
    {
        "name": "background",
        "data": [
            50.0,
            60.0
        ],
        "modifiers": [
            {
                "name": "uncorr_bkguncrt",
                "type": "shapesys",
                "data": [
                    5.0,
                    12.0
                ]
            }
        ]
    }
]

Modifiers

model.config.modifiers
[('mu', 'normfactor'), ('uncorr_bkguncrt', 'shapesys')]

The signal sample has an unconstrained normalisation factor \(\mu\)

signal_sample = single_channel["samples"][0]
pretty_json(signal_sample["modifiers"])
[
    {
        "name": "mu",
        "type": "normfactor",
        "data": null
    }
]

The background sample carries an uncorrelated shape systematic controlled by parameters \(\gamma_{1}\) and \(\gamma_{2}\)

# Each bin has its own shape systematic
background_sample = single_channel["samples"][1]
pretty_json(background_sample["modifiers"])
[
    {
        "name": "uncorr_bkguncrt",
        "type": "shapesys",
        "data": [
            5.0,
            12.0
        ]
    }
]

The background uncertainty for the bins is \(10\%\) and \(20\%\) respectively.

import numpy as np

bkg_uncert = background_sample["modifiers"][0]["data"]
np.array(bkg_uncert)/np.array(background_sample["data"])
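array([0.1, 0.2])

The full spec, including the observations and measurements, can also be fetched directly from the pyhf GitHub repository

import requests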

remote_url = "https://raw.githubusercontent.com/scikit-hep/pyhf/0f99cc488156e0826a27f55abc946d537a8922af/docs/examples/json/2-bin_1-channel.json"
response = json.loads(requests.get(remote_url).text)

pretty_json(response)
{
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [
                        5.0,
                        10.0
                    ],
                    "modifiers": [
                        {
                            "name": "mu",
                            "type": "normfactor",
                            "data": null
                        }
                    ]
                },
                {
                    "name": "background",
                    "data": [
                        50.0,
                        60.0
                    ],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "shapesys",
                            "data": [
                                5.0,
                                12.0
                            ]
                        }
                    ]
                }
            ]
        }
    ],
    "observations": [
        {
            "name": "singlechannel",
            "data": [
                50.0,
                60.0
            ]
        }
    ],
    "measurements": [
        {
            "name": "Measurement",
            "config": {
                "poi": "mu",
                "parameters": []
            }
        }
    ],
    "version": "1.0.0"
}

Observations

The data associated with a workspace

workspace.data(model)
[50.0, 60.0, 100.0, 25.0]

contains both observations

workspace.data(model, with_aux=False)
[50.0, 60.0]

as well as auxiliary information

model.config.auxdata
[100.0, 25.0]
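
The auxiliary data values can be reproduced by hand. As a sketch, assuming the Poisson constraint of a `shapesys` modifier uses an auxiliary measurement of \((b / \delta b)^2\) per bin (the nominal background yield divided by its absolute uncertainty, squared), the numbers follow directly from the spec. The variable names below are illustrative only.

```python
# Values copied from the background sample and observations in the spec
nominal_background = [50.0, 60.0]
absolute_uncertainty = [5.0, 12.0]
observations = [50.0, 60.0]

# Assumed form of the Poisson auxiliary measurement: (nominal / uncertainty)^2
auxdata = [
    (nominal / uncert) ** 2
    for nominal, uncert in zip(nominal_background, absolute_uncertainty)
]

# The full data vector is the observed counts followed by the auxiliary data
full_data = observations + auxdata

print(auxdata)
print(full_data)
```

Equivalently, these are the inverse squares of the relative uncertainties: \(1/0.1^2 = 100\) and \(1/0.2^2 = 25\).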

Measurements

The parameter(s) of interest (POI) you’re trying to measure

pretty_json(workspace.get_measurement())
{
    "name": "Measurement",
    "config": {
        "poi": "mu",
        "parameters": []
    }
}

Inspecting workspaces

We can also use the pyhf command line tool to help us verify and inspect the workspace

! pyhf inspect data/2-bin_1-channel.json
          Summary       
    ------------------  
       channels  1
        samples  2
     parameters  2
      modifiers  2

       channels  nbins
     ----------  -----
  singlechannel    2  

        samples
     ----------
     background
         signal

     parameters  constraint              modifiers
     ----------  ----------              ----------
             mu  unconstrained           normfactor
uncorr_bkguncrt  constrained_by_poisson  shapesys

    measurement           poi            parameters
     ----------        ----------        ----------
(*) Measurement            mu            (none)
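
The summary above can be cross-checked by walking the spec directly. The following is a rough pure-Python sketch and is not how `pyhf inspect` is implemented; the `spec` dictionary is copied inline from the JSON above, and the variable names are illustrative.

```python
# Spec copied from the JSON shown earlier
spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "shapesys",
                            "data": [5.0, 12.0],
                        }
                    ],
                },
            ],
        }
    ],
    "observations": [{"name": "singlechannel", "data": [50.0, 60.0]}],
    "measurements": [
        {"name": "Measurement", "config": {"poi": "mu", "parameters": []}}
    ],
    "version": "1.0.0",
}

# Number of bins per channel, taken from the first sample in each channel
channel_nbins = {
    channel["name"]: len(channel["samples"][0]["data"])
    for channel in spec["channels"]
}

# Unique sample names and (name, type) modifier pairs across all channels
all_samples = [s for channel in spec["channels"] for s in channel["samples"]]
samples = sorted({s["name"] for s in all_samples})
modifiers = sorted({(m["name"], m["type"]) for s in all_samples for m in s["modifiers"]})

# Parameter of interest declared by each measurement
pois = {m["name"]: m["config"]["poi"] for m in spec["measurements"]}

print(channel_nbins, samples, modifiers, pois)
```

Note that `uncorr_bkguncrt` counts as one named parameter in the summary but carries one \(\gamma\) per bin, so together with `mu` the model has three free values.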