Workspace Manipulations#

In this chapter, you will learn about various workspace manipulations, including how to convert HistFactory XML+ROOT workspaces to pyhf. We’ll cover some common pitfalls, such as the location of the ROOT files and how to set the base path for the conversion.

Getting the XML+ROOT#

Note: getting the XML+ROOT files won’t necessarily be covered as part of the tutorial, as it requires ROOT (though ROOT is installed in the Binder instance).

If you want to practice extracting the HistFactory files from the workspace, first create the workspace like so:

# Need to be in the directory containing config directory
from os import chdir
from pathlib import Path

_top_level_dir = Path.cwd()
chdir(_top_level_dir.joinpath("data", "multichannel_histfactory"))
! hist2workspace config/example.xml
[#2] INFO:HistFactory -- hist2workspace is less verbose now. Use -v and -vv for more details.

[#2] PROGRESS:HistFactory -- Getting histogram ./data/data.root:/signal_data
[#2] PROGRESS:HistFactory -- Getting histogram ./data/data.root:/signal_signal
[#2] PROGRESS:HistFactory -- Getting histogram ./data/data.root:/signal_bkg
[#2] PROGRESS:HistFactory -- Getting histogram ./data/data.root:/signal_bkgerr
[#2] PROGRESS:HistFactory -- Getting histogram ./data/data.root:/control_data
[#2] PROGRESS:HistFactory -- Getting histogram ./data/data.root:/control_bkg
[#2] PROGRESS:HistFactory -- Getting histogram ./data/data.root:/control_bkgerr
[#2] PROGRESS:HistFactory -- Starting to process channel: channel1
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Starting to process 'channel1' channel with 1 observables
-----------------------------------------
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	import model into workspace
-----------------------------------------
[#2] PROGRESS:HistFactory -- Writing sample: signal
[#2] PROGRESS:HistFactory -- Writing sample: bkg
[#2] PROGRESS:HistFactory -- Saved all histograms
[#2] PROGRESS:HistFactory -- Saved Measurement
[#2] PROGRESS:HistFactory -- Successfully wrote channel to file
[#2] PROGRESS:HistFactory -- Starting to process channel: channel2
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Starting to process 'channel2' channel with 1 observables
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	import model into workspace
-----------------------------------------

WARNING: Can't find parameter of interest: SigXsecOverSM in Workspace. Not setting in ModelConfig.
[#2] PROGRESS:HistFactory -- Writing sample: bkg
[#2] PROGRESS:HistFactory -- Saved all histograms
[#2] PROGRESS:HistFactory -- Saved Measurement
[#2] PROGRESS:HistFactory -- Successfully wrote channel to file
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Entering combination
-----------------------------------------

[#2] PROGRESS:HistFactory -- Merging data for channel channel1
[#2] PROGRESS:HistFactory -- Merging data for channel channel2
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Importing combined model
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	create toy data for channelCat[channel1,channel2]
-----------------------------------------

[#2] PROGRESS:HistFactory -- Writing combined workspace to file: ./results/example_combined_GaussExample_model.root
[#2] PROGRESS:HistFactory -- Writing combined measurement to file: ./results/example_combined_GaussExample_model.root
[#2] PROGRESS:HistFactory -- Writing sample: signal
[#2] PROGRESS:HistFactory -- Writing sample: bkg
[#2] PROGRESS:HistFactory -- Writing sample: bkg
[#2] PROGRESS:HistFactory -- Saved all histograms
[#2] PROGRESS:HistFactory -- Saved Measurement

and you’ll notice a few new files being made!

$ ls -lhF results/
total 136K
-rw-r--r-- 1 jovyan jovyan 40K Nov  8 21:01 example_channel1_GaussExample_model.root
-rw-r--r-- 1 jovyan jovyan 38K Nov  8 21:01 example_channel2_GaussExample_model.root
-rw-r--r-- 1 jovyan jovyan 47K Nov  8 21:01 example_combined_GaussExample_model.root
-rw-r--r-- 1 jovyan jovyan 503 Nov  8 21:01 example_GaussExample.root
-rw-r--r-- 1 jovyan jovyan  26 Nov  8 21:01 example_results.table
! ls -lhF results/
total 116K
-rw-r--r-- 1 root root 503 Jul  4 10:36 example_GaussExample.root
-rw-r--r-- 1 root root 38K Jul  4 10:36 example_channel1_GaussExample_model.root
-rw-r--r-- 1 root root 22K Jul  4 10:36 example_channel2_GaussExample_model.root
-rw-r--r-- 1 root root 44K Jul  4 10:36 example_combined_GaussExample_model.root
-rw-r--r-- 1 root root  26 Jul  4 10:36 example_results.table

In particular, example_combined_GaussExample_model.root is the file that contains the RooStats::HistFactory::Measurement object:

$ root results/example_combined_GaussExample_model.root 
   ------------------------------------------------------------
  | Welcome to ROOT 6.18/04                  https://root.cern |
  |                               (c) 1995-2019, The ROOT Team |
  | Built for macosx64 on Sep 11 2019, 15:38:23                |
  | From tags/v6-18-04@v6-18-04                                |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------

root [0] 
Attaching file results/example_combined_GaussExample_model.root as _file0...

RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

(TFile *) 0x7ffaa30d2130
root [1] .ls
TFile**		results/example_combined_GaussExample_model.root	
 TFile*		results/example_combined_GaussExample_model.root	
  KEY: RooWorkspace	combined;1	combined
  KEY: TProcessID	ProcessID0;1	e1e9272e-fddb-11ea-86b3-1556a8c0beef
  KEY: TDirectoryFile	channel1_hists;1	channel1_hists
  KEY: TDirectoryFile	channel2_hists;1	channel2_hists
  KEY: RooStats::HistFactory::Measurement	GaussExample;1

from which you can extract out the necessary XML files as well:

root [2] GaussExample->PrintXML()
Printing XML Files for measurement: GaussExample
Printing XML Files for channel: channel1
Finished printing XML files
Printing XML Files for channel: channel2
Finished printing XML files
Finished printing XML files

To do this programmatically, you can either write a ROOT macro

// printXML.C
int printXML() {
    TFile* _file0 = TFile::Open("results/example_combined_GaussExample_model.root");
    _file0->Get<RooStats::HistFactory::Measurement>("GaussExample")->PrintXML();

    return 0;
}

and run it

$ root -l -b -q printXML.C

but we can also do the same with PyROOT in as many lines

import ROOT

_file0 = ROOT.TFile.Open("results/example_combined_GaussExample_model.root")
_file0.GaussExample.PrintXML()
Welcome to JupyROOT 6.28/04
[#2] PROGRESS:HistFactory -- Printing XML Files for measurement: GaussExample
[#2] PROGRESS:HistFactory -- Printing XML Files for channel: channel1
[#2] PROGRESS:HistFactory -- Finished printing XML files
[#2] PROGRESS:HistFactory -- Printing XML Files for channel: channel2
[#2] PROGRESS:HistFactory -- Finished printing XML files
[#2] PROGRESS:HistFactory -- Finished printing XML files

which dumps them into the same directory you ran from:

$ ls -lhF
total 24K
drwxr-xr-x 2 jovyan jovyan 4.0K Nov  8 19:52 config/
drwxr-xr-x 2 jovyan jovyan 4.0K Nov  8 19:52 data/
-rw-r--r-- 1 jovyan jovyan 1.1K Nov  8 21:01 GaussExample_channel1.xml
-rw-r--r-- 1 jovyan jovyan  794 Nov  8 21:01 GaussExample_channel2.xml
-rw-r--r-- 1 jovyan jovyan  459 Nov  8 21:01 GaussExample.xml
drwxr-xr-x 2 jovyan jovyan 4.0K Nov  8 21:01 results/
! ls -lhF
total 24K
-rw-r--r-- 1 root root  458 Jul  4 10:36 GaussExample.xml
-rw-r--r-- 1 root root 1.1K Jul  4 10:36 GaussExample_channel1.xml
-rw-r--r-- 1 root root  793 Jul  4 10:36 GaussExample_channel2.xml
drwxr-xr-x 2 root root 4.0K Jul  4 10:21 config/
drwxr-xr-x 2 root root 4.0K Jul  4 10:21 data/
drwxr-xr-x 2 root root 4.0K Jul  4 10:36 results/
chdir(_top_level_dir)

XML to JSON#

via the command line#

So pyhf comes with a lot of nifty utilities you can access. The documentation for the command line can be found via pyhf --help or online.

! pyhf --help
Usage: pyhf [OPTIONS] COMMAND [ARGS]...

  Top-level CLI entrypoint.

Options:
  --version           Show the version and exit.
  --cite, --citation  Print the bibtex citation for this software
  -h, --help          Show this message and exit.

Commands:
  cls          Compute CLs value(s) for a given pyhf workspace.
  combine      Combine two workspaces into a single workspace.
  completions  Generate shell completion code.
  contrib      Contrib experimental operations.
  digest       Use hashing algorithm to calculate the workspace digest.
  fit          Perform a maximum likelihood fit for a given pyhf workspace.
  inspect      Inspect a pyhf JSON document.
  json2xml     Convert pyhf JSON back to XML + ROOT files.
  patchset     Operations involving patchsets.
  prune        Prune components from the workspace.
  rename       Rename components of the workspace.
  sort         Sort the workspace.
  xml2json     Entrypoint XML: The top-level XML file for the PDF...

Let’s focus for now on pyhf xml2json which requires that you have installed pyhf[xmlio] (pyhf with the xmlio option).

python -m pip install 'pyhf[xmlio]'

Again, the documentation for this command is also available online.

! pyhf xml2json --help
Usage: pyhf xml2json [OPTIONS] ENTRYPOINT_XML

  Entrypoint XML: The top-level XML file for the PDF definition.

Options:
  --basedir PATH                  The base directory for the XML files to
                                  point relative to.
  -v, --mount PATH:PATH           Consists of two fields, separated by a colon
                                  character ( : ). The first field is the
                                  local path to where files are located, the
                                  second field is the path where the file or
                                  directory are saved in the XML
                                  configuration. This is similar in spirit to
                                  Docker.
  --output-file TEXT              The location of the output json file. If not
                                  specified, prints to screen.
  --track-progress / --hide-progress
  --validation-as-error / --validation-as-warning
  -h, --help                      Show this message and exit.

Let’s remind ourselves of what the top-level XML file looks like, as this is the ENTRYPOINT_XML.

! tail -n +15 data/multichannel_histfactory/config/example.xml | cat -n
     1	<!DOCTYPE Combination  SYSTEM 'HistFactorySchema.dtd'>
     2	
     3	<Combination OutputFilePrefix="./results/example">
     4	  <Input>./config/example_signal.xml</Input>
     5	  <Input>./config/example_control.xml</Input>
     6	  <Measurement Name="GaussExample" Lumi="1." LumiRelErr="0.1" ExportOnly="True">
     7	    <POI>SigXsecOverSM</POI>
     8	    <ParamSetting Const="True">Lumi</ParamSetting>
     9	  </Measurement>
    10	</Combination>

So to explain these options:

  • basedir specifies the base directory that all paths in the XML files are resolved relative to. As you can see from lines 3, 4, and 5, this should be the directory containing results/ and config/.

  • output-file specifies the output JSON file. If one is not specified, this will print to the screen, which you can redirect into a file if you want (pyhf xml2json ... > workspace.json)

  • hide-progress will disable showing the progress bars when running the script… but we like progress bars 🙂
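To make the role of basedir concrete, here is a minimal sketch of how a relative path from the XML ends up pointing at a real file once it is joined onto the base directory. The helper name resolve_against_basedir is hypothetical, and this only illustrates the path arithmetic; it is not how pyhf implements the lookup internally.

```python
from pathlib import PurePosixPath


def resolve_against_basedir(basedir: str, xml_path: str) -> str:
    """Join a relative path taken from the XML onto the base directory."""
    # PurePosixPath drops the leading "./" when it parses the path.
    return str(PurePosixPath(basedir) / xml_path)


basedir = "data/multichannel_histfactory"
# The two <Input> entries from lines 4 and 5 of example.xml.
inputs = ["./config/example_signal.xml", "./config/example_control.xml"]
resolved = [resolve_against_basedir(basedir, path) for path in inputs]
for path in resolved:
    print(path)
```

If basedir pointed at the config/ directory instead, the joined paths would contain config/config/ and the files would not be found, which is the pitfall mentioned at the start of this chapter.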

Let’s go ahead and run this command, but we won’t specify the output file so it goes to the screen. We’ll also disable the progress tracking, just so we have a nicer output for this tutorial.

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | cat -n
     1	{
     2	    "channels": [
     3	        {
     4	            "name": "channel1",
     5	            "samples": [
     6	                {
     7	                    "data": [
     8	                        10.0,
     9	                        35.0
    10	                    ],
    11	                    "modifiers": [
    12	                        {
    13	                            "data": null,
    14	                            "name": "SigXsecOverSM",
    15	                            "type": "normfactor"
    16	                        }
    17	                    ],
    18	                    "name": "signal"
    19	                },
    20	                {
    21	                    "data": [
    22	                        100.0,
    23	                        150.0
    24	                    ],
    25	                    "modifiers": [
    26	                        {
    27	                            "data": null,
    28	                            "name": "lumi",
    29	                            "type": "lumi"
    30	                        },
    31	                        {
    32	                            "data": [
    33	                                10.000000149011612,
    34	                                10.000000521540642
    35	                            ],
    36	                            "name": "uncorrshape_signal",
    37	                            "type": "shapesys"
    38	                        }
    39	                    ],
    40	                    "name": "bkg"
    41	                }
    42	            ]
    43	        },
    44	        {
    45	            "name": "channel2",
    46	            "samples": [
    47	                {
    48	                    "data": [
    49	                        200.0,
    50	                        350.0
    51	                    ],
    52	                    "modifiers": [
    53	                        {
    54	                            "data": null,
    55	                            "name": "lumi",
    56	                            "type": "lumi"
    57	                        },
    58	                        {
    59	                            "data": [
    60	                                5.000000074505806,
    61	                                10.000000055879354
    62	                            ],
    63	                            "name": "uncorrshape_control",
    64	                            "type": "shapesys"
    65	                        }
    66	                    ],
    67	                    "name": "bkg"
    68	                }
    69	            ]
    70	        }
    71	    ],
    72	    "measurements": [
    73	        {
    74	            "config": {
    75	                "parameters": [
    76	                    {
    77	                        "auxdata": [
    78	                            1.0
    79	                        ],
    80	                        "bounds": [
    81	                            [
    82	                                0.5,
    83	                                1.5
    84	                            ]
    85	                        ],
    86	                        "fixed": true,
    87	                        "inits": [
    88	                            1.0
    89	                        ],
    90	                        "name": "lumi",
    91	                        "sigmas": [
    92	                            0.1
    93	                        ]
    94	                    },
    95	                    {
    96	                        "bounds": [
    97	                            [
    98	                                0.0,
    99	                                10.0
   100	                            ]
   101	                        ],
   102	                        "inits": [
   103	                            1.0
   104	                        ],
   105	                        "name": "SigXsecOverSM"
   106	                    }
   107	                ],
   108	                "poi": "SigXsecOverSM"
   109	            },
   110	            "name": "GaussExample"
   111	        }
   112	    ],
   113	    "observations": [
   114	        {
   115	            "data": [
   116	                110.0,
   117	                155.0
   118	            ],
   119	            "name": "channel1"
   120	        },
   121	        {
   122	            "data": [
   123	                205.0,
   124	                345.0
   125	            ],
   126	            "name": "channel2"
   127	        }
   128	    ],
   129	    "version": "1.0.0"
   130	}

Only 130 lines for the entire workspace! Not too shabby. If we look through a couple of pieces:

  • line 2: specify a list of channels

  • line 5: specify the samples for channel1

  • lines 7-10: specify the expected event rate for the signal sample in channel1

  • line 11: specify a list of modifiers (e.g. parameters that modify the sample)

Similarly, if we continue down to the second half of this JSON, we hit line 72 which specifies a list of measurements for this workspace. In fact, we only have one measurement called GaussExample with the parameter of interest defined as SigXsecOverSM. This measurement also specifies additional parameter configuration such as details for the luminosity modifier (parameter name lumi).

Nearly at the end, the next part of this specification is for the observations (observed data) on line 113. Each observation corresponds to one channel: channel1 has two bins, and channel2 also has two bins.

Finally, the version key specifies which version of the HistFactory JSON schema is used. In this case, we’re using 1.0.0, whose workspace definition is at https://pyhf.readthedocs.io/en/v0.7.5/schemas/1.0.0/workspace.json and which in turn refers to the definitions in https://pyhf.readthedocs.io/en/v0.7.5/schemas/1.0.0/defs.json.

What’s really nice about the schema definition is that it allows anyone to write their own tooling/scripting to build up the workspace and quickly check if it matches the schema. This will get you 90% of the way there in having a valid workspace to work with.

There are some additional checks that cannot be done by the schema alone, such as detecting name conflicts or ensuring that all samples in a channel have the same binning structure. The good news is that these checks can be done simply by loading the workspace into a pyhf.Workspace object, which performs the schema validation as well as the additional checks.
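To give a feel for what a check beyond the schema looks like, here is a minimal sketch that compares bin counts across the samples of each channel on the plain JSON dictionary. The function name find_binning_mismatches and the two toy specs are hypothetical, and this is not the check pyhf itself runs.

```python
def find_binning_mismatches(spec: dict) -> list:
    """Return (channel, sample, nbins, expected) for every sample whose
    bin count differs from the first sample of its channel."""
    problems = []
    for channel in spec["channels"]:
        samples = channel["samples"]
        if not samples:
            continue
        expected = len(samples[0]["data"])
        for sample in samples[1:]:
            nbins = len(sample["data"])
            if nbins != expected:
                problems.append((channel["name"], sample["name"], nbins, expected))
    return problems


# Toy spec with consistent binning (values taken from channel1 above).
good = {"channels": [{"name": "channel1", "samples": [
    {"name": "signal", "data": [10.0, 35.0], "modifiers": []},
    {"name": "bkg", "data": [100.0, 150.0], "modifiers": []},
]}]}

# Same spec, but with one bin missing from bkg.
bad = {"channels": [{"name": "channel1", "samples": [
    {"name": "signal", "data": [10.0, 35.0], "modifiers": []},
    {"name": "bkg", "data": [100.0], "modifiers": []},
]}]}

print(find_binning_mismatches(good))
print(find_binning_mismatches(bad))
```

A JSON schema can require that data is a list of numbers, but it cannot easily express that two lists in different places have equal length, which is why a check like this lives outside the schema.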

Speaking of pyhf.Workspace objects…

via the python interface#

Let’s do the exact same thing, but from the Python interpreter.

import pyhf
import pyhf.readxml  # not imported by default!
spec = pyhf.readxml.parse(
    "data/multichannel_histfactory/config/example.xml", "data/multichannel_histfactory"
)

So we’re not going to dump this out. We already did that above. Let’s just quickly go ahead and load it into a pyhf.Workspace object because we can.

ws = pyhf.Workspace(spec)
print(f"    channels: {ws.channels}")
print(f"       nbins: {ws.channel_nbins}")
print(f"     samples: {ws.samples}")
print(f"   modifiers: {ws.modifiers}")
print(f"observations: {ws.observations}")
    channels: ['channel1', 'channel2']
       nbins: {'channel1': 2, 'channel2': 2}
     samples: ['bkg', 'signal']
   modifiers: [('SigXsecOverSM', 'normfactor'), ('lumi', 'lumi'), ('uncorrshape_control', 'shapesys'), ('uncorrshape_signal', 'shapesys')]
observations: {'channel1': [110.0, 155.0], 'channel2': [205.0, 345.0]}

Already, we’re seeing a lot of information about this workspace, as it’s rather inspectable. Remember, this is not a model. What we call a ‘model’ is the combination of the channel specification with a measurement; that is, choosing a measurement of a workspace uniquely defines a model. A measurement might choose a particular parameter of interest or set specific parameters as constant during the fit. These configurations are all stored in the measurements key we saw above. We’ll explore more about models in the next chapter.
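As a rough illustration of what a measurement contributes, the sketch below reads the parameter of interest and the fixed parameters out of the measurements key of the plain JSON. The helper summarize_measurement is hypothetical; the dictionary is a trimmed copy of the GaussExample measurement shown earlier.

```python
def summarize_measurement(spec: dict, name: str) -> dict:
    """Return the POI and the names of parameters marked as fixed."""
    for measurement in spec["measurements"]:
        if measurement["name"] == name:
            config = measurement["config"]
            fixed = [
                par["name"]
                for par in config.get("parameters", [])
                if par.get("fixed", False)
            ]
            return {"poi": config["poi"], "fixed": fixed}
    raise KeyError(f"no measurement named {name!r}")


# Trimmed from the xml2json output above.
spec = {"measurements": [{
    "name": "GaussExample",
    "config": {
        "poi": "SigXsecOverSM",
        "parameters": [
            {"name": "lumi", "fixed": True, "inits": [1.0], "sigmas": [0.1]},
            {"name": "SigXsecOverSM", "inits": [1.0]},
        ],
    },
}]}

summary = summarize_measurement(spec, "GaussExample")
print(summary)
```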

Let’s move on to more things we can do with the command line.

Workspace Inspection#

Now that we have a working command for converting our XML to JSON, let’s go ahead and take advantage of the JSON output by piping it to pyhf inspect which will print out a nice summary of our workspace.

! pyhf inspect --help
Usage: pyhf inspect [OPTIONS] [WORKSPACE]

  Inspect a pyhf JSON document.

  Example:

  .. code-block:: shell

      $ curl -sL https://raw.githubusercontent.com/scikit-
      hep/pyhf/main/docs/examples/json/2-bin_1-channel.json | pyhf inspect
      Summary         ------------------            channels  1
      samples  2          parameters  2           modifiers  2

             channels  nbins          ----------  -----       singlechannel
             2

              samples          ----------          background
              signal

           parameters  constraint              modifiers          ----------
           ----------              ----------                  mu
           unconstrained           normfactor     uncorr_bkguncrt
           constrained_by_poisson  shapesys

          measurement           poi            parameters          ----------
          ----------        ----------     (*) Measurement            mu
          (none)

Options:
  --output-file TEXT  The location of the output json file. If not specified,
                      prints to screen.
  --measurement TEXT
  -h, --help          Show this message and exit.
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf inspect
              Summary       
        ------------------  
           channels  2
            samples  2
         parameters  4
          modifiers  4

           channels  nbins
         ----------  -----
           channel1    2  
           channel2    2  

            samples
         ----------
                bkg
             signal

         parameters  constraint              modifiers
         ----------  ----------              ----------
      SigXsecOverSM  unconstrained           normfactor
               lumi  constrained_by_normal   lumi
uncorrshape_control  constrained_by_poisson  shapesys
 uncorrshape_signal  constrained_by_poisson  shapesys

        measurement           poi            parameters
         ----------        ----------        ----------
   (*) GaussExample      SigXsecOverSM       lumi,SigXsecOverSM

Immediately, we get a lot of useful information. We can see the number of channels, samples, parameters, and modifiers. Then we get a breakdown of the channels (and the number of bins for each channel), the samples, and the parameters. Finally, we see a list of measurements defined in the workspace, as well as the (*) denoting the default measurement if one is not specified.

Could the number of parameters and modifiers differ?
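One situation worth thinking about for this question is a single name shared by modifiers of different types. The sketch below counts unique names and unique (name, type) pairs directly on a plain JSON spec. The toy spec and the name shared_syst are made up for illustration, and this is not a statement of how pyhf inspect computes its summary.

```python
def count_names_and_pairs(spec: dict) -> tuple:
    """Count unique modifier names and unique (name, type) pairs."""
    names, pairs = set(), set()
    for channel in spec["channels"]:
        for sample in channel["samples"]:
            for modifier in sample["modifiers"]:
                names.add(modifier["name"])
                pairs.add((modifier["name"], modifier["type"]))
    return len(names), len(pairs)


# Hypothetical spec: one name attached to two modifier types.
toy = {"channels": [{"name": "channel1", "samples": [{
    "name": "bkg",
    "data": [100.0, 150.0],
    "modifiers": [
        {"name": "shared_syst", "type": "normsys",
         "data": {"hi": 1.1, "lo": 0.9}},
        {"name": "shared_syst", "type": "histosys",
         "data": {"hi_data": [105.0, 155.0], "lo_data": [95.0, 145.0]}},
    ],
}]}]}

n_names, n_pairs = count_names_and_pairs(toy)
print(n_names, n_pairs)
```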

“Normalizing” a Workspace#

There comes a time when you need to compare two workspaces to determine what changed between them. Depending on how each workspace was generated, you might need to “sort” it first. pyhf sort is a utility that will normalize the workspace for you, so that operations like calculating a checksum (pyhf digest) give the same result for equivalent workspaces.

Simple workspaces like the ones we’re using in this tutorial are almost sorted already, which is often not true in the real world. Even so, notice in the output below how, in channel1, bkg is now the first sample and signal is the second sample after sorting.

! pyhf sort --help
Usage: pyhf sort [OPTIONS] [WORKSPACE]

  Sort the workspace.

  See :func:`pyhf.workspace.Workspace.sorted` for more information.

  Example:

  .. code-block:: shell

      $ curl -sL https://raw.githubusercontent.com/scikit-
      hep/pyhf/main/docs/examples/json/2-bin_1-channel.json | pyhf sort | jq
      '.' | md5     8be5186ec249d2704e14dd29ef05ffb0

  .. code-block:: shell

      $ curl -sL https://raw.githubusercontent.com/scikit-
      hep/pyhf/main/docs/examples/json/2-bin_1-channel.json | jq -S '.channels
      |=sort_by(.name)|.channels[].samples|=sort_by(.name)|.channels[].samples
      [].modifiers|=sort_by(.name,.type)|.observations|=sort_by(.name)' | md5
      8be5186ec249d2704e14dd29ef05ffb0

Options:
  --output-file TEXT  The location of the output json file. If not specified,
                      prints to screen.
  -h, --help          Show this message and exit.
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf sort
{
    "channels": [
        {
            "name": "channel1",
            "samples": [
                {
                    "data": [
                        100.0,
                        150.0
                    ],
                    "modifiers": [
                        {
                            "data": null,
                            "name": "lumi",
                            "type": "lumi"
                        },
                        {
                            "data": [
                                10.000000149011612,
                                10.000000521540642
                            ],
                            "name": "uncorrshape_signal",
                            "type": "shapesys"
                        }
                    ],
                    "name": "bkg"
                },
                {
                    "data": [
                        10.0,
                        35.0
                    ],
                    "modifiers": [
                        {
                            "data": null,
                            "name": "SigXsecOverSM",
                            "type": "normfactor"
                        }
                    ],
                    "name": "signal"
                }
            ]
        },
        {
            "name": "channel2",
            "samples": [
                {
                    "data": [
                        200.0,
                        350.0
                    ],
                    "modifiers": [
                        {
                            "data": null,
                            "name": "lumi",
                            "type": "lumi"
                        },
                        {
                            "data": [
                                5.000000074505806,
                                10.000000055879354
                            ],
                            "name": "uncorrshape_control",
                            "type": "shapesys"
                        }
                    ],
                    "name": "bkg"
                }
            ]
        }
    ],
    "measurements": [
        {
            "config": {
                "parameters": [
                    {
                        "bounds": [
                            [
                                0.0,
                                10.0
                            ]
                        ],
                        "inits": [
                            1.0
                        ],
                        "name": "SigXsecOverSM"
                    },
                    {
                        "auxdata": [
                            1.0
                        ],
                        "bounds": [
                            [
                                0.5,
                                1.5
                            ]
                        ],
                        "fixed": true,
                        "inits": [
                            1.0
                        ],
                        "name": "lumi",
                        "sigmas": [
                            0.1
                        ]
                    }
                ],
                "poi": "SigXsecOverSM"
            },
            "name": "GaussExample"
        }
    ],
    "observations": [
        {
            "data": [
                110.0,
                155.0
            ],
            "name": "channel1"
        },
        {
            "data": [
                205.0,
                345.0
            ],
            "name": "channel2"
        }
    ],
    "version": "1.0.0"
}
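The jq expression in the pyhf sort --help output above spells out one ordering: channels by name, samples by name, modifiers by name and type, and observations by name. A minimal plain-Python sketch of that same ordering follows. The function sort_spec is hypothetical and is not pyhf’s implementation; for instance, the pyhf sort output above also reorders the measurement parameters, which this sketch leaves alone.

```python
import copy


def sort_spec(spec: dict) -> dict:
    """Return a sorted copy of a workspace-like dict; the input is untouched."""
    out = copy.deepcopy(spec)
    out["channels"].sort(key=lambda channel: channel["name"])
    for channel in out["channels"]:
        channel["samples"].sort(key=lambda sample: sample["name"])
        for sample in channel["samples"]:
            sample["modifiers"].sort(key=lambda mod: (mod["name"], mod["type"]))
    out["observations"].sort(key=lambda obs: obs["name"])
    return out


# Trimmed version of channel1 from the unsorted xml2json output.
unsorted_spec = {
    "channels": [{"name": "channel1", "samples": [
        {"name": "signal", "data": [10.0, 35.0], "modifiers": [
            {"name": "SigXsecOverSM", "type": "normfactor", "data": None}]},
        {"name": "bkg", "data": [100.0, 150.0], "modifiers": [
            {"name": "uncorrshape_signal", "type": "shapesys",
             "data": [10.0, 10.0]},
            {"name": "lumi", "type": "lumi", "data": None}]},
    ]}],
    "observations": [{"name": "channel1", "data": [110.0, 155.0]}],
}

sorted_spec = sort_spec(unsorted_spec)
sample_order = [s["name"] for s in sorted_spec["channels"][0]["samples"]]
print(sample_order)
```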

Computing a digest#

Next up is a way to determine if two workspaces are equivalent, simply by comparing their computed digests. Note that the digest is based on the exact contents of the workspace, so values that differ only at floating-point precision are not treated as identical. That is, 2.19999999 and 2.2000001 will be treated as different values in the digest calculation, just as they are in Python. We’ll show here why sorting is very important.

! pyhf digest --help
Usage: pyhf digest [OPTIONS] [WORKSPACE]

  Use hashing algorithm to calculate the workspace digest.

  Returns:     digests (:obj:`dict`): A mapping of the hashing algorithms used
  to the computed digest for the workspace.

  Example:

  .. code-block:: shell

      $ curl -sL https://raw.githubusercontent.com/scikit-
      hep/pyhf/main/docs/examples/json/2-bin_1-channel.json | pyhf digest
      sha256:dad8822af55205d60152cbe4303929042dbd9d4839012e055e7c6b6459d68d73

Options:
  -a, --algorithm TEXT          The hashing algorithm used to compute the
                                workspace digest.
  -j, --json / -p, --plaintext  Output the hash values as a JSON dictionary or
                                plaintext strings
  -h, --help                    Show this message and exit.
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf digest
sha256:50165e8ef034c514fb77e8f05a15a002c02bd659f001657952b79e0552470f79
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf sort | \
  pyhf digest
sha256:27a35f6874cf91f9b38916cf948ac18ee650f1b578a93107b9b212c8752b1310

Remember that sorting switched the order of the samples, which is why the two digests differ.

The sha256 algorithm is used to compute the checksum for this workspace. This means that one can generally “normalize” workspaces first and then compare their digests to tell whether they are equivalent. As with all command line functionality you’ve seen so far, there are equivalent ways to do it through Python.

print(f"Unsorted: {pyhf.utils.digest(ws)}")
print(f"Sorted:   {pyhf.utils.digest(pyhf.Workspace.sorted(ws))}")
Unsorted: 50165e8ef034c514fb77e8f05a15a002c02bd659f001657952b79e0552470f79
Sorted:   27a35f6874cf91f9b38916cf948ac18ee650f1b578a93107b9b212c8752b1310
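To see why list order and floating-point values matter for a content-based checksum, here is a stdlib-only sketch that hashes a JSON serialization with sorted keys. The function sketch_digest is hypothetical, and no claim is made that it reproduces the hashes pyhf prints above; it only demonstrates the general behaviour of hashing serialized content.

```python
import hashlib
import json


def sketch_digest(obj, algorithm: str = "sha256") -> str:
    """Hash the JSON text of obj, with dictionary keys sorted."""
    payload = json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf8")
    return hashlib.new(algorithm, payload).hexdigest()


# Dictionary key order does not matter, because keys are sorted on output.
same = sketch_digest({"x": 1, "y": 2}) == sketch_digest({"y": 2, "x": 1})

# List order does matter: swapping two samples changes the hash.
signal_first = {"samples": [{"name": "signal"}, {"name": "bkg"}]}
bkg_first = {"samples": [{"name": "bkg"}, {"name": "signal"}]}
reordered_differs = sketch_digest(signal_first) != sketch_digest(bkg_first)

# Nearly equal floats serialize to different text, so the hashes differ.
floats_differ = sketch_digest({"v": 2.19999999}) != sketch_digest({"v": 2.2000001})

print(same, reordered_differs, floats_differ)
```

Sorting key order is not enough on its own, since lists keep their order in JSON; this is the gap that normalizing the workspace before computing the digest is meant to close.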

“Pruning” away items#

Sometimes you want to manipulate workspaces by removing channels or samples or systematics (or measurements). This can be useful when trying to debug fits, or to build background-only workspaces, or to clean up a workspace.

! pyhf prune --help
Usage: pyhf prune [OPTIONS] [WORKSPACE]

  Prune components from the workspace.

  See :func:`pyhf.workspace.Workspace.prune` for more information.

Options:
  --output-file TEXT              The location of the output json file. If not
                                  specified, prints to screen.
  -c, --channel <CHANNEL>...
  -s, --sample <SAMPLE>...
  -m, --modifier <MODIFIER>...
  -t, --modifier-type [histosys|lumi|normfactor|normsys|shapefactor|shapesys|staterror]
  --measurement <MEASUREMENT>...
  -h, --help                      Show this message and exit.

prune channels#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf prune -c channel1 | \
  pyhf inspect
Traceback (most recent call last):
  File "/usr/local/venv/bin/pyhf", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/cli/spec.py", line 82, in inspect
    model = ws.model()
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/workspace.py", line 447, in model
    return Model(modelspec, **config_kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py", line 780, in __init__
    self.config.set_poi(poi_name)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py", line 464, in set_poi
    raise exceptions.InvalidModel(
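pyhf.exceptions.InvalidModel: The parameter of interest 'SigXsecOverSM' cannot be fit as it is not declared in the model specification.

This failure is expected. In this example SigXsecOverSM is the normfactor attached to the signal sample, which only appears in channel1. Once channel1 is pruned, the measurement still names SigXsecOverSM as its parameter of interest, but no modifier with that name remains, so pyhf inspect cannot build the model. The pruning step itself succeeded; it is the model construction inside pyhf inspect that fails.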
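A quick way to anticipate this kind of error is to check, before building a model, whether each measurement's POI is still the name of some modifier. The following is a minimal sketch on a plain dictionary laid out like a pyhf JSON workspace (observations omitted); drop_channels and missing_pois are hypothetical helpers, not pyhf functions, and the bin contents are made up.

```python
import copy

# Toy spec using the names from this example; the numbers are invented.
spec = {
    "channels": [
        {
            "name": "channel1",
            "samples": [
                {
                    "name": "signal",
                    "data": [1.0, 2.0],
                    "modifiers": [
                        {"name": "SigXsecOverSM", "type": "normfactor", "data": None}
                    ],
                },
                {"name": "bkg", "data": [10.0, 12.0], "modifiers": []},
            ],
        },
        {
            "name": "channel2",
            "samples": [{"name": "bkg", "data": [20.0, 22.0], "modifiers": []}],
        },
    ],
    "measurements": [
        {"name": "GaussExample", "config": {"poi": "SigXsecOverSM", "parameters": []}}
    ],
}


def drop_channels(spec, names):
    """Return a copy of the spec without the named channels."""
    pruned = copy.deepcopy(spec)
    pruned["channels"] = [c for c in pruned["channels"] if c["name"] not in names]
    return pruned


def missing_pois(spec):
    """List (measurement, poi) pairs whose POI matches no remaining modifier."""
    declared = {
        modifier["name"]
        for channel in spec["channels"]
        for sample in channel["samples"]
        for modifier in sample["modifiers"]
    }
    return [
        (m["name"], m["config"]["poi"])
        for m in spec["measurements"]
        if m["config"]["poi"] not in declared
    ]


print(missing_pois(spec))                               # []
print(missing_pois(drop_channels(spec, {"channel1"})))  # [('GaussExample', 'SigXsecOverSM')]
print(missing_pois(drop_channels(spec, {"channel2"})))  # []
```

In this toy layout, pruning channel2 keeps the POI intact, while pruning channel1 leaves the measurement pointing at a parameter that no longer exists.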

prune samples#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf prune -s signal | \
  pyhf inspect
Traceback (most recent call last):
  File "/usr/local/venv/bin/pyhf", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/cli/spec.py", line 82, in inspect
    model = ws.model()
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/workspace.py", line 447, in model
    return Model(modelspec, **config_kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py", line 780, in __init__
    self.config.set_poi(poi_name)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py", line 464, in set_poi
    raise exceptions.InvalidModel(
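pyhf.exceptions.InvalidModel: The parameter of interest 'SigXsecOverSM' cannot be fit as it is not declared in the model specification.

Pruning the signal sample removes the SigXsecOverSM normfactor along with it, so the measurement's parameter of interest is no longer declared anywhere in the model and pyhf inspect fails when it tries to build the model. To inspect a background-only workspace like this one, the measurement would also need a parameter of interest that still exists.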

prune modifiers#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf prune -m uncorrshape_signal | \
  pyhf inspect
              Summary       
        ------------------  
           channels  2
            samples  2
         parameters  3
          modifiers  3

           channels  nbins
         ----------  -----
           channel1    2  
           channel2    2  

            samples
         ----------
                bkg
             signal

         parameters  constraint              modifiers
         ----------  ----------              ----------
      SigXsecOverSM  unconstrained           normfactor
               lumi  constrained_by_normal   lumi
uncorrshape_control  constrained_by_poisson  shapesys

        measurement           poi            parameters
         ----------        ----------        ----------
   (*) GaussExample      SigXsecOverSM       lumi,SigXsecOverSM

prune modifier types#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf prune -t shapesys | \
  pyhf inspect
        Summary       
  ------------------  
     channels  2
      samples  2
   parameters  2
    modifiers  2

     channels  nbins
   ----------  -----
     channel1    2  
     channel2    2  

      samples
   ----------
          bkg
       signal

   parameters  constraint              modifiers
   ----------  ----------              ----------
SigXsecOverSM  unconstrained           normfactor
         lumi  constrained_by_normal   lumi

  measurement           poi            parameters
   ----------        ----------        ----------
(*) GaussExample      SigXsecOverSM       lumi,SigXsecOverSM
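The effect of pruning by modifier type can be pictured as a filter over every sample's modifier list. The sketch below works on plain dictionaries and is not pyhf's implementation; drop_modifier_types and parameter_names are hypothetical helpers, and the assignment of modifiers to samples in the toy channels is an assumption made to mirror the names in the output above.

```python
def drop_modifier_types(channels, types):
    """Return new channel dicts without modifiers of the given types."""
    return [
        {
            **channel,
            "samples": [
                {
                    **sample,
                    "modifiers": [
                        m for m in sample["modifiers"] if m["type"] not in types
                    ],
                }
                for sample in channel["samples"]
            ],
        }
        for channel in channels
    ]


def parameter_names(channels):
    """Collect the distinct modifier names left in the channels."""
    return sorted(
        {m["name"] for c in channels for s in c["samples"] for m in s["modifiers"]}
    )


lumi = {"name": "lumi", "type": "lumi"}
# Assumed layout: one shapesys per channel, attached to the bkg sample.
channels = [
    {
        "name": "channel1",
        "samples": [
            {
                "name": "signal",
                "modifiers": [{"name": "SigXsecOverSM", "type": "normfactor"}, lumi],
            },
            {
                "name": "bkg",
                "modifiers": [lumi, {"name": "uncorrshape_signal", "type": "shapesys"}],
            },
        ],
    },
    {
        "name": "channel2",
        "samples": [
            {
                "name": "bkg",
                "modifiers": [lumi, {"name": "uncorrshape_control", "type": "shapesys"}],
            }
        ],
    },
]

before = parameter_names(channels)
after = parameter_names(drop_modifier_types(channels, {"shapesys"}))
print(before)  # ['SigXsecOverSM', 'lumi', 'uncorrshape_control', 'uncorrshape_signal']
print(after)   # ['SigXsecOverSM', 'lumi']
```

The two names left after filtering match the two parameters reported by pyhf inspect above.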

Renaming items#

In addition to removing items, you might want to rename channels, samples, modifiers, or measurements. This is useful for creating or removing modifier correlations (modifiers that share a name share their parameters), or just for cleaning up your workspace to get it ready for publication.

! pyhf rename --help
Usage: pyhf rename [OPTIONS] [WORKSPACE]

  Rename components of the workspace.

  See :func:`pyhf.workspace.Workspace.rename` for more information.

Options:
  --output-file TEXT              The location of the output json file. If not
                                  specified, prints to screen.
  -c, --channel <PATTERN> <REPLACE>...
  -s, --sample <PATTERN> <REPLACE>...
  -m, --modifier <PATTERN> <REPLACE>...
  --measurement <PATTERN> <REPLACE>...
  -h, --help                      Show this message and exit.

rename channels#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf rename -c channel1 SR -c channel2 CR | \
  pyhf inspect
              Summary       
        ------------------  
           channels  2
            samples  2
         parameters  4
          modifiers  4

           channels  nbins
         ----------  -----
                 CR    2  
                 SR    2  

            samples
         ----------
                bkg
             signal

         parameters  constraint              modifiers
         ----------  ----------              ----------
      SigXsecOverSM  unconstrained           normfactor
               lumi  constrained_by_normal   lumi
uncorrshape_control  constrained_by_poisson  shapesys
 uncorrshape_signal  constrained_by_poisson  shapesys

        measurement           poi            parameters
         ----------        ----------        ----------
   (*) GaussExample      SigXsecOverSM       lumi,SigXsecOverSM
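One detail worth keeping in mind when renaming channels: in the pyhf JSON format each observation is matched to its channel by name, so a consistent rename has to touch both lists. The following sketch shows the idea on a plain dictionary; rename_channels is a hypothetical helper, not the pyhf implementation, and the observed counts are made up.

```python
def rename_channels(spec, mapping):
    """Rename channels and their matching observations in a copy of the spec."""
    return {
        **spec,
        "channels": [
            {**c, "name": mapping.get(c["name"], c["name"])} for c in spec["channels"]
        ],
        "observations": [
            {**o, "name": mapping.get(o["name"], o["name"])}
            for o in spec["observations"]
        ],
    }


# Toy spec with empty sample lists; only the names matter here.
spec = {
    "channels": [
        {"name": "channel1", "samples": []},
        {"name": "channel2", "samples": []},
    ],
    "observations": [
        {"name": "channel1", "data": [5.0, 7.0]},
        {"name": "channel2", "data": [20.0, 25.0]},
    ],
}

renamed = rename_channels(spec, {"channel1": "SR", "channel2": "CR"})
channel_names = [c["name"] for c in renamed["channels"]]
observation_names = [o["name"] for o in renamed["observations"]]
print(channel_names)      # ['SR', 'CR']
print(observation_names)  # ['SR', 'CR']
```

The original spec is left untouched, and the observed data stay attached to the renamed channels.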

rename samples#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf rename -s bkg background | \
  pyhf inspect
              Summary       
        ------------------  
           channels  2
            samples  2
         parameters  4
          modifiers  4

           channels  nbins
         ----------  -----
           channel1    2  
           channel2    2  

            samples
         ----------
         background
             signal

         parameters  constraint              modifiers
         ----------  ----------              ----------
      SigXsecOverSM  unconstrained           normfactor
               lumi  constrained_by_normal   lumi
uncorrshape_control  constrained_by_poisson  shapesys
 uncorrshape_signal  constrained_by_poisson  shapesys

        measurement           poi            parameters
         ----------        ----------        ----------
   (*) GaussExample      SigXsecOverSM       lumi,SigXsecOverSM

rename modifiers#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf rename -m uncorrshape_signal corrshape -m uncorrshape_control corrshape | \
  pyhf inspect
Traceback (most recent call last):
  File "/usr/local/venv/bin/pyhf", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/cli/spec.py", line 82, in inspect
    model = ws.model()
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/workspace.py", line 447, in model
    return Model(modelspec, **config_kwargs)
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py", line 774, in __init__
    modifiers, _nominal_rates = _nominal_and_modifiers_from_spec(
  File "/usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py", line 133, in _nominal_and_modifiers_from_spec
    raise exceptions.InvalidModel(
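pyhf.exceptions.InvalidModel: Trying to add paramset shapesys/corrshape on bkg sample in channel2 channel but other paramsets exist with the same name.

The rename itself succeeds, but the resulting workspace does not describe a valid model. A shapesys modifier is an uncorrelated shape uncertainty with its own per-bin parameters, so pyhf does not allow two shapesys modifiers in different channels to share a name. Creating correlations by renaming works for modifier types that can be shared, such as normsys or histosys.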
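This kind of clash can be spotted before building a model by looking for shapesys names that appear in more than one channel and sample pair. The sketch below does this on plain dictionaries; rename_modifiers and shared_shapesys are hypothetical helpers rather than pyhf functions, and the toy channels only assume that each shapesys sits on the bkg sample of its channel.

```python
from collections import defaultdict


def rename_modifiers(channels, mapping):
    """Return new channel dicts with modifier names replaced via the mapping."""
    return [
        {
            **c,
            "samples": [
                {
                    **s,
                    "modifiers": [
                        {**m, "name": mapping.get(m["name"], m["name"])}
                        for m in s["modifiers"]
                    ],
                }
                for s in c["samples"]
            ],
        }
        for c in channels
    ]


def shared_shapesys(channels):
    """Return shapesys names used in more than one (channel, sample) pair."""
    uses = defaultdict(set)
    for c in channels:
        for s in c["samples"]:
            for m in s["modifiers"]:
                if m["type"] == "shapesys":
                    uses[m["name"]].add((c["name"], s["name"]))
    return sorted(name for name, where in uses.items() if len(where) > 1)


channels = [
    {
        "name": "channel1",
        "samples": [
            {
                "name": "bkg",
                "modifiers": [{"name": "uncorrshape_signal", "type": "shapesys"}],
            }
        ],
    },
    {
        "name": "channel2",
        "samples": [
            {
                "name": "bkg",
                "modifiers": [{"name": "uncorrshape_control", "type": "shapesys"}],
            }
        ],
    },
]

mapping = {"uncorrshape_signal": "corrshape", "uncorrshape_control": "corrshape"}
clashes_before = shared_shapesys(channels)
clashes_after = shared_shapesys(rename_modifiers(channels, mapping))
print(clashes_before)  # []
print(clashes_after)   # ['corrshape']
```

A non-empty result flags the names that would trigger the error shown above.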

rename measurements#

! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
  pyhf rename --measurement GaussExample FitConfig | \
  pyhf inspect
              Summary       
        ------------------  
           channels  2
            samples  2
         parameters  4
          modifiers  4

           channels  nbins
         ----------  -----
           channel1    2  
           channel2    2  

            samples
         ----------
                bkg
             signal

         parameters  constraint              modifiers
         ----------  ----------              ----------
      SigXsecOverSM  unconstrained           normfactor
               lumi  constrained_by_normal   lumi
uncorrshape_control  constrained_by_poisson  shapesys
 uncorrshape_signal  constrained_by_poisson  shapesys

        measurement           poi            parameters
         ----------        ----------        ----------
      (*) FitConfig      SigXsecOverSM       lumi,SigXsecOverSM