In this chapter, you will learn about various workspace manipulations, including how to convert HistFactory XML+ROOT workspaces to pyhf. We’ll cover some common pitfalls, such as the locations of ROOT files, and how to set the base path for the conversion.
Getting the XML+ROOT
Note that getting the XML+ROOT files won’t necessarily be covered as part of the tutorial, as it requires ROOT (though ROOT is installed in the Binder instance).
If you want to practice extracting out the HistFactory files from the workspace, first create the workspace like so:
# hist2workspace needs to be run from the directory containing the config/ directory
from os import chdir
from pathlib import Path
_top_level_dir = Path.cwd()  # remember where we started so we can return later
chdir(_top_level_dir.joinpath("data", "multichannel_histfactory"))
! hist2workspace config/example.xml
and you’ll notice a few new files being made!
$ ls -lhF results/
total 136K
-rw-r--r-- 1 jovyan jovyan 40K Nov 8 21:01 example_channel1_GaussExample_model.root
-rw-r--r-- 1 jovyan jovyan 38K Nov 8 21:01 example_channel2_GaussExample_model.root
-rw-r--r-- 1 jovyan jovyan 47K Nov 8 21:01 example_combined_GaussExample_model.root
-rw-r--r-- 1 jovyan jovyan 503 Nov 8 21:01 example_GaussExample.root
-rw-r--r-- 1 jovyan jovyan 26 Nov 8 21:01 example_results.table
! ls -lhF results/
In particular, example_combined_GaussExample_model.root is the file that contains the RooStats::HistFactory::Measurement object:
$ root results/example_combined_GaussExample_model.root
------------------------------------------------------------
| Welcome to ROOT 6.18/04 https://root.cern |
| (c) 1995-2019, The ROOT Team |
| Built for macosx64 on Sep 11 2019, 15:38:23 |
| From tags/v6-18-04@v6-18-04 |
| Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
------------------------------------------------------------
root [0]
Attaching file results/example_combined_GaussExample_model.root as _file0...
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
All rights reserved, please read http://roofit.sourceforge.net/license.txt
(TFile *) 0x7ffaa30d2130
root [1] .ls
TFile** results/example_combined_GaussExample_model.root
TFile* results/example_combined_GaussExample_model.root
KEY: RooWorkspace combined;1 combined
KEY: TProcessID ProcessID0;1 e1e9272e-fddb-11ea-86b3-1556a8c0beef
KEY: TDirectoryFile channel1_hists;1 channel1_hists
KEY: TDirectoryFile channel2_hists;1 channel2_hists
KEY: RooStats::HistFactory::Measurement GaussExample;1
from which you can extract out the necessary XML files as well:
root [2] GaussExample->PrintXML()
Printing XML Files for measurement: GaussExample
Printing XML Files for channel: channel1
Finished printing XML files
Printing XML Files for channel: channel2
Finished printing XML files
Finished printing XML files
To do this programmatically, you can either write a ROOT macro
// printXML.C
int printXML() {
TFile* _file0 = TFile::Open("results/example_combined_GaussExample_model.root");
_file0->Get<RooStats::HistFactory::Measurement>("GaussExample")->PrintXML();
return 0;
}
and run it
$ root -l -b -q printXML.C
but we can also do the same with PyROOT in as many lines
import ROOT
_file0 = ROOT.TFile.Open("results/example_combined_GaussExample_model.root")
_file0.GaussExample.PrintXML()
which dumps them into the same directory you ran from:
$ ls -lhF
total 24K
drwxr-xr-x 2 jovyan jovyan 4.0K Nov 8 19:52 config/
drwxr-xr-x 2 jovyan jovyan 4.0K Nov 8 19:52 data/
-rw-r--r-- 1 jovyan jovyan 1.1K Nov 8 21:01 GaussExample_channel1.xml
-rw-r--r-- 1 jovyan jovyan 794 Nov 8 21:01 GaussExample_channel2.xml
-rw-r--r-- 1 jovyan jovyan 459 Nov 8 21:01 GaussExample.xml
drwxr-xr-x 2 jovyan jovyan 4.0K Nov 8 21:01 results/
! ls -lhF
chdir(_top_level_dir)  # go back to the top-level directory
XML to JSON
via the command line
So pyhf comes with a lot of nifty utilities you can access. The documentation for the command line can be found via pyhf --help or online.
! pyhf --help
Let’s focus for now on pyhf xml2json, which requires that you have installed pyhf[xmlio] (pyhf with the xmlio option).
python -m pip install "pyhf[xmlio]"
Again, there is online documentation for this command as well.
! pyhf xml2json --help
Let’s remind ourselves of what the top-level XML file looks like, as this is what we pass as ENTRYPOINT_XML.
! tail -n +15 data/multichannel_histfactory/config/example.xml | cat -n
So to explain these options:
- basedir specifies the base directory that all the XML files are referenced with respect to. As you can see from lines 3, 4, and 5, this should be the directory containing results/ and config/.
- output-file specifies the output JSON file. If one is not specified, the JSON is printed to the screen, which you can redirect into a file if you want (pyhf xml2json ... > workspace.json).
- hide-progress disables the progress bars when running the script... but we like progress bars 🙂
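To make the role of basedir concrete, here is a minimal standard-library sketch, not pyhf's own code, of resolving the <Input> paths of a top-level XML against a base directory rather than against the directory you happen to run from. The trimmed XML string and the helper name resolve_inputs are made up for illustration; the real example.xml may contain more elements and different paths.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical, trimmed top-level HistFactory XML (not the tutorial's example.xml).
TOP_LEVEL_XML = """
<Combination OutputFilePrefix="./results/example">
  <Input>./config/example_channel1.xml</Input>
  <Input>./config/example_channel2.xml</Input>
  <Measurement Name="GaussExample" Lumi="1." LumiRelErr="0.1">
    <POI>SigXsecOverSM</POI>
  </Measurement>
</Combination>
"""


def resolve_inputs(xml_text, basedir):
    """Hypothetical helper: join each <Input> path onto basedir."""
    root = ET.fromstring(xml_text.strip())
    return [Path(basedir) / elem.text.strip() for elem in root.iter("Input")]


for path in resolve_inputs(TOP_LEVEL_XML, "data/multichannel_histfactory"):
    # pathlib collapses the leading "./", so this prints paths such as
    # data/multichannel_histfactory/config/example_channel1.xml
    print(path.as_posix())
```

Resolving against the current working directory instead would only work when run from data/multichannel_histfactory, which is the pitfall an explicit base directory avoids.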
Let’s go ahead and run this command, but we won’t specify the output file so it goes to the screen. We’ll also disable the progress tracking, just so we have a nicer output for this tutorial.
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | cat -n
Only 130 lines for the entire workspace! Not too shabby. If we look through a couple of pieces:
- line 2: specifies a list of channels
- line 5: specifies the samples for channel1
- lines 6-10: specify the expected event rate for the signal sample in channel1
- line 11: specifies a list of modifiers (i.e. what modifies the sample, such as normalization factors and systematic uncertainties)
Similarly, if we continue down to the second half of this JSON, we hit line 72 which specifies a list of measurements for this workspace. In fact, we only have one measurement called GaussExample with the parameter of interest defined as SigXsecOverSM. This measurement also specifies additional parameter configuration such as details for the luminosity modifier (parameter name lumi).
Nearly at the end, the next part of this specification is for the observations (observed data) on line 113. Each observation corresponds to a channel: channel1 has two bins, and channel2 also has two bins.
Finally, we have a version which specifies the version of the schema used for the JSON HistFactory. In this case, we’re using 1.0.0 which has the https://
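As a quick way to see the structure just described, here is a sketch that loads a tiny, made-up JSON workspace with Python's json module and walks the same keys: channels, samples with their data and modifiers, observations, measurements, and version. The names imitate the tutorial, but every number and the bkg_norm modifier are toy assumptions, not values from the real workspace.

```python
import json

# A tiny, made-up workspace in the same JSON layout as the xml2json output above.
WORKSPACE_JSON = """
{
  "channels": [
    {"name": "channel1", "samples": [
      {"name": "signal", "data": [5.0, 10.0],
       "modifiers": [{"name": "SigXsecOverSM", "type": "normfactor", "data": null}]},
      {"name": "bkg", "data": [50.0, 60.0],
       "modifiers": [{"name": "bkg_norm", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}}]}
    ]}
  ],
  "observations": [{"name": "channel1", "data": [55.0, 72.0]}],
  "measurements": [{"name": "GaussExample", "config": {"poi": "SigXsecOverSM", "parameters": []}}],
  "version": "1.0.0"
}
"""

spec = json.loads(WORKSPACE_JSON)

# channels -> samples -> expected rates ("data") and modifiers
for channel in spec["channels"]:
    for sample in channel["samples"]:
        modifier_names = [m["name"] for m in sample["modifiers"]]
        print(f"{channel['name']}/{sample['name']}: {sample['data']} {modifier_names}")

# observations are matched to channels by name
for obs in spec["observations"]:
    print(f"observed {obs['name']}: {obs['data']}")

# each measurement names its parameter of interest
for meas in spec["measurements"]:
    print(f"measurement {meas['name']}: poi={meas['config']['poi']}")
print(f"schema version: {spec['version']}")
```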
What’s really nice about the schema definition is that it allows anyone to write their own tooling/scripting to build up the workspace and quickly check if it matches the schema. This will get you 90% of the way there in having a valid workspace to work with.
There are some additional checks that the schema alone cannot do, such as detecting name conflicts or ensuring that all samples in a channel have the same binning structure. The good news is that these checks can be done simply by loading the workspace into a pyhf.Workspace object, which will do the schema validation as well as the additional checks.
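The kind of extra checks mentioned above can be sketched in a few lines of plain Python. This is an illustration under our own assumptions, not how pyhf.Workspace implements its validation: the hypothetical helper extra_checks flags duplicate channel names and samples whose bin count does not match their channel's observation.

```python
def extra_checks(spec):
    """Hypothetical helper: list problems a JSON schema alone would not catch."""
    problems = []

    # Name conflicts: the same channel name used twice.
    names = [channel["name"] for channel in spec["channels"]]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate channel names: {duplicates}")

    # Binning: every sample must have as many bins as its channel's observation.
    observed_nbins = {obs["name"]: len(obs["data"]) for obs in spec["observations"]}
    for channel in spec["channels"]:
        nbins = observed_nbins.get(channel["name"])
        if nbins is None:
            problems.append(f"{channel['name']}: no matching observation")
            continue
        for sample in channel["samples"]:
            if len(sample["data"]) != nbins:
                problems.append(
                    f"{channel['name']}/{sample['name']}: "
                    f"{len(sample['data'])} bins, observation has {nbins}"
                )
    return problems


# Toy workspace with two deliberate mistakes.
broken = {
    "channels": [
        {"name": "SR", "samples": [
            {"name": "signal", "data": [1.0, 2.0], "modifiers": []},
            {"name": "bkg", "data": [3.0], "modifiers": []},  # one bin short
        ]},
        {"name": "SR", "samples": [  # reuses the channel name
            {"name": "bkg", "data": [4.0, 5.0], "modifiers": []},
        ]},
    ],
    "observations": [{"name": "SR", "data": [4.0, 6.0]}],
}

for problem in extra_checks(broken):
    print(problem)
```

Both mistakes would pass a purely structural schema check, which is why loading into a Workspace object is the more complete test.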
Speaking of pyhf.Workspace objects...
via the python interface
Let’s do the exact same thing, but from the Python interpreter:
import pyhf
import pyhf.readxml  # not imported by default!
# parse(entrypoint XML, base directory), mirroring ENTRYPOINT_XML and --basedir on the command line
spec = pyhf.readxml.parse(
    "data/multichannel_histfactory/config/example.xml", "data/multichannel_histfactory"
)
We’re not going to dump this out, since we already did that above. Let’s just quickly load it into a pyhf.Workspace object because we can.
ws = pyhf.Workspace(spec)
print(f" channels: {ws.channels}")
print(f" nbins: {ws.channel_nbins}")
print(f" samples: {ws.samples}")
print(f" modifiers: {ws.modifiers}")
print(f"observations: {ws.observations}")
Already, we can see a lot of information about this workspace, as it’s rather inspectable. Remember, this is not a model. What we call a ‘model’ is the combination of the channel specification with a measurement; that is, a measurement of a workspace uniquely defines the model. A model might choose a particular parameter of interest to measure or set specific parameters as constant during the fit. These configurations are all stored in the measurements key we saw above. We’ll explore more about models in the next chapter.
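To illustrate how a measurement picks out one model from a workspace, here is a minimal sketch over plain dictionaries. The helper pick_measurement, the second measurement AltFit, the fixed lumi setting, and treating the first measurement as the default are all assumptions for illustration; in practice you would let pyhf build the model for you.

```python
def pick_measurement(spec, name=None):
    """Hypothetical helper: return (poi, fixed parameter names) for a measurement.

    Assumes the first measurement is the default when no name is given.
    """
    measurements = spec["measurements"]
    if name is None:
        chosen = measurements[0]
    else:
        matches = [m for m in measurements if m["name"] == name]
        if not matches:
            raise KeyError(f"no measurement named {name!r}")
        chosen = matches[0]
    config = chosen["config"]
    fixed = [p["name"] for p in config.get("parameters", []) if p.get("fixed", False)]
    return config["poi"], fixed


# Toy measurements; "AltFit" and the fixed lumi setting are made up.
spec = {
    "measurements": [
        {
            "name": "GaussExample",
            "config": {
                "poi": "SigXsecOverSM",
                "parameters": [
                    {"name": "lumi", "auxdata": [1.0], "sigmas": [0.1], "fixed": True}
                ],
            },
        },
        {"name": "AltFit", "config": {"poi": "bkg_norm", "parameters": []}},
    ]
}

print(pick_measurement(spec))            # ('SigXsecOverSM', ['lumi'])
print(pick_measurement(spec, "AltFit"))  # ('bkg_norm', [])
```

The same channels combined with two different measurements give two different fit configurations, which is the sense in which the measurement defines the model.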
Let’s move on to more things we can do with the command line.
Workspace Inspection
Now that we have a working command for converting our XML to JSON, let’s go ahead and take advantage of the JSON output by piping it to pyhf inspect which will print out a nice summary of our workspace.
! pyhf inspect --help
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf inspect
Immediately, we get a lot of useful information. We can see the number of channels, samples, parameters, and modifiers. Then we get a breakdown of the channels (and the number of bins for each channel), the samples, and the parameters. Finally, we see a list of measurements defined in the workspace, with a (*) marking the measurement used by default when one is not specified.
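Some of the counts that pyhf inspect reports can be approximated directly from the JSON. The sketch below, a hypothetical summarize helper over a toy workspace, is not pyhf's code and may count things differently; it takes bin counts from the observations and treats a modifier as a unique (name, type) pair.

```python
def summarize(spec):
    """Hypothetical helper: a rough, inspect-style summary of a workspace dict."""
    return {
        "channels": {obs["name"]: len(obs["data"]) for obs in spec["observations"]},
        "samples": sorted({s["name"] for c in spec["channels"] for s in c["samples"]}),
        "modifiers": sorted(
            {
                (m["name"], m["type"])
                for c in spec["channels"]
                for s in c["samples"]
                for m in s["modifiers"]
            }
        ),
        "measurements": [meas["name"] for meas in spec["measurements"]],
    }


# Toy workspace; names imitate the tutorial, numbers are made up.
spec = {
    "channels": [
        {"name": "channel1", "samples": [
            {"name": "signal", "data": [5.0, 10.0], "modifiers": [
                {"name": "SigXsecOverSM", "type": "normfactor", "data": None}]},
            {"name": "bkg", "data": [50.0, 60.0], "modifiers": [
                {"name": "bkg_norm", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}}]},
        ]},
        {"name": "channel2", "samples": [
            {"name": "bkg", "data": [40.0, 30.0], "modifiers": [
                {"name": "bkg_norm", "type": "normsys", "data": {"hi": 1.2, "lo": 0.8}}]},
        ]},
    ],
    "observations": [
        {"name": "channel1", "data": [55.0, 72.0]},
        {"name": "channel2", "data": [41.0, 29.0]},
    ],
    "measurements": [{"name": "GaussExample", "config": {"poi": "SigXsecOverSM", "parameters": []}}],
}

summary = summarize(spec)
for key, value in summary.items():
    print(f"{key:>12}: {value}")
```

Note that bkg_norm, which appears in both channels, is counted only once.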
Could the number of parameters and modifiers differ?
“Normalizing” a Workspace
There comes a time when you need to compare two workspaces to determine what changed between them. Depending on how each workspace was generated, you might need to “sort” it first. pyhf sort is a utility that normalizes the workspace for you, so that operations like calculating a checksum (pyhf digest) give reproducible results.
Simple workspaces like the ones we’re using in this tutorial are close to sorted already, but this is usually not true in the real world. Even here, notice how bkg becomes the first sample and signal the second sample after sorting.
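To give a rough idea of what normalizing means, the sketch below returns a copy of a workspace with channels, samples, observations, and measurements ordered by name, and modifiers ordered by name and then type. The ordering rule and the helper name sort_workspace are assumptions; pyhf sort may order things differently.

```python
import copy


def sort_workspace(spec):
    """Hypothetical helper: return a copy with every list ordered by name."""
    out = copy.deepcopy(spec)  # leave the input untouched
    out["channels"].sort(key=lambda channel: channel["name"])
    for channel in out["channels"]:
        channel["samples"].sort(key=lambda sample: sample["name"])
        for sample in channel["samples"]:
            sample["modifiers"].sort(key=lambda mod: (mod["name"], mod["type"]))
    out["observations"].sort(key=lambda obs: obs["name"])
    out["measurements"].sort(key=lambda meas: meas["name"])
    return out


# Toy workspace listed out of order on purpose (poi "mu" is made up).
spec = {
    "channels": [
        {"name": "channel2", "samples": [{"name": "bkg", "data": [40.0], "modifiers": []}]},
        {"name": "channel1", "samples": [
            {"name": "signal", "data": [5.0], "modifiers": []},
            {"name": "bkg", "data": [50.0], "modifiers": []},
        ]},
    ],
    "observations": [
        {"name": "channel2", "data": [41.0]},
        {"name": "channel1", "data": [55.0]},
    ],
    "measurements": [{"name": "GaussExample", "config": {"poi": "mu", "parameters": []}}],
}

ordered = sort_workspace(spec)
print([c["name"] for c in ordered["channels"]])                # ['channel1', 'channel2']
print([s["name"] for s in ordered["channels"][0]["samples"]])  # ['bkg', 'signal']
```

Sorting an already sorted copy leaves it unchanged, which is the property that makes a subsequent digest comparable.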
! pyhf sort --help
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf sort
Computing a digest
Next up is a way to determine whether two workspaces are equivalent, simply by comparing their computed digests. Note that the digest is based on the exact contents of the workspace and does not treat nearby floating-point values as equal. That is, 2.19999999 and 2.2000001 will be treated as different in the digest calculation, just as they are in Python. We’ll show here why sorting is very important.
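Here is a sketch of a digest in the same spirit: serialize to JSON with sorted dictionary keys and hash the result with sha256 from hashlib. The exact serialization options are our assumption and need not match pyhf.utils.digest byte for byte, but the sketch shows the three behaviours discussed here: dictionary key order is irrelevant, list order matters, and tiny float differences change the digest.

```python
import hashlib
import json


def digest(obj):
    """sha256 of a JSON serialization with sorted dict keys (a sketch, not pyhf.utils.digest)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = {"samples": [{"name": "signal", "data": [2.2]}, {"name": "bkg", "data": [50.0]}]}
# Same content, dictionary keys written in a different order.
b = {"samples": [{"data": [2.2], "name": "signal"}, {"data": [50.0], "name": "bkg"}]}
# Same samples, but the list order is swapped (what sorting can change).
c = {"samples": [{"name": "bkg", "data": [50.0]}, {"name": "signal", "data": [2.2]}]}
# A tiny floating-point difference in one bin.
d = {"samples": [{"name": "signal", "data": [2.2000001]}, {"name": "bkg", "data": [50.0]}]}

print(digest(a) == digest(b))  # True: dict key order does not matter
print(digest(a) == digest(c))  # False: list order does matter
print(digest(a) == digest(d))  # False: 2.2 and 2.2000001 serialize differently
```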
! pyhf digest --help
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf digest
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf sort | \
pyhf digest
Remember that sorting switched the order of the samples, which is one reason the two digests differ.
The sha256 algorithm is used to compute the checksum for this workspace. This means that one can generally “normalize” all workspaces, then compute the digest and guarantee uniqueness. As with all command line functionality you’ve seen so far, there are equivalent ways to do it through python.
print(f"Unsorted: {pyhf.utils.digest(ws)}")
print(f"Sorted: {pyhf.utils.digest(pyhf.Workspace.sorted(ws))}")
“Pruning” away items
Sometimes you want to manipulate workspaces by removing channels or samples or systematics (or measurements). This can be useful when trying to debug fits, or to build background-only workspaces, or to clean up a workspace.
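Conceptually, pruning is filtering lists inside the JSON. The hypothetical prune helper below is a sketch, not pyhf prune itself: it drops channels (together with their observations), samples, and modifiers by name or type from a deep copy of a toy workspace. A fuller version would also clean up the measurement parameter configurations of removed modifiers, which this sketch skips.

```python
import copy


def prune(spec, channels=(), samples=(), modifiers=(), modifier_types=()):
    """Hypothetical helper: drop items by name (or modifier type) from a copy of spec.

    Pass tuples or lists of names; a bare string would do substring matching.
    """
    out = copy.deepcopy(spec)
    # Removing a channel also removes its observation.
    out["channels"] = [c for c in out["channels"] if c["name"] not in channels]
    out["observations"] = [o for o in out["observations"] if o["name"] not in channels]
    for channel in out["channels"]:
        channel["samples"] = [s for s in channel["samples"] if s["name"] not in samples]
        for sample in channel["samples"]:
            sample["modifiers"] = [
                m
                for m in sample["modifiers"]
                if m["name"] not in modifiers and m["type"] not in modifier_types
            ]
    return out


# Toy workspace; which sample carries which modifier is made up.
spec = {
    "channels": [
        {"name": "channel1", "samples": [
            {"name": "signal", "data": [5.0, 10.0], "modifiers": [
                {"name": "SigXsecOverSM", "type": "normfactor", "data": None},
                {"name": "uncorrshape_signal", "type": "shapesys", "data": [0.5, 1.0]},
            ]},
            {"name": "bkg", "data": [50.0, 60.0], "modifiers": []},
        ]},
        {"name": "channel2", "samples": [
            {"name": "bkg", "data": [40.0, 30.0], "modifiers": [
                {"name": "uncorrshape_control", "type": "shapesys", "data": [2.0, 1.5]},
            ]},
        ]},
    ],
    "observations": [
        {"name": "channel1", "data": [55.0, 72.0]},
        {"name": "channel2", "data": [41.0, 29.0]},
    ],
    "measurements": [{"name": "GaussExample", "config": {"poi": "SigXsecOverSM", "parameters": []}}],
}

no_shapesys = prune(spec, modifier_types=("shapesys",))
print([m["name"] for m in no_shapesys["channels"][0]["samples"][0]["modifiers"]])  # ['SigXsecOverSM']
```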
! pyhf prune --help
prune channels
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf prune -c channel1 | \
pyhf inspect
prune samples
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf prune -s signal | \
pyhf inspect
prune modifiers
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf prune -m uncorrshape_signal | \
pyhf inspect
prune modifier types
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf prune -t shapesys | \
pyhf inspect
Renaming items
In addition to removing items, you might want to rename your channels, samples, modifiers, or measurement names. This can be useful for creating modifier correlations, or removing modifier correlations, or just cleaning up your workspace to get it ready for publication.
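Renaming is how correlations get created: pyhf correlates modifiers that share a name, so giving two modifiers the same name ties them to one set of parameters. The hypothetical rename_modifiers helper below sketches that on a toy workspace; it is not pyhf rename, and it only renames modifiers, matching measurement parameter entries, and the POI.

```python
import copy


def rename_modifiers(spec, mapping):
    """Hypothetical helper: rename modifiers (and matching measurement entries) via mapping."""
    out = copy.deepcopy(spec)
    for channel in out["channels"]:
        for sample in channel["samples"]:
            for mod in sample["modifiers"]:
                mod["name"] = mapping.get(mod["name"], mod["name"])
    for meas in out["measurements"]:
        config = meas["config"]
        config["poi"] = mapping.get(config["poi"], config["poi"])
        for par in config["parameters"]:
            par["name"] = mapping.get(par["name"], par["name"])
    return out


def modifier_names(spec):
    """Unique modifier names; in pyhf, modifiers sharing a name share parameters."""
    return sorted({m["name"] for c in spec["channels"] for s in c["samples"] for m in s["modifiers"]})


# Toy workspace; the placement of each modifier and the poi "mu" are made up.
spec = {
    "channels": [
        {"name": "channel1", "samples": [
            {"name": "signal", "data": [5.0, 10.0], "modifiers": [
                {"name": "mu", "type": "normfactor", "data": None}]},
            {"name": "bkg", "data": [50.0, 60.0], "modifiers": [
                {"name": "uncorrshape_signal", "type": "shapesys", "data": [5.0, 6.0]}]},
        ]},
        {"name": "channel2", "samples": [
            {"name": "bkg", "data": [40.0, 30.0], "modifiers": [
                {"name": "uncorrshape_control", "type": "shapesys", "data": [4.0, 3.0]}]},
        ]},
    ],
    "measurements": [{"name": "GaussExample", "config": {"poi": "mu", "parameters": []}}],
}

renamed = rename_modifiers(
    spec, {"uncorrshape_signal": "corrshape", "uncorrshape_control": "corrshape"}
)
print(modifier_names(spec))     # ['mu', 'uncorrshape_control', 'uncorrshape_signal']
print(modifier_names(renamed))  # ['corrshape', 'mu']
```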
! pyhf rename --help
rename channels
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf rename -c channel1 SR -c channel2 CR | \
pyhf inspect
rename samples
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf rename -s bkg background | \
pyhf inspect
rename modifiers
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf rename -m uncorrshape_signal corrshape -m uncorrshape_control corrshape | \
pyhf inspect
rename measurements
! pyhf xml2json --basedir data/multichannel_histfactory data/multichannel_histfactory/config/example.xml --hide-progress | \
pyhf rename --measurement GaussExample FitConfig | \
pyhf inspect