{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using HEPData"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "import pyhf\n",
    "import pyhf.contrib.utils"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preserved on HEPData"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As of this tutorial, ATLAS has [published 18 full statistical models to HEPData](https://scikit-hep.org/pyhf/citations.html#published-statistical-models)\n",
    "\n",
    "<p align=\"center\">\n",
    "<a href=\"https://www.hepdata.net/record/ins1755298?version=3\"><img src=\"https://raw.githubusercontent.com/matthewfeickert/talk-SciPy-2020/e0c509cd0dfef98f5876071edd4c60aff9199a1b/figures/HEPData_likelihoods.png\"></a>\n",
    "</p>\n",
    "\n",
    "Let's explore the 1Lbb workspace a little bit shall we?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Getting the Data\n",
    "\n",
    "We'll use the `pyhf[contrib]` extra (which relies on `requests` and `tarfile`) to download the HEPData minted DOI and extract the files we need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pyhf.contrib.utils.download(\n",
    "    \"https://doi.org/10.17182/hepdata.90607.v3/r3\", \"1Lbb-likelihoods\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will nicely download and extract everything we need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -lavh 1Lbb-likelihoods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiate our objects\n",
    "\n",
    "We have a background-only workspace `BkgOnly.json` and a signal patchset collection `patchset.json`. Let's create our python objects and play with them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spec = json.load(open(\"1Lbb-likelihoods/BkgOnly.json\"))\n",
    "patchset = pyhf.PatchSet(json.load(open(\"1Lbb-likelihoods/patchset.json\")))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So what did the analyzers give us for signal patches?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Patching in Signals\n",
    "\n",
    "Let's look at this [`pyhf.PatchSet`](https://pyhf.readthedocs.io/en/v0.7.5/_generated/pyhf.patchset.PatchSet.html#pyhf.patchset.PatchSet) object which provides a user-friendly way to interact with many signal patches at once.\n",
    "\n",
    "### PatchSet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "patchset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Oh wow, we've got 125 patches. What information does it have?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"description: {patchset.description}\")\n",
    "print(f\"    digests: {patchset.digests}\")\n",
    "print(f\"     labels: {patchset.labels}\")\n",
    "print(f\" references: {patchset.references}\")\n",
    "print(f\"    version: {patchset.version}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we've got a useful description of the signal patches... there's a digest. Does that match the background-only workspace we have?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pyhf.utils.digest(spec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It does! In fact, this sort of verification check will be done automatically when applying patches using `pyhf.PatchSet` as we will see shortly. To manually verify, simply run `pyhf.PatchSet.verify` on the workspace. No error means everything is fine. It will loudly complain otherwise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "patchset.verify(spec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "No error, whew. Let's move on.\n",
    "\n",
    "The labels `m1` and `m2` tells us that we have the signal patches parametrized in 2-dimensional space, likely as $m_1 = \\tilde{\\chi}_1^\\pm$ and $m_2 = \\tilde{\\chi}_1^0$... but I guess we'll see?\n",
    "\n",
    "The references list the references for this dataset, which is pointing at the hepdata record for now.\n",
    "\n",
    "Next, the version is the version of the schema set we're using with `pyhf` (`1.0.0`).\n",
    "\n",
    "And last, but certainly not least... its patches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "patchset.patches"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we can see all the patches listed both by name such as `C1N2_Wh_hbb_900_250` as well as a pair of points `(900, 250)`. Why is this useful? The `PatchSet` object acts like a special dictionary look-up where it will grab the patch you need based on the unique key you provide it.\n",
    "\n",
    "For example, we can look up by name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "patchset[\"C1N2_Wh_hbb_900_250\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or by the pair of points"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "patchset[(900, 250)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Patches\n",
    "\n",
    "A `pyhf.PatchSet` is a collection of `pyhf.Patch` objects. What is a patch indeed? It contains enough information about how to apply the signal patch to the corresponding background-only workspace (matched by digest)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "patch = patchset[\"C1N2_Wh_hbb_900_250\"]\n",
    "print(f\"  name: {patch.name}\")\n",
    "print(f\"values: {patch.values}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most importantly, it contains the patch information itself. Specifically, this inherits from the `jsonpatch.JsonPatch` object, which is a 3rd party module providing native support for json patching in python. That means we can simply apply the patch to our workspace directly!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\" samples pre-patch: {pyhf.Workspace(spec).samples}\")\n",
    "print(f\"samples post-patch: {pyhf.Workspace(patch.apply(spec)).samples}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, more quickly, from the `PatchSet` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\" samples pre-patch: {pyhf.Workspace(spec).samples}\")\n",
    "print(f\"samples post-patch: {pyhf.Workspace(patchset.apply(spec, (900, 250))).samples}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Patching via Model Creation\n",
    "\n",
    "One last way to apply the patching is to, instead of patching workspaces, we patch the models as we build them from the background-only workspace. This maybe makes it easier to treat the background-only workspace as immutable, and patch in signal models when grabbing the model. Check it out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "workspace = pyhf.Workspace(spec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, load up our background-only spec into the workspace. Then let's create a model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = workspace.model(patches=[patchset[\"C1N2_Wh_hbb_900_250\"]])\n",
    "print(f\"samples (workspace): {workspace.samples}\")\n",
    "print(f\"samples (  model  ): {model.config.samples}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Doing Physics\n",
    "\n",
    "So we want to try and reproduce part of the contour. At least convince ourselves we're doing *physics* and not *fauxsics*. ... Anyway... Let's remind ourselves of the 1Lbb contour as we don't have the photographic memory of the ATLAS SUSY conveners\n",
    "\n",
    "<img alt=\"1Lbb exclusion contour\" src=\"https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2019-08/fig_06.png\" width=\"600\" />\n",
    "\n",
    "So let's work around the 700-900 GeV $\\tilde{\\chi}_1^\\pm, \\tilde{\\chi}_2^0$ region. We'll look at two points here:\n",
    "\n",
    "* `C1N2_Wh_hbb_650_0(650, 0)` which is below the contour and excluded\n",
    "* `C1N2_Wh_hbb_1000_0(1000, 0)` which is above the contour and not excluded\n",
    "\n",
    "Let's perform a \"standard\" hypothesis test (with $\\mu = 1$ null BSM hypothesis) on both of these and use the $\\text{CL}_\\text{s}$ values to convince ourselves that we just did reproducible physics!?!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Doing Physics, for real now"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_below = workspace.model(patches=[patchset[\"C1N2_Wh_hbb_650_0\"]])\n",
    "\n",
    "model_above = workspace.model(patches=[patchset[\"C1N2_Wh_hbb_1000_0\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've made our models. Let's test hypotheses!\n",
    "\n",
    "*Note: this will not be as instantaneous as our simple models...but it should still be pretty fast!*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_poi = 1.0\n",
    "result_below = pyhf.infer.hypotest(\n",
    "    test_poi,\n",
    "    workspace.data(model_below),\n",
    "    model_below,\n",
    "    test_stat=\"qtilde\",\n",
    "    return_expected_set=True,\n",
    ")\n",
    "\n",
    "print(f\"Observed CLs: {result_below[0]}\")\n",
    "print(f\"Expected CLs band: {[exp.tolist() for exp in result_below[1]]}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result_above = pyhf.infer.hypotest(\n",
    "    test_poi,\n",
    "    workspace.data(model_above),\n",
    "    model_above,\n",
    "    test_stat=\"qtilde\",\n",
    "    return_expected_set=True,\n",
    ")\n",
    "\n",
    "print(f\"Observed CLs: {result_above[0]}\")\n",
    "print(f\"Expected CLs band: {[exp.tolist() for exp in result_above[1]]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And as you can see, we're getting results that we generally expect. Excluded models are those for which $\\text{CL}_\\text{s} < 0.05$. Additionally, you can see that the expected bands $-2\\sigma$ for the $(1000, 0)$ point is just slightly below the observed result for the $(650, 0)$ point which is what we observe in the figure above."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}