CombineHarvester
Examples Part II

File: CombineTools/bin/Example2.cpp

In this example we will set up a simplified version of the Higgs to tau tau datacards, while exploring the main features of datacard creation with the CombineHarvester tool. To run the example, first make sure the code has been compiled:

cd $CMSSW_BASE/src
scram b -j4
Example2

Defining categories and processes

We start by defining two analysis categories (or 'bins'). It's a good idea for each bin to have a unique name: this is required by combine, and while not required by CombineHarvester explicitly, a number of functions rely on this being true. CombineHarvester also allows for each object to be assigned an integer value, called a "bin_id", that does not need to be unique. This can be useful to label a "type-of-category" that might appear more than once. For example, VBF event categories for both the 7TeV and 8TeV datasets might have a common bin_id, but different names: "vbf_7TeV" and "vbf_8TeV".
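As a hypothetical sketch of that last point (these category names and the bin_id value are illustrative and not used in this example), both eras could share a bin_id while keeping unique bin names:

// Hypothetical: the same type of category in two dataset eras,
// sharing bin_id 3 but with unique bin names (as combine requires).
ch::Categories vbf_cats = {
{3, "vbf_7TeV"},
{3, "vbf_8TeV"}
};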

// First define the location of the "auxiliaries" directory where we can
// source the input files containing the datacard shapes
string aux_shapes = string(getenv("CMSSW_BASE")) + "/src/auxiliaries/shapes/";
// Create an empty CombineHarvester instance that will hold all of the
// datacard configuration and histograms etc.
ch::CombineHarvester cb;
// Uncomment this next line to see a *lot* of debug information
// cb.SetVerbosity(3);
// Here we will just define two categories for an 8TeV analysis. Each entry in
// the vector below specifies a bin_id and the corresponding bin name.
ch::Categories cats = {
{1, "muTau_1jet_medium"},
{2, "muTau_vbf_loose"}
};
// ch::Categories is just a typedef of vector<pair<int, string>>

Now we define the signal mass points we will build datacards for, but note these are specified as strings, not floats. The function ch::MassesFromRange is used to quickly generate mass values from 120 to 135 GeV in 5 GeV steps.

vector<string> masses = ch::MassesFromRange("120-135:5");
// Or equivalently, specify the mass points explicitly:
// vector<string> masses = {"120", "125", "130", "135"};

The next step is to add some new objects to the CombineHarvester instance. First we will specify the observations (i.e. the actual data). The AddObservations method takes a series of vectors as arguments. Each vector specifies some property, such as the analysis name, the dataset era or the bin information. Every possible combination of elements from these vectors will be used to add a new Observation entry.

The arguments are: AddObservations(masses, analyses, eras, channels, categories)

Below we specify one mass entry ("*"), which implies we only need one entry that will cover all signal mass hypotheses. Then we specify the Higgs to tau tau analysis ("htt"), the 8TeV dataset ("8TeV"), the mu+tau analysis channel ("mt") and the categories we defined above. If the analysis, era and channel properties aren't relevant for your analysis, you can always leave them as a single empty string.

cb.AddObservations({"*"}, {"htt"}, {"8TeV"}, {"mt"}, cats);

Next we add the signal and background processes. The arguments are similar to the AddObservations method. An extra argument is added after the channels for the list of processes, and a final boolean option specifies whether these are signal or background processes. Note that each process name here should correspond to the histogram name in your input file. In the signal case we pass the list of Higgs mass hypotheses generated above instead of the generic "*".

vector<string> bkg_procs = {"ZTT", "W", "QCD", "TT"};
cb.AddProcesses({"*"}, {"htt"}, {"8TeV"}, {"mt"}, bkg_procs, cats, false);
vector<string> sig_procs = {"ggH", "qqH"};
cb.AddProcesses(masses, {"htt"}, {"8TeV"}, {"mt"}, sig_procs, cats, true);

Creating systematics

The next step is to add details of the systematic uncertainties. The specification of an uncertainty on a single process in a single bin is called a ch::Systematic. With CombineHarvester we create the ch::Systematic entries for each uncertainty source in turn. In doing so we must specify: the name of the nuisance parameter we want to use, the type (i.e. normalisation or shape), which processes it should be applied to and the magnitude of the uncertainty on each process. All of this information can be expressed in a single line of code (though for clarity we will usually split it over multiple lines), for example the luminosity uncertainty:

cb.cp().signals()
.AddSyst(cb, "lumi_$ERA", "lnN", SystMap<era>::init
({"7TeV"}, 1.022)
({"8TeV"}, 1.026));

We can break this line down into several parts. cb.cp() returns a shallow copy of the CombineHarvester instance - i.e. the Observation and Process entries are shared with the original object (see the documentation of Example 1). However, removing entries from this shallow copy leaves the entries in the original instance intact. The next method, signals(), acts on this copy. This is one of several filter methods: it removes any non-signal process from the internal entries. We do this because we only want to create Systematic entries for the signal processes. Like all filter methods, it returns a reference to itself.

Then we call the AddSyst method. The first argument is a reference to the CombineHarvester instance where the new Systematic entries should be created. In this case we just give it our original instance (remember we are calling the AddSyst method on a copy of this instance). The next argument is the Systematic name. Before the Systematic entry for each Process is created, a number of string substitutions are made based on the properties of the process in question. These are:

$BIN       --> proc.bin()
$PROCESS   --> proc.process()  (the process name)
$MASS      --> proc.mass()
$ERA       --> proc.era()
$CHANNEL   --> proc.channel()
$ANALYSIS  --> proc.analysis()

So in this example we can expect names like "lumi_8TeV". This substitution provides a quick way of renaming systematics to be correlated/uncorrelated between different channels/analyses/bins etc. Next we specify the nuisance type, which must be either "lnN" or "shape". The final argument is a special map (SystMap) that contains the set of values that should be added. The SystMap is a templated class that can take an arbitrary number of template parameters. Each parameter specifies a Process property that will be used as part of the key that maps to the values. In this case we just use the process era as a key. We initialise a new map with init, then provide a series of entries. Each entry should consist of a series of vectors, one for each key property, followed by the lnN value that should be assigned. Processes matching any combination of key properties in this map will be assigned the given value. In this map, we assign any Process with era "7TeV" a value of 1.022, and any "8TeV" Process a value of 1.026. More examples are given below:

cb.cp().process({"ggH"})
.AddSyst(cb, "pdf_gg", "lnN", SystMap<>::init(1.097));
cb.cp().process(ch::JoinStr({sig_procs, {"ZTT", "TT"}}))
.AddSyst(cb, "CMS_eff_m", "lnN", SystMap<>::init(1.02));
cb.cp()
.AddSyst(cb,
"CMS_scale_j_$ERA", "lnN", SystMap<era, bin_id, process>::init
({"8TeV"}, {1}, {"ggH"}, 1.04)
({"8TeV"}, {1}, {"qqH"}, 0.99)
({"8TeV"}, {2}, {"ggH"}, 1.10)
({"8TeV"}, {2}, {"qqH"}, 1.04)
({"8TeV"}, {2}, {"TT"}, 1.05));
cb.cp().process(ch::JoinStr({sig_procs, {"ZTT"}}))
.AddSyst(cb, "CMS_scale_t_mutau_$ERA", "shape", SystMap<>::init(1.00));

Creation of asymmetric "lnN" uncertainties is supported through the SystMapAsymm class, whose interface is very similar to SystMap. Instead of a single uncertainty value, simply provide the "down" and "up" relative uncertainties as two separate arguments.

Extracting shape inputs

Next we populate these Observation, Process and Systematic entries with the actual histograms (and yields), using the ExtractShapes method. The first argument is the path of the input ROOT file; the last two arguments are the patterns used to determine the locations of the nominal and systematic shape templates within that file. We must do this separately for the signal and background shapes, which use different naming conventions. NOTE: in this method only the following substitutions are supported:

$BIN        --> proc.bin()
$PROCESS    --> proc.process()
$MASS       --> proc.mass()
$SYSTEMATIC --> syst.name()

Also note that the data histogram must be named data_obs to be extracted by this command.

aux_shapes + "Imperial/htt_mt.inputs-sm-8TeV-hcg.root",
"$BIN/$PROCESS",
"$BIN/$PROCESS_$SYSTEMATIC");
aux_shapes + "Imperial/htt_mt.inputs-sm-8TeV-hcg.root",
"$BIN/$PROCESS$MASS",
"$BIN/$PROCESS$MASS_$SYSTEMATIC");

Adding bin-by-bin uncertainties

The next step is optional. This will generate additional shape uncertainties to account for limited template statistics, so-called "bin-by-bin" uncertainties.

auto bbb = ch::BinByBinFactory()
.SetAddThreshold(0.1)
.SetFixNorm(true);
bbb.AddBinByBin(cb.cp().backgrounds(), cb);
// This function modifies every entry to have a standardised bin name of
// the form: {analysis}_{channel}_{bin_id}_{era}
// which is commonly used in the htt analyses
ch::SetStandardBinNames(cb);

We first create a ch::BinByBinFactory instance, and specify the bin error threshold over which an uncertainty should be created, expressed as a fraction of the bin content. We also set the "FixNorm" flag, which controls the normalisation of the Up and Down shapes that are created: if set to true, the normalisation is fixed to the nominal rate; if false, it is allowed to vary. We then call the AddBinByBin method, specifying that only the background processes should be considered.

Writing datacards

Now we will iterate through the set of bins and mass points and write a text datacard file for each:

// First we generate a set of bin names:
set<string> bins = cb.bin_set();
// This method will produce a set of unique bin names by considering all
// Observation, Process and Systematic entries in the CombineHarvester
// instance.
// We create the output root file that will contain all the shapes.
TFile output("htt_mt.input.root", "RECREATE");
// Finally we iterate through each bin,mass combination and write a
// datacard.
for (auto b : bins) {
for (auto m : masses) {
cout << ">> Writing datacard for bin: " << b << " and mass: " << m
<< "\n";
// We need to filter on both the bin name and the mass hypothesis,
// where we must remember to include the "*" mass entry to get
// all the data and backgrounds.
cb.cp().bin({b}).mass({m, "*"}).WriteDatacard(
b + "_" + m + ".txt", output);
}
}

While we are required to write a separate datacard for each mass point, there is no obligation to generate one for each bin. For example,

     cb.cp().mass({"125", "*"}).WriteDatacard("combined_125.txt", output);

will produce a datacard containing all categories.
