File: CombineTools/bin/Example1.cpp
In this example we use CombineHarvester to parse an existing datacard and then extract information from it. Open the file above and take a look at the source code. To run the example, first make sure the code has been compiled with scram
:
cd $CMSSW_BASE/src scram b -j4 Example1
In the first part we locate and open a single text datacard file:
When parsing a datacard, CombineHarvester breaks down the information it contains into sets of objects, each represented by a C++ class. A ch::Observation object stores the information about the observed data in a single category, and likewise ch::Process stores the information for one expected signal or background process in a category. A ch::Systematic object records the uncertainty value assigned to a particular process from a particular source.
shape
). Once the mapped ROOT file has been located and opened, the relevant histograms are copied into their corresponding CombineHarvester objects.Each object class stores a standard set of metadata, designed to aid in the filtering and selection of particular objects within a CombineHarvester, and which in the example above is specified explicitly. The possible metadata is listed in the following table.
name | type | example value |
---|---|---|
bin | string | automatic |
process | string | automatic |
analysis | string | "htt" |
era | string | "8TeV" |
channel | string | "mt" |
bin_id | int | 6 |
mass | string | "125" |
Of these only bin
, process
and mass
are tracked and used by combine, the others are optional and can be left empty if unneeded. The bin
property is used to uniquely label an event category. Along with the process names, this is written directly into the datacard and is extracted automatically. The mass
property is an exception: although we typically create a datacard for a particular signal mass hypothesis this information is not recorded in the datacard, but rather is passed to combine as a command line option, e.g. combine -M Asymptotic -m 125 my_datacard.txt
.
mass
is not specified, ParseDatacard is likely to fail as this property is often needed to map signal processes to histograms in the input ROOT file. You can tell if this property is needed by looking for the term $MASS
in the shapes
rules at the top of the text datacard.An important concept is that the objects created from a datacard are not explicitly linked - each is completely independent. When it's necessary to determine which objects are related, e.g. if we ask for the total uncertainty on a particular process, CombineHarvester will determine this on-the-fly by matching up ch::Process and ch::Systematic objects that have identical metadata.
The last line of the code will print the information stored about the Observation, Process and Systematic entries that we've created. Like many CombineHarvester methods, these Print functions return a reference to the calling object, meaning they can be chained together to keep the code concise. The output will look like:
-------------------------------------------------------------------------------------------------------- mass analysis era channel bin id process rate shape -------------------------------------------------------------------------------------------------------- 125 htt 8TeV mt muTau_vbf_loose 6 data_obs 76 1 -------------------------------------------------------------------------------------------------------- mass analysis era channel bin id process sig rate shape -------------------------------------------------------------------------------------------------------- 125 htt 8TeV mt muTau_vbf_loose 6 ggH 1 1.2171 1 125 htt 8TeV mt muTau_vbf_loose 6 qqH 1 3.5739 1 125 htt 8TeV mt muTau_vbf_loose 6 VH 1 0.029033 1 125 htt 8TeV mt muTau_vbf_loose 6 ZTT 0 50.212 1 125 htt 8TeV mt muTau_vbf_loose 6 QCD 0 10.788 1 125 htt 8TeV mt muTau_vbf_loose 6 W 0 22.544 1 125 htt 8TeV mt muTau_vbf_loose 6 ZL 0 0.23798 1 125 htt 8TeV mt muTau_vbf_loose 6 ZJ 0 1.9001 1 125 htt 8TeV mt muTau_vbf_loose 6 TT 0 2.0405 1 125 htt 8TeV mt muTau_vbf_loose 6 VV 0 0.7037 1 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- mass analysis era channel bin id process sig nuisance type value sh_d sh_u ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- 125 htt 8TeV mt muTau_vbf_loose 6 ggH 1 lumi_8TeV lnN 1.026 0 0 125 htt 8TeV mt muTau_vbf_loose 6 qqH 1 lumi_8TeV lnN 1.026 0 0 125 htt 8TeV mt muTau_vbf_loose 6 VH 1 lumi_8TeV lnN 1.026 0 0 125 htt 8TeV mt muTau_vbf_loose 6 qqH 1 pdf_qqbar lnN 1.036 0 0 125 htt 8TeV mt muTau_vbf_loose 6 VH 1 pdf_qqbar lnN 1.04 0 0 125 htt 8TeV mt muTau_vbf_loose 6 ggH 1 pdf_gg lnN 1.097 0 0 125 htt 8TeV mt muTau_vbf_loose 6 ggH 1 UEPS lnN 0.893 0 0 125 htt 8TeV mt muTau_vbf_loose 6 qqH 1 UEPS lnN 0.988 0 0 125 htt 8TeV mt muTau_vbf_loose 6 VH 1 UEPS lnN 0.988 0 0 125 htt 8TeV mt muTau_vbf_loose 6 ggH 1 QCDscale_ggH2in lnN 0.772 0 0 125 htt 8TeV mt muTau_vbf_loose 6 qqH 1 QCDscale_qqH lnN 1.018 0 0 125 htt 8TeV mt muTau_vbf_loose 6 VH 1 QCDscale_VH lnN 1.04 0 0 125 htt 8TeV mt muTau_vbf_loose 6 ggH 1 CMS_scale_t_mutau_8TeV shape 0.9513/1.093 1 1 125 htt 8TeV mt muTau_vbf_loose 6 qqH 1 CMS_scale_t_mutau_8TeV shape 0.9585/1.038 1 1 125 htt 8TeV mt muTau_vbf_loose 6 VH 1 CMS_scale_t_mutau_8TeV shape 0.9999/1 1 1 125 htt 8TeV mt muTau_vbf_loose 6 ZTT 0 CMS_scale_t_mutau_8TeV shape 0.9639/1.055 1 1 ...
Next we create another CombineHarvester instance and parse several datacards, this time using a method in which the object metadata is inferred from the datacard filenames:
In this ParseDatacard method only two arguments are required: the path to the text datacard and a string containing place-holders for the metadata. Note that there's no obligation to include all five place-holders in this string. Additionally, the place-holders are not restricted to appearing in the filename but may also be included in the preceding directory path, e.g. $MASS/$ANALYSIS_$CHANNEL_$BINID_$ERA.txt
is also valid.
Now let's pretend we are no longer interested in the first four categories we parsed. To remove objects we use a filter method. There are four generic filter methods: FilterAll, FilterObs, FilterProcs and FilterSysts that each accept a function, or function-type object that must have a single ch::Object, ch::Observation, ch::Process or ch::Systematic pointer argument and return a bool. A true
return value indicates that the object should be dropped. The FilterAll method will act on all three object collections whereas the others operate only on their respective collections. It's often convenient to write a small lambda function in-place:
Alternatively,there are a number of fixed-property filters, in which you need only supply a vector of the object properties you want to keep:
An optional boolean can be supplied as a second argument. When set to false this reverses the logic - objects with a property in the list will be dropped. In the second line we use this to remove all information about the QCD process. The full list of filter methods is found here
In the final part of this example we take a look at the rate evaluation methods. These calculate the total event yields for either the observed data or the expected processes:
Note that these functions are greedy - they will sum the contribution from every available Observation or Process entry. This means in the first line we get the total number of observed events in the three remaining categories. To get the yield for a single category we can prefix the function with a filter method. But here we must be careful, because we don't want to actually remove the information on the other categories permanently, which is what would happen if we just do:
cmb2.bin({"muTau_vbf_loose"}).GetObservedRate(); // cmb2 only contains objects for the "muTau_vbf_loose" category now!
To get around this we first call the cp method on our CombineHarvester instance. This makes a shallow copy of the instance - in this it is only pointers to the contained objects, not the objects themselves, which are copied into a new instance. Such a copy is computationally fast to make, and we are free to filter objects from it without affecting the original instance at all.
The last part of the example code uses the CombineHarvester set-generating methods to conveniently loop through all defined (bin, process) combinations and print out the expected yield. The full list of available set-generating methods can be found here.