Observation Models and Likelihoods
The Observation Model
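The observation model, $\mathcal{M}(\vec{\Phi})$, defines the probability of any set of observations given specific values of the model's input parameters, $\vec{\Phi}$. The probability of any observed data is denoted

$$ p_{\mathcal{M}}(\mathrm{data}; \vec{\Phi}) $$

where the subscript $\mathcal{M}$ is a reminder that these are the probabilities according to this particular model (it is usually omitted for brevity).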
Combine is designed for counting experiments, in which the number of events with particular features is counted. The events can be either binned, as in histograms, or unbinned, where continuous values are stored for each event. The counted events are assumed to be independent, such as individual proton-proton collisions, which are not correlated with each other.
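The event-count portion of the model consists of a sum over different processes.
The expected observations, $\vec{\lambda}$, are the sum of the expected observations for each of the processes, $\vec{\lambda} = \sum_{p} \vec{\lambda}_{p}$.
The model can also be composed of multiple channels, in which case the expected observation is the set of all expected observations from the various channels, $\vec{\lambda} = \{\vec{\lambda}_{c1}, \vec{\lambda}_{c2}, \ldots, \vec{\lambda}_{cN}\}$.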
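As a minimal sketch of this structure, the expected counts of a channel can be built by summing the per-bin expectations of its processes, and the full expectation is the collection of per-channel results. The channel names, process names and yields below are hypothetical and are not Combine inputs or Combine API.

```python
# Hypothetical per-process expected yields (one number per bin) for two channels.
expected_by_process = {
    "channel_1": {"signal": [1.0, 3.0, 1.0], "background": [10.0, 8.0, 6.0]},
    "channel_2": {"signal": [0.5, 0.5], "background": [4.0, 2.0]},
}


def expected_counts(processes):
    """Sum the per-bin expectations of all processes in one channel."""
    return [sum(bin_yields) for bin_yields in zip(*processes.values())]


# The full expectation is the set of per-channel expectations.
expected = {
    channel: expected_counts(processes)
    for channel, processes in expected_by_process.items()
}
```

Keeping the channels as separate entries, rather than concatenating them, mirrors the idea that each channel is an independent set of observations.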
The model can also include data and parameters related to non-count values, such as the observed luminosity or detector calibration constants. These non-count data are usually treated as auxiliary information that constrains our expectations about the observed event counts.
The full model therefore defines the probability of any given observations over all the channels, given all the processes and model parameters.
Combining full models is possible by combining their channels, assuming that the channels are mutually independent.
A Simple Example
Consider performing an analysis searching for a Higgs boson by looking for events where the Higgs decays into two photons.
The event count data may be binned histograms of the number of two-photon events in different bins of the invariant mass of the photon pair. The expected counts would include signal contributions from processes where a Higgs boson is produced, as well as background contributions from processes where two photons are produced through other mechanisms, like radiation off a quark. The expected counts may also depend on parameters such as the energy resolution of the measured photons and the total luminosity of collisions in the dataset; these can be parameterized in the model as auxiliary information.
The analysis itself might be split into multiple channels, targeting different Higgs production modes with different event selection criteria. Furthermore, the analysis may eventually be combined with other analyses, such as a measurement targeting Higgs production where the Higgs boson decays into four leptons, rather than two photons.
Combine provides the functionality for building the statistical models and combining all the channels or analyses together into one common analysis.
Sets of Observation Models
We are typically not interested in a single model, but in a set of models, parameterized by a set of real numbers representing possible versions of the model.
Model parameters include the parameters of interest ($\vec{\mu}$, the quantities being measured, such as a cross section) as well as nuisance parameters ($\vec{\nu}$), which may not be of interest themselves but still affect the model expectation.
Combine provides tools and interfaces for defining the model as pre-defined or user-defined functions of the input parameters. In practice, however, a small number of functional forms are most commonly used to define how the expected event counts depend on the model parameters. These are discussed in detail in the context of the full likelihood below.
The Likelihood
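For any given model, $\mathcal{M}(\vec{\Phi})$, the likelihood defines the probability of observing a given dataset. It is numerically equal to the probability of observing the data, given the model:

$$ \mathcal{L}_{\mathcal{M}}(\vec{\Phi}) = p_{\mathcal{M}}(\mathrm{data}; \vec{\Phi}) $$

Note, however, that the likelihood is a function of the model parameters, not the data, which is why we distinguish it from the probability itself.
The likelihood in combine takes the general form:

$$ \mathcal{L} = \mathcal{L}_{\mathrm{primary}} \cdot \mathcal{L}_{\mathrm{auxiliary}} $$

where $\mathcal{L}_{\mathrm{primary}}$ is the probability of observing the event-count data for a given set of model parameters, and $\mathcal{L}_{\mathrm{auxiliary}}$ represents external constraints on the parameters, such as previous measurements or prior beliefs about the value a parameter should have.
Both $\mathcal{L}_{\mathrm{primary}}$ and $\mathcal{L}_{\mathrm{auxiliary}}$ can be composed of many sub-likelihoods, for example for observations in different bins and constraints on different nuisance parameters.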
This form is entirely general. However, as with the model itself, there are typical forms that the likelihood takes which will cover most use cases, and for which combine is primarily designed.
Primary Likelihoods for binned data
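For a binned likelihood, the probability of observing a certain number of counts, given a model, takes on a simple form. For each bin:

$$ \mathcal{L}_{\mathrm{bin}}(\vec{\Phi}) = \mathrm{Poiss}(n_{\mathrm{obs}}; n_{\mathrm{exp}}(\vec{\Phi})) = \frac{n_{\mathrm{exp}}(\vec{\Phi})^{n_{\mathrm{obs}}} \, e^{-n_{\mathrm{exp}}(\vec{\Phi})}}{n_{\mathrm{obs}}!} $$

i.e. it is a Poisson distribution with the mean given by the expected number of events in that bin, $n_{\mathrm{exp}}$. The full primary likelihood for binned data is simply the product of the likelihoods of the individual bins:

$$ \mathcal{L}_{\mathrm{primary}} = \prod_{\mathrm{bins}} \mathcal{L}_{\mathrm{bin}} $$

This is the underlying likelihood model used for every binned analysis.
The freedom in the analysis comes in how $n_{\mathrm{exp}}$ depends on the model parameters, and in the constraints that are placed on those parameters.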
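The product of per-bin Poisson terms is usually evaluated as a sum of logarithms. The following is a minimal sketch of that calculation using only the Python standard library; the function names and the example counts are illustrative assumptions and do not correspond to Combine's internal implementation.

```python
import math


def poisson_log_prob(n_obs, n_exp):
    """Return log Poiss(n_obs; n_exp) for one bin. Requires n_exp > 0."""
    return n_obs * math.log(n_exp) - n_exp - math.lgamma(n_obs + 1)


def binned_log_likelihood(observed, expected):
    """Sum the per-bin Poisson log-probabilities (log of the product over bins)."""
    return sum(poisson_log_prob(n, lam) for n, lam in zip(observed, expected))


# Hypothetical observed counts and model expectations for three bins.
observed = [12, 9, 7]
expected = [11.0, 11.0, 7.0]
log_l = binned_log_likelihood(observed, expected)
```

Working with the log-likelihood avoids numerical underflow when many bins are multiplied together.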
Primary Likelihoods for unbinned data
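For unbinned likelihood models, a likelihood can be assigned to each data point. It is proportional to the probability density function evaluated at that point, $\vec{x}$. For the full set of observed data points, information about the total number of data points is also included:

$$ \mathcal{L}_{\mathrm{primary}} = \mathrm{Poiss}(n_{\mathrm{obs}}; n_{\mathrm{exp}}(\vec{\Phi})) \prod_{i=1}^{n_{\mathrm{obs}}} \mathrm{pdf}(\vec{x}_{i}; \vec{\Phi}) $$

where $n_{\mathrm{obs}}$ and $n_{\mathrm{exp}}$ are the total numbers of observed and expected events, respectively.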
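A minimal sketch of an unbinned likelihood follows, assuming the common "extended" form in which a Poisson term for the total event count multiplies the product of per-event density values. The Gaussian density, the data values and all names are hypothetical choices for illustration only.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normalized Gaussian probability density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def extended_log_likelihood(data, n_exp, pdf):
    """log[ Poiss(n_obs; n_exp) * prod_i pdf(x_i) ] for one set of events."""
    n_obs = len(data)
    log_poisson = n_obs * math.log(n_exp) - n_exp - math.lgamma(n_obs + 1)
    return log_poisson + sum(math.log(pdf(x)) for x in data)


# Hypothetical per-event invariant masses.
data = [124.1, 125.3, 125.0, 126.2]


def model_pdf(x):
    return gaussian_pdf(x, 125.0, 1.0)


log_l = extended_log_likelihood(data, 4.0, model_pdf)
```

For a fixed density, the Poisson term is largest when the expected total equals the number of observed events, so both the shape and the overall yield are constrained by the data.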
Auxiliary Likelihoods
The auxiliary likelihood terms encode the probability of model nuisance parameters taking on a certain value, without regard to the primary data. In frequentist frameworks, this usually represents the result of a previous measurement (such as of the jet energy scale). We will write in a mostly frequentist framework, though combine can be used for either frequentist or Bayesian analyses[^1].
[^1]: See the first paragraphs of the PDG's statistics review for more information on these two frameworks.
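In this framework, each auxiliary term represents the likelihood of some parameter, $\nu$, given some previous observation, $y$:

$$ \mathcal{L}_{\mathrm{auxiliary}}(\nu) = p(y; \nu) $$

In principle the form of the likelihood can be any function where the corresponding $p$ is a valid probability distribution.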
Note that on its own, the form of the auxiliary term is not meaningful; what is meaningful is the relationship between the auxiliary term and how the model expectation is altered by the parameter.
Any co-ordinate transformation of the parameter values can be absorbed into the definition of the parameter.
A reparameterization would change the mathematical form of the auxiliary term, but would also simultaneously change how the model depends on the parameter in such a way that the total likelihood is unchanged.
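e.g. if you define a parameter either as a cross section, $\nu = \sigma$, or as its shift from a reference value, $\nu = \sigma - \sigma_0$, you will change the form of the constraint term, but you will not change the overall likelihood.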
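The invariance under reparameterization can be checked numerically. The sketch below assumes a single-bin counting model with a Gaussian constraint; the luminosity, reference value, width and observed count are hypothetical numbers chosen only for illustration. The same likelihood is written once in terms of a parameter `sigma` and once in terms of the shifted parameter `nu = sigma - SIGMA_0`.

```python
import math

# Hypothetical inputs: luminosity, previously measured value and its width, observed count.
LUMI, SIGMA_0, DELTA, N_OBS = 10.0, 2.0, 0.5, 23


def log_poisson(n_obs, n_exp):
    return n_obs * math.log(n_exp) - n_exp - math.lgamma(n_obs + 1)


def log_gauss(y, mean, width):
    return -0.5 * ((y - mean) / width) ** 2 - math.log(width * math.sqrt(2.0 * math.pi))


def log_l_sigma(sigma):
    """Parameter is sigma itself; constraint is centred on the observation SIGMA_0."""
    return log_poisson(N_OBS, LUMI * sigma) + log_gauss(SIGMA_0, sigma, DELTA)


def log_l_shifted(nu):
    """Parameter is nu = sigma - SIGMA_0; constraint is centred on 0."""
    return log_poisson(N_OBS, LUMI * (nu + SIGMA_0)) + log_gauss(0.0, nu, DELTA)


sigma = 2.3
difference = log_l_sigma(sigma) - log_l_shifted(sigma - SIGMA_0)
```

The constraint term and the dependence of the expected count on the parameter both change form, but `difference` vanishes up to floating-point rounding.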
Likelihoods implemented in Combine
Combine builds on the generic forms of the likelihood for counting experiments given above to provide specific functional forms which are most commonly useful in high energy physics, such as separating contributions between different processes.
Binned Likelihoods using Templates
Binned likelihood models can be defined by the user by providing simple inputs such as a set of histograms and systematic uncertainties. These likelihood models are referred to as template-based because they rely heavily on histograms as templates for building the full likelihood function.
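Here, we describe the details of the mathematical form of these likelihoods. As already mentioned, the likelihood can be written as a product of two parts:

$$ \mathcal{L} = \mathcal{L}_{\mathrm{primary}} \cdot \mathcal{L}_{\mathrm{auxiliary}} = \prod_{c} \prod_{b} \mathrm{Poiss}(n^{\mathrm{obs}}_{cb}; n^{\mathrm{exp}}_{cb}(\vec{\mu}, \vec{\nu})) \cdot \prod_{e} p_{e}(y_{e}; \nu_{e}) $$

where $c$ indexes the channel, $b$ indexes the histogram bin, and $e$ indexes the nuisance parameter.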
Model of expected event counts per bin
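The generic model of the expected event count in a given bin, $n^{\mathrm{exp}}_{cb}$, implemented in combine for template-based analyses is given by:

$$ n^{\mathrm{exp}}_{cb} = \max\left(0, \sum_{p} M_{cp}(\vec{\mu}) \, N_{cp}(\theta_{G}, \vec{\theta}_{L}, \vec{\theta}_{S}, \vec{\theta}_{\rho}) \, \omega_{cbp}(\vec{\theta}_{S}) + E_{cb}(\vec{\theta}_{B}) \right) $$

where here:

- $p$ indexes the processes contributing to the channel;
- $\theta_{G}$, $\vec{\theta}_{L}$, $\vec{\theta}_{S}$, $\vec{\theta}_{\rho}$ and $\vec{\theta}_{B}$ are different types of nuisance parameters which modify the processes with different functional forms:
    - $\theta_{G}$ is a gamma nuisance,
    - $\vec{\theta}_{L}$ are log-normal nuisances,
    - $\vec{\theta}_{S}$ are "shape" nuisances,
    - $\vec{\theta}_{\rho}$ are user defined rate parameters, and
    - $\vec{\theta}_{B}$ are nuisance parameters related to the statistical uncertainties in the simulation used to build the model;
- $M$ defines the effect of the parameters of interest on the signal process;
- $N$ defines the overall normalization effect of the nuisance parameters;
- $\omega$ defines the shape effects (i.e. bin-dependent effects) of the nuisance parameters; and
- $E$ defines the impact of statistical uncertainties from the samples used to derive the histogram templates used to build the model.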
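The structure of the expected count can be sketched as a sum over processes of a parameter-of-interest factor, a normalization factor and a template bin content. The example below is deliberately reduced and rests on several assumptions: a single signal-strength parameter `mu`, one log-normal nuisance `theta` shared by all processes, no shape or simulation-statistics terms, and hypothetical templates and `kappa` values. It is not Combine's implementation.

```python
def expected_bin_counts(templates, kappas, is_signal, mu, theta):
    """Per-bin expectation: max(0, sum over processes of M * N * template)."""
    n_bins = len(next(iter(templates.values())))
    totals = [0.0] * n_bins
    for process, template in templates.items():
        poi_effect = mu if is_signal[process] else 1.0   # scales signal only
        normalization = kappas[process] ** theta         # log-normal effect
        for b, nominal in enumerate(template):
            totals[b] += poi_effect * normalization * nominal
    return [max(0.0, n) for n in totals]


# Hypothetical nominal templates and log-normal kappa values.
templates = {"sig": [1.0, 3.0, 1.0], "bkg": [10.0, 8.0, 6.0]}
kappas = {"sig": 1.10, "bkg": 1.05}
is_signal = {"sig": True, "bkg": False}

nominal = expected_bin_counts(templates, kappas, is_signal, mu=1.0, theta=0.0)
background_only = expected_bin_counts(templates, kappas, is_signal, mu=0.0, theta=0.0)
```

With `theta = 0` the nominal templates are recovered, and setting `mu = 0` removes the signal contribution while leaving the background untouched.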
Parameter of Interest Model
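The function $M$ can take on custom functional forms defined by the user, but in the most common case the parameter of interest, $\mu$, simply scales the contributions from signal processes:

$$ M_{cp}(\mu) = \begin{cases} \mu & \text{if } p \in \text{signal} \\ 1 & \text{otherwise} \end{cases} $$

However, combine supports many more models beyond this. As well as built-in support for models with multiple parameters of interest, combine comes with many pre-defined models that go beyond simple process normalization and are targeted at various types of searches and measurements.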
Normalization Effects
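The overall normalization, $N_{cp}(\theta_{G}, \vec{\theta}_{L}, \vec{\theta}_{S}, \vec{\theta}_{\rho})$, is a product of the individual normalization effects:

$$ N = \prod_{X} \prod_{i} f_{X}(\vec{\theta}^{i}_{X}) $$

with $X$ identifying a given nuisance parameter type; i.e. $N$ multiplies together the effects of the individual nuisance parameters of each type.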
Normalization Parameterization Details
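The full functional form of the normalization term is given by:

$$ N_{cp} = N_{0}(\theta_{G}) \prod_{n} \kappa_{n}^{\theta_{L,n}} \prod_{a} \kappa^{\mathrm{A}}_{a}(\theta^{a}_{L(S)}, \kappa^{+}_{a}, \kappa^{-}_{a})^{\theta^{a}_{L(S)}} \prod_{r} F_{r}(\vec{\theta}_{\rho}) $$

where:

- $N_{0}(\theta_{G}) \equiv \theta_{G} / \tilde{\theta}_{G}$ is the normalization effect of a gamma uncertainty. $\tilde{\theta}_{G}$ is taken as the observed number of events in some external control region and $\theta_{G}$ has a constraint pdf $\mathrm{Poiss}(\theta; \tilde{\theta})$;
- $\kappa_{n}^{\theta_{L,n}}$ are log-normal uncertainties specified by a fixed value $\kappa$;
- $\kappa^{\mathrm{A}}_{a}(\theta^{a}_{L(S)}, \kappa^{+}_{a}, \kappa^{-}_{a})^{\theta^{a}_{L(S)}}$ are asymmetric log-normal uncertainties, in which the value of $\kappa^{\mathrm{A}}$ depends on the nuisance parameter and two fixed values $\kappa^{+}_{a}$ and $\kappa^{-}_{a}$. The functions, $\kappa^{\mathrm{A}}$, define a smooth interpolation for the asymmetric uncertainty; and
- $F_{r}(\vec{\theta}_{\rho})$ are user-defined functions of the user defined nuisance parameters, which may have uniform or Gaussian constraint terms.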
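The function for the asymmetric normalization modifier, $\kappa^{\mathrm{A}}$, is

$$ \kappa^{\mathrm{A}}(\theta, \kappa^{+}, \kappa^{-}) = \begin{cases} \kappa^{+}, & \text{for } \theta \geq 0.5 \\ \frac{1}{\kappa^{-}}, & \text{for } \theta \leq -0.5 \\ \exp\left( \frac{1}{2} \left( (\ln \kappa^{+} - \ln \kappa^{-}) + \frac{1}{4} (\ln \kappa^{+} + \ln \kappa^{-}) \, I(\theta) \right) \right), & \text{otherwise} \end{cases} $$

where $I(\theta) = 48\theta^{5} - 40\theta^{3} + 15\theta$, which ensures that $\kappa^{\mathrm{A}}$ and its first and second derivatives are continuous for all values of $\theta$,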
and the
where
Shape Morphing Effects
The number of events in a given bin
Shape parameterization Details
In the following, the channel and process labels
The fixed nominal number of events is denoted
For a given process, the shape may be interpolated either directly in terms of the fractional bin yields,
where
The smooth interpolating function
where
Statistical Uncertainties in the Simulation used to build the Model
Since the histograms used in a binned shape analysis are typically created from simulated samples, the yield in each bin is also subject to statistical uncertainty. This is taken into account by assigning either one nuisance parameter per bin, or one parameter per contributing process per bin.
Model Statistical Uncertainty Details
If the uncertainty in each bin is modelled as a single nuisance parameter it takes the form:
where
Alternatively, one parameter is assigned per process, which may be modelled with either a Poisson or Gaussian constraint pdf:
where the indices
Customizing the form of the expected event counts
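Although the above likelihood defines some specific functional forms, users are also able to implement custom functional forms for the parameter-of-interest model, the normalization effects and the shape effects.
However, some constraints do exist, such as the requirement that bin contents be positive, and that the function describing the parameter-of-interest model depends only on the parameters of interest, whereas the normalization and shape functions depend only on the nuisance parameters.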
Auxiliary Likelihood terms
The auxiliary constraint terms implemented in combine are Gaussian, Poisson or Uniform:
Which form they have depends on the type of nuisance parameter:

- The shape ($\vec{\theta}_{S}$) and log-normal ($\vec{\theta}_{L}$) nuisance parameters always use Gaussian constraint terms;
- The gamma ($\theta_{G}$) nuisance parameters always use Poisson constraints;
- The rate parameters ($\vec{\theta}_{\rho}$) may have either Gaussian or Uniform constraints; and
- The model statistical uncertainties ($\vec{\theta}_{B}$) may use Gaussian or Poisson constraints.

While combine does not provide functionality for user-defined auxiliary pdfs, the effect of nuisance parameters is highly customizable through the form of the dependence of the expected event counts on those parameters.
Overview of the template-based likelihood model in Combine
An overview of the binned likelihood model built by combine is given below.
Note that
Parametric Likelihoods in Combine
As with the template likelihood, the parametric likelihood implemented in combine supports multiple processes and multiple channels. Unlike the template likelihoods, the parametric likelihoods are defined using custom probability density functions, which are functions of continuous observables rather than discrete, binned counts. Because the pdfs are functions of a continuous variable, the likelihood can be evaluated over unbinned data. They can also still be used for analyses of binned data.
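The unbinned model implemented in combine is given by:

$$ \mathcal{L} = \mathcal{L}_{\mathrm{primary}} \cdot \mathcal{L}_{\mathrm{auxiliary}} = \left( \prod_{c} \mathrm{Poiss}(n^{\mathrm{obs}}_{c,\mathrm{tot}}; n^{\mathrm{exp}}_{c,\mathrm{tot}}(\vec{\mu}, \vec{\theta})) \prod_{i=1}^{n^{\mathrm{obs}}_{c,\mathrm{tot}}} \sum_{p} f^{\mathrm{exp}}_{cp} \, \mathrm{pdf}_{cp}(\vec{x}_{i}; \vec{\mu}, \vec{\theta}) \right) \cdot \prod_{e} p_{e}(y_{e}; \theta_{e}) $$

where $c$ indexes the channel, $p$ indexes the process, and $e$ indexes the nuisance parameter. In addition:

- $n^{\mathrm{exp}}_{c,\mathrm{tot}}$ is the total number of expected events in channel $c$;
- $\mathrm{pdf}_{cp}$ are user defined probability density functions, which may take on the form of any valid probability density; and
- $f^{\mathrm{exp}}_{cp}$ is the fraction of the total events in channel $c$ from process $p$, $f_{cp} = n_{cp} / \sum_{p} n_{cp}$.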
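A minimal sketch of a single parametric channel is shown below, assuming the per-event density is a mixture of process densities weighted by each process's fraction of the total expected yield. The yields, the Gaussian signal, the uniform background and the mass range are hypothetical; the Gaussian is not truncated to the range, which is a reasonable approximation only because it is narrow and far from the edges.

```python
import math


def gaussian_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, low, high):
    return 1.0 / (high - low) if low <= x <= high else 0.0


# Hypothetical expected yields and per-process densities for one channel.
yields = {"signal": 5.0, "background": 45.0}
pdfs = {
    "signal": lambda x: gaussian_pdf(x, 125.0, 1.5),
    "background": lambda x: uniform_pdf(x, 100.0, 180.0),
}

n_exp_total = sum(yields.values())
fractions = {process: n / n_exp_total for process, n in yields.items()}


def channel_pdf(x):
    """Mixture density: sum over processes of fraction * process pdf."""
    return sum(fractions[p] * pdfs[p](x) for p in yields)


def channel_log_likelihood(data):
    """Poisson term for the total yield plus the per-event mixture densities."""
    n_obs = len(data)
    log_poisson = n_obs * math.log(n_exp_total) - n_exp_total - math.lgamma(n_obs + 1)
    return log_poisson + sum(math.log(channel_pdf(x)) for x in data)
```

Calling `channel_log_likelihood([110.0, 125.0, 160.0])` evaluates the primary term for three hypothetical events; auxiliary constraint terms would be added separately.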
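For parametric likelihoods on binned data, the data likelihood is first converted into the binned data likelihood format before evaluation, i.e.

$$ \mathcal{L} = \prod_{c} \prod_{b} \mathrm{Poiss}(n^{\mathrm{obs}}_{cb}; n^{\mathrm{exp}}_{cb}) \cdot \prod_{e} p_{e}(y_{e}; \theta_{e}) $$

where $n^{\mathrm{exp}}_{cb}$ is calculated from the input pdf and normalization, based on the model parameters.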
Model of expected event counts
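The total number of expected events is modelled as:

$$ n^{\mathrm{exp}}_{c,\mathrm{tot}} = \max\left(0, \sum_{p} n^{0}_{cp} \, M_{cp}(\vec{\mu}) \, N_{cp}(\theta_{G}, \vec{\theta}_{L}, \vec{\theta}_{\rho}) \right) $$

where $n^{0}_{cp}$ is a default normalization for the process and, as for the binned likelihoods, $\theta_{G}$, $\vec{\theta}_{L}$ and $\vec{\theta}_{\rho}$ are different types of nuisance parameters which modify the process normalizations with different functional forms.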
Details of Process Normalization
As in the template-based case, the different types of nuisance parameters affecting the process normalizations are:
- $\theta_{G}$ is a gamma nuisance, with linear normalization effects and a Poisson constraint term;
- $\vec{\theta}_{L}$ are log-normal nuisances, with log-normal normalization effects and Gaussian constraint terms;
- $\vec{\theta}_{\rho}$ are user defined rate parameters, with user-defined normalization effects and Gaussian or uniform constraint terms;
- $N$ defines the overall normalization effect of the nuisance parameters;
and
The function
Parameter of Interest Model
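As in the template-based case, the parameter of interest model, $M_{cp}(\vec{\mu})$, can take on different forms defined by the user. The default model is one in which $\vec{\mu}$ simply scales the normalizations of the signal processes.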
Shape Morphing Effects
The user may define any number of nuisance parameters which morph the shape of the pdf according to functional forms defined by the user.
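These nuisance parameters are included as $\vec{\theta}_{\rho}$ uncertainties, which may have Gaussian or uniform constraints, and can include arbitrary user-defined process normalization effects.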
Combining template-based and parametric Likelihoods
While we presented the likelihoods for the template and parametric models separately, they can also be combined into a single likelihood by treating each as a separate channel. When combining the models, the data likelihoods of the binned and unbinned channels are multiplied.
References and External Literature
- See the Particle Data Group's Review of Statistics for various fundamental concepts used here.
- The Particle Data Group's Review of Probability also has definitions of commonly used distributions, some of which are used here.