In this paper we develop a general framework for quantifying how binary risk factors jointly influence a binary outcome. Our key result is an additive expansion of odds ratios as a sum of marginal effects and interaction terms of varying order. These odds ratio expansions are used for estimating the excess odds ratio, attributable proportion and synergy index for a case-control dataset by means of maximum likelihood from a logistic regression model. The confidence intervals associated with these estimates of joint effects and interaction of risk factors rely on the delta method. Our methodology is illustrated with a large Nordic meta dataset for multiple sclerosis. It combines four studies, with a total of 6265 cases and 8401 controls. It has three risk factors (smoking and two genetic factors) and a number of other confounding variables.

Many complex diseases are influenced by a number of risk factors that interact in a complicated way. This is often quantified by means of a regression model with affection status of a given disease as the binary response and with the risk factors, and possibly some other variables, as covariates. Logistic regression models have often been used to quantify main effects and the strength of interaction among the risk factors with regard to disease. There are several reasons for this. First, the logistic transformation is the canonical link of a generalized linear model with a binomially distributed response, and the parameters of this model have a straightforward multiplicative odds interpretation [

In previous work [

In this paper we concentrate on case-control data and additive odds measures of main effects, joint effects and interaction. Our main result is to express the odds ratio of the risk factors as a sum of terms, which include their main effects and different orders of interaction, when the effect of other confounding covariates is controlled for. In this way we extend and unify some previously used measures of interaction [

The paper is organized as follows: In Section

Let

The logistic model in (

It is assumed that

Our purpose is to estimate and produce confidence intervals for parameters

After possible reordering of factors we may assume without loss of generality that those in

Suppose the exposure levels

When

It is possible to give another interpretation of (

Having defined the odds ratio increment in (

The expansions in (

It is of interest to know how much of the odds ratio (

A prediction

It is possible to rewrite the predicted odds ratio (

Assume from now on that

The unadjusted excess odds ratio

The unadjusted excess odds ratio can be interpreted as an unstandardized residual of a regression model, where only marginal effects and interaction terms up to order

We will define three measures (

The excess odds ratio

The quantity

The synergy index

All three quantities in equations (

Some special cases of formulas (

When

When

Finally, when

Since EOR, AP and SI are all functions of UnadjEOR, it suffices to specify the latter. In the subsections to follow we will do so for models with 1, 2, or 3 risk factors in

When

When

When

Assume that a case-control dataset

Let

Multiple sclerosis (MS) is a complex and inflammatory disease causing damage to the central nervous system. Its prevalence is over 0.1% in many countries, affecting large regions of the world [

Number of cases and controls

| Study | Cases | Controls |
|---|---|---|
| Swedish EIMS study | 1308 | 1858 |
| Swedish GEMS study | 3272 | 2382 |
| Danish study | 1474 | 3469 |
| Norwegian study | 211 | 692 |
| Combined Nordic study | 6265 | 8401 |

The four Nordic studies are from Hedström et al. [

Apart from the two genetic factors and smoking, three other covariates (gender, age, study) were also part of the model. This gives a total of 8 covariates, encoded as

We find, for instance, that the point estimate of the marginal odds ratio (or relative risk) of having MS in the combined dataset is 3.6 for individuals with the DRB15 risk allele, compared to those who lack this allele. The corresponding marginal odds ratios for absence of the protective A2 allele and for smoking are 1.75 and 2.0, respectively. Since the joint odds ratios for all pairs of risk factors are much larger than the corresponding marginal odds ratios, there are strong indications of two-way interactions between all pairs of risk factors. There is possibly some three-way interaction between DR15, A2- and smoking as well, since the joint OR for all three factors is higher than the pairwise odds ratios. On the other hand, the OR for the two genetic factors is higher among smokers than among non-smokers in only one study (EIMS).
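As a rough illustration of the comparison made above, the following sketch contrasts each pairwise joint odds ratio of the combined dataset with the value that would be expected if the two excess odds ratios simply added. The function name `additive_prediction` and the dictionaries are hypothetical, and plugging the tabulated one-factor odds ratios into a two-factor additive prediction is a simplifying assumption; the paper's own interaction estimates come from the fitted logistic model and need not coincide with these figures.

```python
def additive_prediction(or_a, or_b):
    """Joint odds ratio expected if the two excess odds ratios simply add up."""
    return 1.0 + (or_a - 1.0) + (or_b - 1.0)


# Point estimates for the combined dataset, copied from the odds ratio table.
marginal = {"DR15": 3.60, "A2-": 1.75, "sm": 2.00}
joint = {("DR15", "sm"): 7.11, ("A2-", "sm"): 3.51, ("DR15", "A2-"): 6.28}

# Excess of the observed joint odds ratio over the additive prediction.
excess = {}
for (a, b), or_ab in joint.items():
    predicted = additive_prediction(marginal[a], marginal[b])
    excess[(a, b)] = or_ab - predicted
    print(f"{a}, {b}: joint OR {or_ab:.2f}, additive prediction {predicted:.2f}, "
          f"excess {excess[(a, b)]:.2f}")
```

Under these assumptions every pair shows a positive excess, which agrees in direction with the two-way interactions indicated in the text.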

Point estimates and 95% confidence intervals of the odds ratio (

OR for one factor

| Study | DR15 | A2- | sm |
|---|---|---|---|
| EIMS | 3.55 (3.05,4.13) | 1.74 (1.50,2.02) | 1.52 (1.30,1.78) |
| GEMS | 3.70 (3.30,4.15) | 1.79 (1.60,2.00) | 1.62 (1.44,1.82) |
| Danish | 3.42 (2.99,3.92) | 1.73 (1.51,1.98) | 3.09 (2.70,3.55) |
| Norwegian | 5.02 (3.50,7.21) | 1.77 (1.24,2.53) | 2.13 (1.50,3.04) |
| Combined | 3.60 (3.34,3.87) | 1.75 (1.63,1.88) | 2.00 (1.86,2.15) |

OR for two factors

| Study | DR15, sm | A2-, sm | DR15, A2- |
|---|---|---|---|
| EIMS | 5.41 (4.29,6.85) | 2.71 (2.17,3.40) | 6.18 (4.94,7.75) |
| GEMS | 5.83 (4.93,6.91) | 2.89 (2.45,3.42) | 6.44 (5.46,7.60) |
| Danish | 10.49 (8.57,12.84) | 5.35 (4.38,6.52) | 5.95 (4.89,7.23) |
| Norwegian | 10.86 (6.53,18.07) | 3.78 (2.27,6.27) | 8.84 (5.21,14.99) |
| Combined | 7.11 (6.37,7.93) | 3.51 (3.16,3.91) | 6.28 (5.64,6.98) |

OR for three factors/confounders

| Study | DR15, A2- given nsm | DR15, A2- given sm | DR15, A2-, sm |
|---|---|---|---|
| EIMS | 5.62 (4.27,7.39) | 7.72 (5.17,11.52) | 11.23 (7.81,16.14) |
| GEMS | 7.07 (5.70,8.77) | 5.61 (4.34,7.27) | 9.96 (7.75,12.79) |
| Danish | 6.57 (5.01,8.61) | 5.27 (3.95,7.01) | 17.95 (13.34,24.17) |
| Norwegian | 9.02 (4.29,18.98) | 8.66 (4.14,18.11) | 18.27 (8.62,38.72) |
| Combined | 6.37 (5.55,7.31) | 6.16 (5.20,7.29) | 12.63 (10.73,14.85) |
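The delta method mentioned in the abstract can be illustrated on the simplest derived quantity, the excess odds ratio OR − 1. The sketch below assumes that the tabulated odds ratio intervals are Wald intervals on the log scale (the reported limits are consistent with this, but it is an assumption), back-solves the standard error of the log odds ratio, and then propagates it through the map β ↦ exp(β) − 1, whose derivative is exp(β). The function names and the constant `Z95` are hypothetical and are not taken from the paper.

```python
import math

Z95 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def se_from_log_interval(lower, upper, z=Z95):
    """Standard error of log(OR), assuming a symmetric Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def excess_or_interval(odds_ratio, se_log_or, z=Z95):
    """Delta-method interval for OR - 1, using d/d(beta) (exp(beta) - 1) = exp(beta)."""
    eor = odds_ratio - 1.0
    se_eor = odds_ratio * se_log_or
    return eor, eor - z * se_eor, eor + z * se_eor


# Joint odds ratio for DR15, A2- and smoking in the combined dataset.
se_log = se_from_log_interval(10.73, 14.85)
eor, lower, upper = excess_or_interval(12.63, se_log)
print(f"EOR {eor:.2f} with approximate 95% interval ({lower:.2f}, {upper:.2f})")
```

The resulting interval is close to the one reported for the joint-effects excess odds ratio of the combined study, (9.57, 13.68); small discrepancies are expected because the inputs are rounded to two decimals.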

The set of risk factors is

The estimates of Table

We have also estimated the attributable proportion separately for males and females (data not shown). The results are in quite good agreement with the upper part of Table

Point estimates and 95% confidence intervals for AP, EOR and SI

AP for DR15, A2-, smoking

| Study | Joint effects | 2nd & 3rd order interaction | 3rd order interaction |
|---|---|---|---|
| EIMS | 0.91 (0.87,0.94) | 0.59 (0.43,0.72) | 0.39 (0.12,0.61) |
| GEMS | 0.90 (0.87,0.92) | 0.35 (0.19,0.50) | −0.02 (−0.29,0.25) |
| Danish | 0.94 (0.93,0.96) | 0.60 (0.49,0.70) | 0.09 (−0.18,0.34) |
| Norwegian | 0.95 (0.89,0.97) | 0.66 (0.36,0.84) | 0.22 (−0.33,0.66) |
| Combined | 0.92 (0.91,0.93) | 0.53 (0.46,0.60) | 0.15 (0.00,0.29) |

EOR for DR15, A2-, smoking

| Study | Joint effects | 2nd & 3rd order interaction | 3rd order interaction |
|---|---|---|---|
| EIMS | 10.23 (6.15,14.30) | 6.68 (3.05,10.31) | 4.35 (0.33,8.37) |
| GEMS | 8.96 (6.46,11.45) | 3.50 (1.32,5.68) | −0.24 (−3.05,2.58) |
| Danish | 16.95 (11.62,22.29) | 10.85 (6.55,15.14) | 1.55 (−3.41,6.51) |
| Norwegian | 17.27 (3.55,30.99) | 12.05 (0.82,23.28) | 4.07 (−7.82,15.95) |
| Combined | 11.63 (9.57,13.68) | 6.74 (5.00,8.49) | 1.88 (−0.16,3.93) |

SI for DR15, A2-, smoking

| Study | Joint effects | 2nd & 3rd order interaction | 3rd order interaction |
|---|---|---|---|
| EIMS | undefined | 2.88 (1.87,4.43) | 1.74 (1.10,2.75) |
| GEMS | undefined | 1.64 (1.25,2.16) | 0.97 (0.71,1.33) |
| Danish | undefined | 2.78 (2.07,3.72) | 1.10 (0.81,1.49) |
| Norwegian | undefined | 3.31 (1.50,7.32) | 1.31 (0.62,2.76) |
| Combined | undefined | 2.38 (2.00,2.84) | 1.19 (0.99,1.44) |
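The three measures in the table are tightly linked, and the reported point estimates can be cross-checked against each other. The sketch below assumes the relations AP = EOR / OR and SI = (OR − 1) / (OR − 1 − EOR), where OR is the joint odds ratio of all three factors; these relations are inferred from the tabulated numbers and from the usual two-factor definitions, not copied from the paper's defining equations, and the function names are hypothetical.

```python
import math


def attributable_proportion(odds_ratio, excess):
    """Share of the joint odds ratio made up by the given excess odds ratio."""
    return excess / odds_ratio


def synergy_index(odds_ratio, excess):
    """Ratio of the total excess odds ratio to the part not covered by `excess`.

    Returns None when the denominator vanishes, as in the joint-effects column.
    """
    denominator = (odds_ratio - 1.0) - excess
    if math.isclose(denominator, 0.0, abs_tol=1e-9):
        return None
    return (odds_ratio - 1.0) / denominator


# Combined dataset: joint OR of DR15, A2- and smoking, and the three reported EORs.
joint_or = 12.63
eor = {"joint": 11.63, "2nd & 3rd order": 6.74, "3rd order": 1.88}

measures = {
    name: (attributable_proportion(joint_or, value), synergy_index(joint_or, value))
    for name, value in eor.items()
}
for name, (ap, si) in measures.items():
    si_text = "undefined" if si is None else f"{si:.2f}"
    print(f"{name}: AP {ap:.2f}, SI {si_text}")
```

With the joint-effects column the denominator of SI is zero, which matches the "undefined" entries in the table. The same check can be repeated for each individual study by substituting its joint odds ratio and excess odds ratios.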

The measures either quantify joint marginal and interaction effects (

In this paper we studied how a collection

Our approach makes it possible to stratify for some variables (in

The delta-method-based confidence intervals are fast to compute. This makes them suitable for applications in which a large number of putative risk factors are screened. We have implemented confidence intervals based on resampling as well, using the bias-corrected accelerated percentile method [

Another object of further study is to develop odds ratio expansions of main effects and interactions when some of the risk factors are continuous [

The authors wish to thank the editors and three anonymous reviewers for helpful comments that considerably improved the structure and content of the paper. Ola Hössjer was financially supported by the Swedish Research Council, contract Nr. 621-2013-4633.

Assume without loss of generality that

The second part of Proposition

In order to prove the first part (

It will be convenient to introduce the notation

With these preliminaries, we can state the following result, which follows from (

With a slight abuse of notation, we included binary vectors