Image credit: Coursera

Measuring disease in Epidemiology

Part 1

A maybe not so short summary …

Prevalence

Prevalence is the proportion of individuals in the population who have the disease or attribute of interest at a specific time point.

\[\text{Prevalence} = \frac{\text{Number of people with the disease}}{\text{Total number of individuals in the population}}\]

Prevalence is useful for describing how common a condition is in a population, but it is less informative for diseases of short duration and of limited use in causal inference.

Example:

During 1980 the Framingham Heart Study examined 2,477 subjects for cataracts and found that 310 had them.

\[\mathsf{ \small \text{Prevalence} = \frac{310}{2477} = 0.125, \text{ or } 12.5\%. }\]
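The same calculation can be reproduced in base R. This is a minimal sketch: the variable names are illustrative, and the exact binomial confidence interval from `binom.test()` is an addition that is not part of the original example.

```r
# Counts from the cataract example above
cases    <- 310   # subjects found to have cataracts
examined <- 2477  # subjects examined

# Prevalence as a proportion and as a percentage
prevalence <- cases / examined
round(prevalence, 3)        # 0.125
round(100 * prevalence, 1)  # 12.5

# Exact binomial 95% confidence interval (added for illustration only)
prevalence_ci <- binom.test(cases, examined)$conf.int
prevalence_ci
```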

Cumulative Incidence (CI)

Cumulative Incidence is the proportion of the population with a new event during a given time period.

\[ {\textstyle \text{CI} = \frac{\text{Number of new cases during the period of interest}}{\text{Number of disease-free individuals at the start of this time period}} } \]

Like prevalence, cumulative incidence has no units and takes values from 0 to 1; it can also be expressed as a percentage. It can be calculated only if the study participants are followed up over time, so it cannot be obtained from a survey, which has no follow-up period. The follow-up period must be the same for all participants, and no new participants can enter the study during follow-up.

Example:

In a study of diabetics, 100 of the 189 diabetic men died during the 13-year follow-up period. Calculate the cumulative incidence (risk).

\[\mathsf{ \small \text{Risk} = \frac{100}{189} \times 100 = 52.9\% \text{ during 13 years of follow-up.} }\]

Cumulative Incidence = Incidence proportion = Risk
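As a quick check of the diabetes example, the risk can be computed in R. The helper `cumulative_incidence()` below is a hypothetical convenience function written for this sketch, not something provided by a package.

```r
# Cumulative incidence (risk): new cases divided by the number of
# disease-free individuals at the start of follow-up.
# `cumulative_incidence` is an illustrative helper, not a package function.
cumulative_incidence <- function(new_cases, at_risk) {
  stopifnot(new_cases >= 0, at_risk > 0, new_cases <= at_risk)
  new_cases / at_risk
}

# 100 deaths among 189 diabetic men over 13 years of follow-up
risk <- cumulative_incidence(new_cases = 100, at_risk = 189)
round(100 * risk, 1)  # 52.9 (% over the 13-year follow-up)
```

The risk is only interpretable together with its time window, so the 13 years should always be reported alongside the 52.9%.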

Incidence rate (IR)

Person-time is the time that participants spend at risk in the study, summed over all participants.

\[{\textstyle \text{IR} = \frac{\text{Number of new cases during the follow-up period}}{\text{Total person-time by disease-free individuals}}}\]

Rates can only be expressed as new cases per unit of person-time

Incidence rate is a powerful tool to describe the occurrence of a disease in the population. It can be used when cumulative incidence is problematic or cannot be properly defined. Use IR if:

  • subjects are lost to follow-up, or subjects enter or leave the study population during follow-up
  • there are competing risks
    • For example, in a study where the outcome is a cancer diagnosis, someone could get killed in an accident before the end of the follow-up period. This individual would no longer be at risk of cancer. But we don’t know if they would have developed cancer had they not been killed in the accident.
Example:

During a six-month period, a total of 53 nosocomial infections were recorded by an infection control nurse at a community hospital. During this time, there were 832 patients with a total of 1,290 patient-days. What is the rate of nosocomial infection per 100 patient-days?

\[\mathsf{ \small \text{IR} = \frac{53}{1290} \times 100 = 4.1 \text{ infections per 100 patient-days.} }\]
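The rate above can be reproduced in R, and the same idea extends to individual-level data, where person-time is the sum of each participant's time at risk. In the sketch below, the first part uses the hospital figures from the example; the small individual-level data set is invented purely for illustration.

```r
# Hospital example above: aggregate counts
infections   <- 53
patient_days <- 1290
ir_per_100   <- infections / patient_days * 100
round(ir_per_100, 1)  # 4.1 infections per 100 patient-days

# Hypothetical individual-level data (invented for illustration):
# days each patient spent at risk, and whether an infection occurred
days_at_risk <- c(2, 5, 1, 7, 3)
infected     <- c(0, 1, 0, 0, 1)

# Person-time is the sum of the individual times at risk
toy_rate_per_100 <- sum(infected) / sum(days_at_risk) * 100
round(toy_rate_per_100, 1)  # 11.1 infections per 100 patient-days
```

Note that the denominator is patient-days, not the 832 patients: patients who stay longer contribute more time at risk.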

Measures of Association

For causal inference and for associations between variables we use a different set of measures, called measures of association. They can be divided into two broad categories: relative and absolute measures.

Relative measures

\[\mathsf{ \small { \text{Risk Ratio (RR)} = \frac{\text{Risk in the exposed}}{\text{Risk in the unexposed}} \\ \text{Incidence rate ratio (IRR)} = \frac{\text{Incidence rate in the exposed}}{\text{Incidence rate in the unexposed}} \\ \text{Odds Ratio (OR)} = \frac{\text{Odds in the exposed}}{\text{Odds in the unexposed}} }}\]

All ratios have no dimensions, so you only need to report the numerical value and the time point or study period.

Example for RR calculation:

Of 600 people who had high blood pressure, 35 experienced a stroke within 10 years of follow-up. Among 3,250 people who had low blood pressure, 40 experienced a stroke within the same follow-up period. Calculate the risk ratio of having a stroke among people with high blood pressure compared with those with low blood pressure.

To calculate this, we are going to use the epiR package.

# Load the epiR package
library(epiR)

# our data
dat <- matrix(c(35, 600-35,
                40, 3250-40), 
              nrow = 2, 
              byrow = TRUE)

rownames(dat) <- c("High blood pressure (E+)", "Low blood pressure (E-)")
colnames(dat) <- c("Stroke (D+)", "No stroke (D-)") 

# method = "cohort.count"
epi.2by2(dat = dat, # 2x2 table of counts (exposure in rows, outcome in columns)
         method = "cohort.count", # indicates the study design
         conf.level = 0.95, # confidence level
         units = 100, # report risks per 100 population units
         outcome = "as.columns" # the outcome is in the columns of the 2x2 table
         )
##              Outcome +    Outcome -      Total        Inc risk *
## Exposed +           35          565        600              5.83
## Exposed -           40         3210       3250              1.23
## Total               75         3775       3850              1.95
##                  Odds
## Exposed +      0.0619
## Exposed -      0.0125
## Total          0.0199
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio                               4.74 (3.04, 7.40)
## Odds ratio                                   4.97 (3.13, 7.89)
## Attrib risk *                                4.60 (2.69, 6.52)
## Attrib risk in population *                  0.72 (0.14, 1.30)
## Attrib fraction in exposed (%)               78.90 (67.07, 86.48)
## Attrib fraction in population (%)            36.82 (22.09, 48.76)
## -------------------------------------------------------------------
##  Test that odds ratio = 1: chi2(1) = 56.172 Pr>chi2 = < 0.001
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units

# According to the output the Risk Ratio = 4.74
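To see where these numbers come from, the risk ratio and odds ratio can be computed by hand in base R. This is a sketch with illustrative variable names; the confidence interval uses the usual Wald formula on the log scale, which is assumed here to be what the "Wald confidence limits" note in the output refers to.

```r
# Stroke example: counts from the 2x2 table above
cases_exp   <- 35;  n_exp   <- 600   # high blood pressure (exposed)
cases_unexp <- 40;  n_unexp <- 3250  # low blood pressure (unexposed)

# Risk in each group and the risk ratio
risk_exp   <- cases_exp / n_exp
risk_unexp <- cases_unexp / n_unexp
risk_ratio <- risk_exp / risk_unexp

# Odds in each group and the odds ratio
odds_exp   <- cases_exp / (n_exp - cases_exp)
odds_unexp <- cases_unexp / (n_unexp - cases_unexp)
odds_ratio <- odds_exp / odds_unexp

# Wald 95% confidence interval for the risk ratio, built on the log scale
se_log_rr <- sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
rr_ci     <- exp(log(risk_ratio) + c(-1, 1) * qnorm(0.975) * se_log_rr)

round(risk_ratio, 2)  # 4.74
round(rr_ci, 2)       # 3.04 7.40
round(odds_ratio, 2)  # 4.97
```

Because stroke is fairly rare in both groups, the odds ratio (4.97) is close to the risk ratio (4.74).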
Example for IRR calculation:

A cohort study is conducted to determine whether hormone replacement therapy is associated with an increased risk of coronary artery disease (CAD) in adults over the age of 40. The study found that the frequency of CAD among those using hormone replacement therapy was 27 per 1,000 person-years. The study also found that the frequency of CAD among those not using hormone replacement therapy was 3 per 1,000 person-years. What is the incidence rate ratio?

# the coronary artery disease data
dat <- matrix(c(27, 1000,
                3, 1000), 
              nrow = 2, 
              byrow = TRUE)

rownames(dat) <- c("Hormone therapy (E+)", "No hormone therapy (E-)")
colnames(dat) <- c("CAD+", "Person-years") 
dat 
##                         CAD+ Person-years
## Hormone therapy (E+)      27         1000
## No hormone therapy (E-)    3         1000

# choose method = "cohort.time"
epi.2by2(dat = dat, 
         method = "cohort.time", 
         conf.level = 0.95, 
         units = 100, 
         outcome = "as.columns" 
         )
##              Outcome +    Time at risk        Inc rate *
## Exposed +           27            1000               2.7
## Exposed -            3            1000               0.3
## Total               30            2000               1.5
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc rate ratio                               9.00 (2.77, 46.35)
## Attrib rate *                                2.40 (1.33, 3.47)
## Attrib rate in population *                  1.20 (0.56, 1.84)
## Attrib fraction in exposed (%)               88.89 (63.89, 97.84)
## Attrib fraction in population (%)            80.00 (59.06, 93.89)
## -------------------------------------------------------------------
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 units of population time at risk

# According to the output the incidence rate ratio = 9
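The point estimate is simply the ratio of the two rates, which can be checked by hand. The sketch below also uses `poisson.test()` from base R's stats package as an independent cross-check of the rate ratio. As in the code above, it treats 27 and 3 as event counts observed over exactly 1,000 person-years each; that is an assumption, since the example only reports rates. The point estimate does not depend on it, but the confidence interval does.

```r
# CAD example: events and person-years in each group
events_exp   <- 27;  py_exp   <- 1000  # hormone replacement therapy
events_unexp <- 3;   py_unexp <- 1000  # no hormone replacement therapy

# Incidence rates and their ratio
rate_exp   <- events_exp / py_exp
rate_unexp <- events_unexp / py_unexp
irr        <- rate_exp / rate_unexp
round(irr, 2)  # 9

# Cross-check with a two-sample comparison of Poisson rates (base R)
rate_test <- poisson.test(x = c(events_exp, events_unexp),
                          T = c(py_exp, py_unexp))
unname(rate_test$estimate)  # estimated rate ratio
rate_test$conf.int          # confidence interval for the rate ratio
```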

Example for OR calculation:

In the study [1], 186 of the 263 adolescents previously judged as having experienced a suicidal behaviour requiring immediate psychiatric consultation did not exhibit suicidal behaviour (non-suicidal, NS) at six months follow-up. Of this group, 86 young people had been assessed as having depression at baseline. Of the 77 young people with persistent suicidal behaviour at follow-up (suicidal behaviour, SB), 45 had been assessed as having depression at baseline.

# Suicidal behaviour data
dat <- matrix(c(45, 86, 45+86,
                77-45, 186-86, 77-45+186-86,
                77, 186, 77+186), 
              nrow = 3, 
              byrow = TRUE)

# Depression at baseline is the exposure (rows, E);
# suicidal behaviour at follow-up is the outcome (columns, D)
colnames(dat) <- c("Suicidal behaviour (D+)", "Non-suicidal (D-)", "Total")
rownames(dat) <- c("Depression (E+)", "No depression (E-)", "Total") 
dat
##                    Suicidal behaviour (D+) Non-suicidal (D-) Total
## Depression (E+)                         45                86   131
## No depression (E-)                      32               100   132
## Total                                   77               186   263

# use method = "case.control"
epi.2by2(dat = dat[1:2, 1:2], # drop the "Total" row and column
         method = "case.control", # indicates the study design
         conf.level = 0.95, # confidence level
         units = 100, 
         outcome = "as.columns" # the outcome is in the columns of the 2x2 table
         )
##              Outcome +    Outcome -      Total        Prevalence *
## Exposed +           45           86        131                34.4
## Exposed -           32          100        132                24.2
## Total               77          186        263                29.3
##                  Odds
## Exposed +       0.523
## Exposed -       0.320
## Total           0.414
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Odds ratio (W)                               1.64 (0.96, 2.80)
## Attrib prevalence *                          10.11 (-0.83, 21.04)
## Attrib prevalence in population *            5.04 (-4.11, 14.18)
## Attrib fraction (est) in exposed  (%)        38.73 (-8.22, 65.60)
## Attrib fraction (est) in population (%)      22.70 (-3.98, 42.54)
## -------------------------------------------------------------------
##  Test that odds ratio = 1: chi2(1) = 3.245 Pr>chi2 = 0.072
##  Wald confidence limits
##  CI: confidence interval
##  * Outcomes per 100 population units

# According to the output the Odds Ratio = 1.64
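The odds ratio and its Wald interval can also be reproduced from the four cell counts in base R. This is a sketch with illustrative variable names; the standard error is the usual square root of the summed reciprocal cell counts on the log scale.

```r
# Cell counts from the 2x2 table above
dep_sb   <- 45   # depression, suicidal behaviour at follow-up
dep_ns   <- 86   # depression, non-suicidal at follow-up
nodep_sb <- 32   # no depression, suicidal behaviour at follow-up
nodep_ns <- 100  # no depression, non-suicidal at follow-up

# Odds ratio as the cross-product ratio
odds_ratio <- (dep_sb * nodep_ns) / (dep_ns * nodep_sb)

# Wald 95% confidence interval on the log scale
se_log_or <- sqrt(1 / dep_sb + 1 / dep_ns + 1 / nodep_sb + 1 / nodep_ns)
or_ci     <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(odds_ratio, 2)  # 1.64
round(or_ci, 2)       # 0.96 2.80
```

The interval includes 1, which agrees with the chi-squared test in the output (p = 0.072): these data do not clearly show an association at the 5% level.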

Absolute measures

Here we have:

  • Risk difference
  • Incidence rate difference

\[\mathsf{ \small { \text{Risk difference} = \text{Risk among the exposed} - \text{Risk among the unexposed} \\ \text{Incidence rate difference} = \text{Incidence rate among the exposed} - \text{Incidence rate among the unexposed} }}\]

For instance, RD = 0.2 means

  • there was a 0.2 or 20% excess risk in those exposed compared to those unexposed over the study period.
  • there were 20 more cases per 100 people in those exposed compared to those unexposed over the study period.
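As a worked illustration, the absolute measures can be computed for the stroke and CAD examples used earlier. The sketch below uses illustrative variable names; the results should agree with the "Attrib risk" and "Attrib rate" rows of the `epi.2by2` output, which are reported per 100 units.

```r
# Risk difference: stroke example (35/600 exposed vs 40/3250 unexposed)
risk_exp   <- 35 / 600
risk_unexp <- 40 / 3250
risk_diff  <- risk_exp - risk_unexp
round(100 * risk_diff, 2)  # 4.6 excess strokes per 100 people over 10 years

# Incidence rate difference: CAD example (27 vs 3 per 1,000 person-years)
rate_exp   <- 27 / 1000
rate_unexp <- 3 / 1000
rate_diff  <- rate_exp - rate_unexp
round(1000 * rate_diff, 1)  # 24 excess cases per 1,000 person-years
round(100 * rate_diff, 1)   # 2.4 excess cases per 100 person-years
```

Unlike ratios, differences keep the units of the underlying measure, so a rate difference must be reported per unit of person-time.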

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.5.3 (2019-03-11)
##  os       macOS Mojave 10.14.6        
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Europe/Stockholm            
##  date     2019-11-10                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version    date       lib source                         
##  assertthat      0.2.1      2019-03-21 [1] CRAN (R 3.5.2)                 
##  backports       1.1.5      2019-10-02 [1] CRAN (R 3.5.2)                 
##  BiasedUrn       1.07       2015-12-28 [1] CRAN (R 3.5.0)                 
##  bibtex          0.4.2      2017-06-30 [1] CRAN (R 3.5.0)                 
##  blogdown        0.16       2019-10-01 [1] CRAN (R 3.5.2)                 
##  bookdown        0.14       2019-10-01 [1] CRAN (R 3.5.2)                 
##  callr           3.3.2      2019-09-22 [1] CRAN (R 3.5.2)                 
##  cli             1.1.0      2019-03-19 [1] CRAN (R 3.5.2)                 
##  crayon          1.3.4      2017-09-16 [1] CRAN (R 3.5.0)                 
##  desc            1.2.0      2018-05-01 [1] CRAN (R 3.5.0)                 
##  devtools      * 2.2.1      2019-09-24 [1] CRAN (R 3.5.2)                 
##  digest          0.6.21     2019-09-20 [1] CRAN (R 3.5.2)                 
##  ellipsis        0.3.0      2019-09-20 [1] CRAN (R 3.5.2)                 
##  epiR          * 1.0-4      2019-08-23 [1] CRAN (R 3.5.2)                 
##  evaluate        0.14       2019-05-28 [1] CRAN (R 3.5.2)                 
##  fs              1.3.1      2019-05-06 [1] CRAN (R 3.5.2)                 
##  glue            1.3.1.9000 2019-10-12 [1] Github (tidyverse/glue@71eeddf)
##  htmltools       0.4.0      2019-10-04 [1] CRAN (R 3.5.2)                 
##  httr            1.4.1      2019-08-05 [1] CRAN (R 3.5.2)                 
##  jsonlite        1.6        2018-12-07 [1] CRAN (R 3.5.0)                 
##  knitcitations * 1.0.10     2019-09-15 [1] CRAN (R 3.5.2)                 
##  knitr           1.25       2019-09-18 [1] CRAN (R 3.5.2)                 
##  lattice         0.20-38    2018-11-04 [1] CRAN (R 3.5.3)                 
##  lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.5.0)                 
##  magrittr        1.5        2014-11-22 [1] CRAN (R 3.5.0)                 
##  Matrix          1.2-17     2019-03-22 [1] CRAN (R 3.5.2)                 
##  memoise         1.1.0      2017-04-21 [1] CRAN (R 3.5.0)                 
##  pkgbuild        1.0.6      2019-10-09 [1] CRAN (R 3.5.2)                 
##  pkgload         1.0.2      2018-10-29 [1] CRAN (R 3.5.0)                 
##  plyr            1.8.4      2016-06-08 [1] CRAN (R 3.5.0)                 
##  prettyunits     1.0.2      2015-07-13 [1] CRAN (R 3.5.0)                 
##  processx        3.4.1      2019-07-18 [1] CRAN (R 3.5.2)                 
##  ps              1.3.0      2018-12-21 [1] CRAN (R 3.5.0)                 
##  R6              2.4.0      2019-02-14 [1] CRAN (R 3.5.2)                 
##  Rcpp            1.0.2      2019-07-25 [1] CRAN (R 3.5.2)                 
##  RefManageR      1.2.12     2019-04-03 [1] CRAN (R 3.5.2)                 
##  remotes         2.1.0      2019-06-24 [1] CRAN (R 3.5.2)                 
##  rlang           0.4.0      2019-06-25 [1] CRAN (R 3.5.2)                 
##  rmarkdown       1.16       2019-10-01 [1] CRAN (R 3.5.2)                 
##  rprojroot       1.3-2      2018-01-03 [1] CRAN (R 3.5.0)                 
##  sessioninfo     1.1.1      2018-11-05 [1] CRAN (R 3.5.0)                 
##  stringi         1.4.3      2019-03-12 [1] CRAN (R 3.5.2)                 
##  stringr         1.4.0      2019-02-10 [1] CRAN (R 3.5.2)                 
##  survival      * 2.44-1.1   2019-04-01 [1] CRAN (R 3.5.2)                 
##  testthat        2.2.1      2019-07-25 [1] CRAN (R 3.5.2)                 
##  usethis       * 1.5.1      2019-07-04 [1] CRAN (R 3.5.2)                 
##  withr           2.1.2      2018-03-15 [1] CRAN (R 3.5.0)                 
##  xfun            0.10       2019-10-01 [1] CRAN (R 3.5.2)                 
##  xml2            1.2.2      2019-08-09 [1] CRAN (R 3.5.2)                 
##  yaml            2.2.0      2018-07-25 [1] CRAN (R 3.5.0)                 
## 
## [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

References


  1. This example is taken from the article "Explaining Odds Ratios".

Leyla Nunez
Statistician