Part I - Cox PH with Time varying Covariate

Definitions

Covariates whose values changes over time are commanly called time-varying or time-dependent covariates. Time-varying covariates can be classified as either internal or external:

  • internal covariates are related to the individuals, and can only be measured while a patient is alive.
    • blood pressure or blood glucose levels measured over time,
    • white blood cell count,
    • smoking status,
    • employment status, etc.
  • external variables are time-dependent variables that exists totally independent of any individuals or variables which changes in a known way.
    • air temperature,
    • measure of airborne pollution,
    • age of a patient,
    • dose of a drug that changes in a predetermined manner,
    • time itself

For fitting a Cox regression model the distinction between internal and external covariates is not important but those concepts are relevant when predicting survival.

Goal

In Part II, we will look at time-varying covariates when the proportional hazard assumption is not fullfilled.

Here, in Part I, we will focus on situations where the waiting time from the occurrence of some specific event until treatment may be strongly associated with the patient’s survival. As a motivating example, we will use the famous Stanford heart transplant cohort data, (jasa). The subjects were admitted to program when heart condition was sufficiently severe. A donor’s heart was sought. Some patients received a heart, and some died before a suitable heart could be found.

Question:
Did heart transplant has a beneficial effect on survival?

The data set is available in the survival package. The data concists of 103 rows and 14 columns.

Variable name Description
birth.dt birth date
accept.dt acceptance into program
tx.date transplant date
fu.date end of followup
fustat dead or alive
age age (in years)
futime followup time
wait.time time before transplant
transplant transplant indicator
mismatch mismatch score
mscore another mismatch score
## load libraries
library(survival)
library(survminer)
library(arsenal)
library(tidyverse)
library(knitr)
library(kableExtra)
library(wesanderson) # palettes for plots

Lets look at 2 patients:

id transplant wait.time futime fustat
97 1 20 130 0
62 0 NA 68 1

Time 0 is time-point when the patients have been listed for heart transplant. Timing for transplantation is varying across patients. Patient with id = 97 recieved heart transplant at day 20 and was followed until day 130. Patient with id = 62 died at day 68 without receiving heart transplant.

Data Cleaning

Before we can continue, we need to do some data cleaning.

> # Do we have patients who died on the day of entry?
> jasa %>%
+   filter(fu.date == accept.dt) 
    birth.dt  accept.dt tx.date    fu.date fustat surgery     age futime
1 1914-12-04 1968-09-27    <NA> 1968-09-27      1       1 53.8152      0
  wait.time transplant mismatch hla.a2 mscore reject id
1        NA          0       NA     NA     NA     NA 15
> 
> # Do we have patients who died on the same day as their transplant?                         
> jasa %>%
+   filter(tx.date == fu.date)
    birth.dt  accept.dt    tx.date    fu.date fustat surgery      age
1 1928-11-14 1970-05-05 1970-05-09 1970-05-09      1       0 41.47023
  futime wait.time transplant mismatch hla.a2 mscore reject id
1      4         4          1        3      0   0.87      0 38
> 
> 
> # Make adjustments for those patients  
> # Note: age and wait.time are recalculated
> jasa <- jasa %>%
+   mutate(
+     age = (accept.dt - birth.dt)/365.25 - 48, # Recalculating age-48 years => centered at 48
+     year = (accept.dt - ymd("1967-10-01"))/365.25, # time since start of the study 
+     wait.time = tx.date - accept.dt) %>%
+   mutate(
+     futime = ifelse(fu.date == accept.dt, .5, futime), #die on day 0.5
+     wait.time = ifelse(tx.date == fu.date, wait.time -.5, wait.time) # reduce time before transplant with 0.5 
+     )

Time-fixed exposure (wrong analysis)

The general Cox regression considers covariates are fixed for the entire follow-up period.

> # Use survfit() function to produce the Kaplan-Meier estimates of 
> # the probability of survival over time.
> fit <- survfit(Surv(futime, fustat) ~ transplant, data=jasa)
> print(fit)
Call: survfit(formula = Surv(futime, fustat) ~ transplant, data = jasa)

              n events median 0.95LCL 0.95UCL
transplant=0 34     30     20      11      49
transplant=1 69     45    284     152     851
> # Kaplan-Meier Curves using ggsurvplot
> ggsurvplot(fit, 
+            censor=FALSE,
+            risk.table=TRUE,
+            break.x.by=200,
+            xlab='Time since enrollment, years', 
+            ylab='Survival probability')

> # The function survdiff() can be used to compute log-rank test 
> # comparing two or more survival curves.
> survdiff(Surv(futime, fustat) ~ transplant, data=jasa)
Call:
survdiff(formula = Surv(futime, fustat) ~ transplant, data = jasa)

              N Observed Expected (O-E)^2/E (O-E)^2/V
transplant=0 34       30     12.1      26.5      33.2
transplant=1 69       45     62.9       5.1      33.2

 Chisq= 33.2  on 1 degrees of freedom, p= 8e-09 
> # Cox Models
> fit_cox <- coxph(Surv(futime, fustat) ~ transplant, data = jasa)
> summary(fit_cox)
Call:
coxph(formula = Surv(futime, fustat) ~ transplant, data = jasa)

  n= 103, number of events= 75 

              coef exp(coef) se(coef)      z Pr(>|z|)
transplant -1.3234    0.2662   0.2438 -5.428 5.69e-08

           exp(coef) exp(-coef) lower .95 upper .95
transplant    0.2662      3.756    0.1651    0.4293

Concordance= 0.668  (se = 0.025 )
Likelihood ratio test= 25.95  on 1 df,   p=4e-07
Wald test            = 29.47  on 1 df,   p=6e-08
Score (logrank) test = 33.38  on 1 df,   p=8e-09

Here we entirely ignore the timing of receiving the transplant. The result implies that receiving heart transplantation reduces mortality. The problem with the calculation is that we assume that 69 patient did have a transplant at time 0. which is a wrong assumption. 69 is the number of patients who eventually will receiev transplant.

Time immortal bias

The problem usually occurs with the passing of time before a subject initiates a given exposure.

Bad solutions:

  • The time prior to exposure is omiteed (left entry at exposure time)
  • The subject is counted as exposed before exposure occurs - as we did in the example above. In both cases, bias is toward making exposure appear to be associated with longer survival.

In our example, we artificially inflate the effect of treatment because these patients have to live longer to receive treatment, so the observed effect of treatment is not due to the effect of transplant.

Right solutions:

  • Time-dependent exposure variable - using Cox regression with time varying covariates
  • Let subject be categorized as not exposed at times before exposure occurs, and let exposure status change when exposure has occurred.

Extensions of the Cox Model

We need to divide exposed subjects’ information into two records.
tx - heart transplant status, will take value

  • tx = 1 at time \(t\) if the patient has already received a transplant at some time \(t_0\) prior to time \(t\), $t < t_0 $
  • tx = 0 at time \(t\) if the patient has not yet received a transplant by time \(t \geq t_0\)

Once a patient receives a transplant at time \(t_0\), the value of tx remains 1 for all subsequent times. In addition, the tx variable treats a tansplant patient as a non-transplant prior to receiving the transplant.

In R, tmerge function creates a data sets which have multiple intervals for each subject, along with the covariate values that apply over that interval.

> # Use the function tmerge 
> sdata <- tmerge(jasa, 
+                 jasa, 
+                 id=id,
+                 death = event(futime, fustat),
+                 tx = tdc(wait.time))
> 
> #attr(sdata, "tcount")
> # Pat with id 97 
> sdata[sdata$id == 97, c('id', 'tstart','tstop','death', 'tx')]
    id tstart tstop death tx
161 97      0    20     0  0
162 97     20   130     0  1

For instance, the patient (id = 97) has 2 rows:

  • No transplant tx = 0 between day 0 to day 20, and
  • transplant tx = 1 between day 20 and 130.

Run a Cox regression with time-varying covariates

> # K-M plot time-varying
> fit_KM <- survfit(Surv(tstart, tstop, death) ~ tx, data=sdata)
> fit_KM
Call: survfit(formula = Surv(tstart, tstop, death) ~ tx, data = sdata)

     records n.max n.start events median 0.95LCL 0.95UCL
tx=0     101   101     101     30    148      68      NA
tx=1      69    45      11     45     89      52     333
> 
> ggsurvplot(fit_KM, 
+            censor=FALSE,
+            risk.table=TRUE,
+            break.x.by=200,
+            xlab='Time since enrollment, years', 
+            ylab='Survival probability')

The model that we fit has the form

\[\small \mathsf{ h_i(t) = h_0(t) \exp\{\delta \times tx_i(t) + \beta \times age_i\} \text{, where} \\ tx_i(t) = \begin{cases} 0 & \text{if patient $i$ did not have a transplantation by time t < wait-time} \\ 1 & \text{if patient $i$ had a transplantation at some time t $\geq$ wait-time} \end{cases} } \]

The proportional hazards (PH) assumption can be checked using statistical tests and graphical diagnostics based on the scaled Schoenfeld residuals.

> # Adjusted for tx and age
> fit <- coxph(Surv(tstart, tstop, death) ~ tx + age, data=sdata)
> summary(fit)
Call:
coxph(formula = Surv(tstart, tstop, death) ~ tx + age, data = sdata)

  n= 170, number of events= 75 

         coef exp(coef)  se(coef)      z Pr(>|z|)
tx  -0.006058  0.993960  0.311750 -0.019   0.9845
age  0.030758  1.031236  0.014496  2.122   0.0339

    exp(coef) exp(-coef) lower .95 upper .95
tx      0.994     1.0061    0.5395     1.831
age     1.031     0.9697    1.0023     1.061

Concordance= 0.576  (se = 0.038 )
Likelihood ratio test= 5.17  on 2 df,   p=0.08
Wald test            = 4.63  on 2 df,   p=0.1
Score (logrank) test = 4.64  on 2 df,   p=0.1
> 
> # To test for the proportional-hazards (PH) assumption
> cox.zph(fit)
           rho chisq     p
tx     -0.0707 0.401 0.527
age     0.0933 0.894 0.344
GLOBAL      NA 1.117 0.572

We obtain the results:

  • The test for PH assumption is not statistically significant for each of the covariates, and the global test is also not statistically significant.
  • exp(\(\delta\)) = 0.994,
  • p = 0.9845 transplantation does not improve survival.

Additional adjustments

The Stanford heart transplant data has recieved a lot of attention1234.

According to Kleinbaum, the model above needs additional adjustments:

  • tissue mismatch score (tms) and age at transplant (age) are considered as prognostic indicatores of survival only for patients who received transplant.
  • age at eligibility is not considered an important prognostic factor for the non transplants.

\[\small \mathsf{ h_i(t) = h_0(t) \exp\{\delta_1 \times tx_i(t) + \delta_2 \times tms_i(t) + \delta_3 \times age_i(t) \} \text{, where} \\ tx_i(t) = \begin{cases} 0 & \text{if patient $i$ did not have a transplantation by time t < wait-time} \\ 1 & \text{if patient $i$ had a transplantation at some time t $\geq$ wait-time} \end{cases} \\ tms_i(t) = \begin{cases} 0 & \text{if t < wait-time} \\ tms & \text{if t $\geq$ wait-time} \end{cases} \\ age_i(t) = \begin{cases} 0 & \text{if t < wait-time} \\ age & \text{if t $\geq$ wait-time} \end{cases} } \]

I let you make the need adjustment to fit the suggested model.

Kalbfleisch and Prentice, described six different models. All the 6 models are shown below but I execute only the first model here. Because of several tied death times in the data set, the Breslow approximation is used.

> # the 6 models found in Kalbfleisch and Prentice, p139
> sfit.1 <- coxph(Surv(tstart, tstop, death) ~ (age + surgery)*tx,
+                 data = sdata, 
+                 method = 'breslow')
> 
> summary(sfit.1)
Call:
coxph(formula = Surv(tstart, tstop, death) ~ (age + surgery) * 
    tx, data = sdata, method = "breslow")

  n= 170, number of events= 75 

               coef exp(coef) se(coef)      z Pr(>|z|)
age         0.01386   1.01395  0.01813  0.765    0.445
surgery    -0.54652   0.57896  0.61091 -0.895    0.371
tx          0.11572   1.12268  0.32729  0.354    0.724
age:tx      0.03473   1.03534  0.02725  1.274    0.202
surgery:tx -0.29037   0.74799  0.75819 -0.383    0.702

           exp(coef) exp(-coef) lower .95 upper .95
age            1.014     0.9862    0.9786     1.051
surgery        0.579     1.7272    0.1748     1.917
tx             1.123     0.8907    0.5911     2.132
age:tx         1.035     0.9659    0.9815     1.092
surgery:tx     0.748     1.3369    0.1692     3.306

Concordance= 0.595  (se = 0.037 )
Likelihood ratio test= 12.45  on 5 df,   p=0.03
Wald test            = 11.62  on 5 df,   p=0.04
Score (logrank) test = 12.02  on 5 df,   p=0.03
> 
> # To test for the proportional-hazards (PH) assumption
> cox.zph(sfit.1)
               rho  chisq     p
age        -0.0598 0.3075 0.579
surgery    -0.0832 0.5225 0.470
tx         -0.0325 0.0859 0.769
age:tx      0.1520 1.9000 0.168
surgery:tx  0.1672 2.0125 0.156
GLOBAL          NA 4.4230 0.490
> # 5 additional models - No run!
> sfit.2 <- coxph(Surv(tstart, tstop, death) ~ year*tx,
+                 data = sdata, 
+                 method = 'breslow')
> sfit.3 <- coxph(Surv(tstart, tstop, death) ~ (age + year)*tx,
+                 data = sdata, 
+                 method = 'breslow')
> sfit.4 <- coxph(Surv(tstart, tstop, death) ~ (year + surgery)*tx,
+                 data = sdata, 
+                 method = 'breslow')
> sfit.5 <- coxph(Surv(tstart, tstop, death) ~ (age + surgery)*tx + year ,
+                 data = sdata, 
+                 method = 'breslow')
> sfit.6 <- coxph(Surv(tstart, tstop, death) ~ age*tx + surgery + year,
+                 data = sdata, 
+                 method = 'breslow')

Examples of Time immortal bias in the literature

Van Walraven et al5. conducted a literature review of nine prominent medical journals between 1998 and 2002. He found that 41% (32% - 50%, 95%CI) of reported observational studies using survival analyses were susceptible to immortal bias.

Reproducibility

─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.3 (2019-03-11)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Stockholm            
 date     2019-11-10                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version    date       lib source                         
 arsenal     * 3.3.0      2019-09-07 [1] CRAN (R 3.5.2)                 
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.5.2)                 
 backports     1.1.5      2019-10-02 [1] CRAN (R 3.5.2)                 
 blogdown      0.16       2019-10-01 [1] CRAN (R 3.5.2)                 
 bookdown      0.14       2019-10-01 [1] CRAN (R 3.5.2)                 
 broom         0.5.2      2019-04-07 [1] CRAN (R 3.5.2)                 
 callr         3.3.2      2019-09-22 [1] CRAN (R 3.5.2)                 
 cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.5.0)                 
 cli           1.1.0      2019-03-19 [1] CRAN (R 3.5.2)                 
 colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.5.2)                 
 crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)                 
 data.table    1.12.2     2019-04-07 [1] CRAN (R 3.5.2)                 
 desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)                 
 devtools    * 2.2.1      2019-09-24 [1] CRAN (R 3.5.2)                 
 digest        0.6.21     2019-09-20 [1] CRAN (R 3.5.2)                 
 dplyr       * 0.8.3      2019-07-04 [1] CRAN (R 3.5.2)                 
 ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.5.2)                 
 evaluate      0.14       2019-05-28 [1] CRAN (R 3.5.2)                 
 fansi         0.4.0      2018-10-05 [1] CRAN (R 3.5.0)                 
 forcats     * 0.4.0      2019-02-17 [1] CRAN (R 3.5.2)                 
 fs            1.3.1      2019-05-06 [1] CRAN (R 3.5.2)                 
 generics      0.0.2      2018-11-29 [1] CRAN (R 3.5.0)                 
 ggplot2     * 3.2.1      2019-08-10 [1] CRAN (R 3.5.2)                 
 ggpubr      * 0.2.3      2019-09-03 [1] CRAN (R 3.5.2)                 
 ggsignif      0.6.0      2019-08-08 [1] CRAN (R 3.5.2)                 
 glue          1.3.1.9000 2019-10-12 [1] Github (tidyverse/glue@71eeddf)
 gridExtra     2.3        2017-09-09 [1] CRAN (R 3.5.0)                 
 gtable        0.3.0      2019-03-25 [1] CRAN (R 3.5.2)                 
 haven         2.1.1      2019-07-04 [1] CRAN (R 3.5.2)                 
 highr         0.8        2019-03-20 [1] CRAN (R 3.5.2)                 
 hms           0.5.1      2019-08-23 [1] CRAN (R 3.5.2)                 
 htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.5.2)                 
 httr          1.4.1      2019-08-05 [1] CRAN (R 3.5.2)                 
 jsonlite      1.6        2018-12-07 [1] CRAN (R 3.5.0)                 
 kableExtra  * 1.1.0      2019-03-16 [1] CRAN (R 3.5.2)                 
 km.ci         0.5-2      2009-08-30 [1] CRAN (R 3.5.0)                 
 KMsurv        0.1-5      2012-12-03 [1] CRAN (R 3.5.0)                 
 knitr       * 1.25       2019-09-18 [1] CRAN (R 3.5.2)                 
 labeling      0.3        2014-08-23 [1] CRAN (R 3.5.0)                 
 lattice       0.20-38    2018-11-04 [1] CRAN (R 3.5.3)                 
 lazyeval      0.2.2      2019-03-15 [1] CRAN (R 3.5.2)                 
 lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.5.2)                 
 lubridate   * 1.7.4      2018-04-11 [1] CRAN (R 3.5.0)                 
 magrittr    * 1.5        2014-11-22 [1] CRAN (R 3.5.0)                 
 Matrix        1.2-17     2019-03-22 [1] CRAN (R 3.5.2)                 
 memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)                 
 modelr        0.1.5      2019-08-08 [1] CRAN (R 3.5.2)                 
 munsell       0.5.0      2018-06-12 [1] CRAN (R 3.5.0)                 
 nlme          3.1-141    2019-08-01 [1] CRAN (R 3.5.2)                 
 pillar        1.4.2      2019-06-29 [1] CRAN (R 3.5.2)                 
 pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.5.2)                 
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.5.2)                 
 pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.0)                 
 prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)                 
 processx      3.4.1      2019-07-18 [1] CRAN (R 3.5.2)                 
 ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.0)                 
 purrr       * 0.3.2      2019-03-15 [1] CRAN (R 3.5.2)                 
 R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.2)                 
 Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.5.2)                 
 readr       * 1.3.1      2018-12-21 [1] CRAN (R 3.5.0)                 
 readxl        1.3.1      2019-03-13 [1] CRAN (R 3.5.2)                 
 remotes       2.1.0      2019-06-24 [1] CRAN (R 3.5.2)                 
 rlang         0.4.0      2019-06-25 [1] CRAN (R 3.5.2)                 
 rmarkdown     1.16       2019-10-01 [1] CRAN (R 3.5.2)                 
 rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)                 
 rstudioapi    0.10       2019-03-19 [1] CRAN (R 3.5.2)                 
 rvest         0.3.4      2019-05-15 [1] CRAN (R 3.5.2)                 
 scales        1.0.0      2018-08-09 [1] CRAN (R 3.5.0)                 
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.0)                 
 stringi       1.4.3      2019-03-12 [1] CRAN (R 3.5.2)                 
 stringr     * 1.4.0      2019-02-10 [1] CRAN (R 3.5.2)                 
 survival    * 2.44-1.1   2019-04-01 [1] CRAN (R 3.5.2)                 
 survminer   * 0.4.6      2019-09-03 [1] CRAN (R 3.5.2)                 
 survMisc      0.5.5      2018-07-05 [1] CRAN (R 3.5.0)                 
 testthat      2.2.1      2019-07-25 [1] CRAN (R 3.5.2)                 
 tibble      * 2.1.3      2019-06-06 [1] CRAN (R 3.5.2)                 
 tidyr       * 1.0.0      2019-09-11 [1] CRAN (R 3.5.2)                 
 tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.0)                 
 tidyverse   * 1.2.1      2017-11-14 [1] CRAN (R 3.5.0)                 
 usethis     * 1.5.1      2019-07-04 [1] CRAN (R 3.5.2)                 
 utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.0)                 
 vctrs         0.2.0      2019-07-05 [1] CRAN (R 3.5.2)                 
 viridisLite   0.3.0      2018-02-01 [1] CRAN (R 3.5.0)                 
 webshot       0.5.1      2018-09-28 [1] CRAN (R 3.5.0)                 
 wesanderson * 0.3.6      2018-04-20 [1] CRAN (R 3.5.0)                 
 withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)                 
 xfun          0.10       2019-10-01 [1] CRAN (R 3.5.2)                 
 xml2          1.2.2      2019-08-09 [1] CRAN (R 3.5.2)                 
 xtable        1.8-4      2019-04-21 [1] CRAN (R 3.5.2)                 
 yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.0)                 
 zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.5.0)                 
 zoo           1.8-6      2019-05-28 [1] CRAN (R 3.5.2)                 

[1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

References


  1. D.G. Kleinbaum and M. Klein. Survival analysis: A self-learning text. (3rd edition), Springer, New York, 2012

  2. J. D. Kalbfleisch and R. L. Prentice. The Statistical Analysis of Failure Time Data. Wiley, New York, 1980

  3. T. M. Therneau and P. M. Grambsch. Modeling Survival Data: Extending the Cox Model. Springer, New York, 2000

  4. Aitkin, Murray, et al. A Reanalysis of the Stanford Heart Transplant Data. Journal of the American Statistical Association, 1983;78(382): 264-274.

  5. Walraven C van, Davis D, Forster AJ, et al. Time-dependent bias was common in survival analyses published in leading clinical journals. Journal of Clinical Epidemiology 2004;57:672–82.

Avatar
Leyla Nunez
Statistician

Related

comments powered by Disqus