### Definitions

Covariates whose values changes over time are commanly called time-varying or time-dependent covariates. Time-varying covariates can be classified as either internal or external:

- internal covariates are related to the individuals, and can only be measured while a patient is alive.
- blood pressure or blood glucose levels measured over time,
- white blood cell count,
- smoking status,
- employment status, etc.

- external variables are time-dependent variables that exists totally independent of any individuals or variables which changes in a known way.
- air temperature,
- measure of airborne pollution,
- age of a patient,
- dose of a drug that changes in a predetermined manner,
- time itself

For fitting a Cox regression model the distinction between internal and external covariates is not important but those concepts are relevant when predicting survival.

### Goal

In Part II, we will look at time-varying covariates when the proportional hazard assumption is not fullfilled.

Here, in Part I, we will focus on situations where the waiting time from the occurrence of some specific event until treatment may be strongly associated with the patient’s survival. As a motivating example, we will use the famous Stanford heart transplant cohort data, (`jasa`

). The subjects were admitted to program when heart condition was sufficiently severe. A donor’s heart was sought. Some patients received a heart, and some died before a suitable heart could be found.

*Question*:

Did heart transplant has a beneficial effect on survival?

The data set is available in the `survival`

package. The data concists of 103 rows and 14 columns.

Variable name | Description |
---|---|

birth.dt | birth date |

accept.dt | acceptance into program |

tx.date | transplant date |

fu.date | end of followup |

fustat | dead or alive |

age | age (in years) |

futime | followup time |

wait.time | time before transplant |

transplant | transplant indicator |

mismatch | mismatch score |

mscore | another mismatch score |

```
## load libraries
library(survival)
library(survminer)
library(arsenal)
library(tidyverse)
library(knitr)
library(kableExtra)
library(wesanderson) # palettes for plots
```

Lets look at 2 patients:

id | transplant | wait.time | futime | fustat |
---|---|---|---|---|

97 | 1 | 20 | 130 | 0 |

62 | 0 | NA | 68 | 1 |

Time `0`

is time-point when the patients have been listed for heart transplant. Timing for transplantation is varying across patients.
Patient with `id = 97`

recieved heart transplant at day `20`

and was followed until day `130`

. Patient with `id = 62`

died at day `68`

without receiving heart transplant.

#### Data Cleaning

Before we can continue, we need to do some data cleaning.

```
> # Do we have patients who died on the day of entry?
> jasa %>%
+ filter(fu.date == accept.dt)
birth.dt accept.dt tx.date fu.date fustat surgery age futime
1 1914-12-04 1968-09-27 <NA> 1968-09-27 1 1 53.8152 0
wait.time transplant mismatch hla.a2 mscore reject id
1 NA 0 NA NA NA NA 15
>
> # Do we have patients who died on the same day as their transplant?
> jasa %>%
+ filter(tx.date == fu.date)
birth.dt accept.dt tx.date fu.date fustat surgery age
1 1928-11-14 1970-05-05 1970-05-09 1970-05-09 1 0 41.47023
futime wait.time transplant mismatch hla.a2 mscore reject id
1 4 4 1 3 0 0.87 0 38
>
>
> # Make adjustments for those patients
> # Note: age and wait.time are recalculated
> jasa <- jasa %>%
+ mutate(
+ age = (accept.dt - birth.dt)/365.25 - 48, # Recalculating age-48 years => centered at 48
+ year = (accept.dt - ymd("1967-10-01"))/365.25, # time since start of the study
+ wait.time = tx.date - accept.dt) %>%
+ mutate(
+ futime = ifelse(fu.date == accept.dt, .5, futime), #die on day 0.5
+ wait.time = ifelse(tx.date == fu.date, wait.time -.5, wait.time) # reduce time before transplant with 0.5
+ )
```

### Time-fixed exposure (wrong analysis)

The general Cox regression considers covariates are fixed for the entire follow-up period.

```
> # Use survfit() function to produce the Kaplan-Meier estimates of
> # the probability of survival over time.
> fit <- survfit(Surv(futime, fustat) ~ transplant, data=jasa)
> print(fit)
Call: survfit(formula = Surv(futime, fustat) ~ transplant, data = jasa)
n events median 0.95LCL 0.95UCL
transplant=0 34 30 20 11 49
transplant=1 69 45 284 152 851
```

```
> # Kaplan-Meier Curves using ggsurvplot
> ggsurvplot(fit,
+ censor=FALSE,
+ risk.table=TRUE,
+ break.x.by=200,
+ xlab='Time since enrollment, years',
+ ylab='Survival probability')
```

```
> # The function survdiff() can be used to compute log-rank test
> # comparing two or more survival curves.
> survdiff(Surv(futime, fustat) ~ transplant, data=jasa)
Call:
survdiff(formula = Surv(futime, fustat) ~ transplant, data = jasa)
N Observed Expected (O-E)^2/E (O-E)^2/V
transplant=0 34 30 12.1 26.5 33.2
transplant=1 69 45 62.9 5.1 33.2
Chisq= 33.2 on 1 degrees of freedom, p= 8e-09
```

```
> # Cox Models
> fit_cox <- coxph(Surv(futime, fustat) ~ transplant, data = jasa)
> summary(fit_cox)
Call:
coxph(formula = Surv(futime, fustat) ~ transplant, data = jasa)
n= 103, number of events= 75
coef exp(coef) se(coef) z Pr(>|z|)
transplant -1.3234 0.2662 0.2438 -5.428 5.69e-08
exp(coef) exp(-coef) lower .95 upper .95
transplant 0.2662 3.756 0.1651 0.4293
Concordance= 0.668 (se = 0.025 )
Likelihood ratio test= 25.95 on 1 df, p=4e-07
Wald test = 29.47 on 1 df, p=6e-08
Score (logrank) test = 33.38 on 1 df, p=8e-09
```

Here we entirely ignore the timing of receiving the transplant. The result implies that receiving heart transplantation reduces mortality. The problem with the calculation is that we assume that `69`

patient did have a transplant at time `0`

. which is a wrong assumption. `69`

is the number of patients who eventually will receiev transplant.

### Time immortal bias

The problem usually occurs with the passing of time before a subject initiates a given exposure.

Bad solutions:

- The time prior to exposure is omiteed (left entry at exposure time)
- The subject is counted as exposed before exposure occurs - as we did in the example above. In both cases, bias is toward making exposure appear to be associated with longer survival.

In our example, we artificially inflate the effect of treatment because these patients have to live longer to receive treatment, so the observed effect of treatment is not due to the effect of transplant.

Right solutions:

- Time-dependent exposure variable - using Cox regression with time varying covariates
- Let subject be categorized as not exposed at times before exposure occurs, and let exposure status change when exposure has occurred.

### Extensions of the Cox Model

We need to divide exposed subjects’ information into two records.

`tx`

- heart transplant status, will take value

`tx = 1`

at time \(t\) if the patient has already received a transplant at some time \(t_0\) prior to time \(t\), $t < t_0 $`tx = 0`

at time \(t\) if the patient has not yet received a transplant by time \(t \geq t_0\)

Once a patient receives a transplant at time \(t_0\), the value of `tx`

remains 1 for all subsequent times. In addition, the `tx`

variable treats a tansplant patient as a non-transplant prior to receiving the transplant.

In R, `tmerge`

function creates a data sets which have multiple intervals for each subject, along with the covariate values that apply over that interval.

```
> # Use the function tmerge
> sdata <- tmerge(jasa,
+ jasa,
+ id=id,
+ death = event(futime, fustat),
+ tx = tdc(wait.time))
>
> #attr(sdata, "tcount")
```

```
> # Pat with id 97
> sdata[sdata$id == 97, c('id', 'tstart','tstop','death', 'tx')]
id tstart tstop death tx
161 97 0 20 0 0
162 97 20 130 0 1
```

For instance, the patient (`id = 97`

) has 2 rows:

- No transplant
`tx = 0`

between day`0`

to day`20`

, and - transplant
`tx = 1`

between day`20`

and`130`

.

#### Run a Cox regression with time-varying covariates

```
> # K-M plot time-varying
> fit_KM <- survfit(Surv(tstart, tstop, death) ~ tx, data=sdata)
> fit_KM
Call: survfit(formula = Surv(tstart, tstop, death) ~ tx, data = sdata)
records n.max n.start events median 0.95LCL 0.95UCL
tx=0 101 101 101 30 148 68 NA
tx=1 69 45 11 45 89 52 333
>
> ggsurvplot(fit_KM,
+ censor=FALSE,
+ risk.table=TRUE,
+ break.x.by=200,
+ xlab='Time since enrollment, years',
+ ylab='Survival probability')
```

The model that we fit has the form

\[\small \mathsf{ h_i(t) = h_0(t) \exp\{\delta \times tx_i(t) + \beta \times age_i\} \text{, where} \\ tx_i(t) = \begin{cases} 0 & \text{if patient $i$ did not have a transplantation by time t < wait-time} \\ 1 & \text{if patient $i$ had a transplantation at some time t $\geq$ wait-time} \end{cases} } \]

The proportional hazards (PH) assumption can be checked using statistical tests and graphical diagnostics based on the *scaled Schoenfeld residuals*.

```
> # Adjusted for tx and age
> fit <- coxph(Surv(tstart, tstop, death) ~ tx + age, data=sdata)
> summary(fit)
Call:
coxph(formula = Surv(tstart, tstop, death) ~ tx + age, data = sdata)
n= 170, number of events= 75
coef exp(coef) se(coef) z Pr(>|z|)
tx -0.006058 0.993960 0.311750 -0.019 0.9845
age 0.030758 1.031236 0.014496 2.122 0.0339
exp(coef) exp(-coef) lower .95 upper .95
tx 0.994 1.0061 0.5395 1.831
age 1.031 0.9697 1.0023 1.061
Concordance= 0.576 (se = 0.038 )
Likelihood ratio test= 5.17 on 2 df, p=0.08
Wald test = 4.63 on 2 df, p=0.1
Score (logrank) test = 4.64 on 2 df, p=0.1
>
> # To test for the proportional-hazards (PH) assumption
> cox.zph(fit)
rho chisq p
tx -0.0707 0.401 0.527
age 0.0933 0.894 0.344
GLOBAL NA 1.117 0.572
```

We obtain the results:

- The test for PH assumption is not statistically significant for each of the covariates, and the global test is also not statistically significant.
- exp(\(\delta\)) = 0.994,

*p*= 0.9845 transplantation does not improve survival.

#### Additional adjustments

The Stanford heart transplant data has recieved a lot of attention^{1}^{2}^{3}^{4}.

According to Kleinbaum, the model above needs additional adjustments:

*tissue mismatch score*(tms) and*age at transplant*(age) are considered as prognostic indicatores of survival**only**for patients who received transplant.*age at eligibility*is**not**considered an important prognostic factor for the non transplants.

\[\small \mathsf{ h_i(t) = h_0(t) \exp\{\delta_1 \times tx_i(t) + \delta_2 \times tms_i(t) + \delta_3 \times age_i(t) \} \text{, where} \\ tx_i(t) = \begin{cases} 0 & \text{if patient $i$ did not have a transplantation by time t < wait-time} \\ 1 & \text{if patient $i$ had a transplantation at some time t $\geq$ wait-time} \end{cases} \\ tms_i(t) = \begin{cases} 0 & \text{if t < wait-time} \\ tms & \text{if t $\geq$ wait-time} \end{cases} \\ age_i(t) = \begin{cases} 0 & \text{if t < wait-time} \\ age & \text{if t $\geq$ wait-time} \end{cases} } \]

I let you make the need adjustment to fit the suggested model.

Kalbfleisch and Prentice, described six different models. All the 6 models are shown below but I execute only the first model here. Because of several tied death times in the data set, the Breslow approximation is used.

```
> # the 6 models found in Kalbfleisch and Prentice, p139
> sfit.1 <- coxph(Surv(tstart, tstop, death) ~ (age + surgery)*tx,
+ data = sdata,
+ method = 'breslow')
>
> summary(sfit.1)
Call:
coxph(formula = Surv(tstart, tstop, death) ~ (age + surgery) *
tx, data = sdata, method = "breslow")
n= 170, number of events= 75
coef exp(coef) se(coef) z Pr(>|z|)
age 0.01386 1.01395 0.01813 0.765 0.445
surgery -0.54652 0.57896 0.61091 -0.895 0.371
tx 0.11572 1.12268 0.32729 0.354 0.724
age:tx 0.03473 1.03534 0.02725 1.274 0.202
surgery:tx -0.29037 0.74799 0.75819 -0.383 0.702
exp(coef) exp(-coef) lower .95 upper .95
age 1.014 0.9862 0.9786 1.051
surgery 0.579 1.7272 0.1748 1.917
tx 1.123 0.8907 0.5911 2.132
age:tx 1.035 0.9659 0.9815 1.092
surgery:tx 0.748 1.3369 0.1692 3.306
Concordance= 0.595 (se = 0.037 )
Likelihood ratio test= 12.45 on 5 df, p=0.03
Wald test = 11.62 on 5 df, p=0.04
Score (logrank) test = 12.02 on 5 df, p=0.03
>
> # To test for the proportional-hazards (PH) assumption
> cox.zph(sfit.1)
rho chisq p
age -0.0598 0.3075 0.579
surgery -0.0832 0.5225 0.470
tx -0.0325 0.0859 0.769
age:tx 0.1520 1.9000 0.168
surgery:tx 0.1672 2.0125 0.156
GLOBAL NA 4.4230 0.490
```

```
> # 5 additional models - No run!
> sfit.2 <- coxph(Surv(tstart, tstop, death) ~ year*tx,
+ data = sdata,
+ method = 'breslow')
> sfit.3 <- coxph(Surv(tstart, tstop, death) ~ (age + year)*tx,
+ data = sdata,
+ method = 'breslow')
> sfit.4 <- coxph(Surv(tstart, tstop, death) ~ (year + surgery)*tx,
+ data = sdata,
+ method = 'breslow')
> sfit.5 <- coxph(Surv(tstart, tstop, death) ~ (age + surgery)*tx + year ,
+ data = sdata,
+ method = 'breslow')
> sfit.6 <- coxph(Surv(tstart, tstop, death) ~ age*tx + surgery + year,
+ data = sdata,
+ method = 'breslow')
```

#### Examples of Time immortal bias in the literature

Van Walraven et al^{5}. conducted a literature review of nine prominent medical journals between 1998 and 2002.
He found that 41% (32% - 50%, 95%CI) of reported observational studies using survival analyses were susceptible to immortal bias.

#### Reproducibility

```
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.5.3 (2019-03-11)
os macOS Mojave 10.14.6
system x86_64, darwin15.6.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Stockholm
date 2019-11-10
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
arsenal * 3.3.0 2019-09-07 [1] CRAN (R 3.5.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.2)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.5.2)
blogdown 0.16 2019-10-01 [1] CRAN (R 3.5.2)
bookdown 0.14 2019-10-01 [1] CRAN (R 3.5.2)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.5.2)
callr 3.3.2 2019-09-22 [1] CRAN (R 3.5.2)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.5.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.5.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0)
devtools * 2.2.1 2019-09-24 [1] CRAN (R 3.5.2)
digest 0.6.21 2019-09-20 [1] CRAN (R 3.5.2)
dplyr * 0.8.3 2019-07-04 [1] CRAN (R 3.5.2)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.5.2)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.5.2)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.0)
forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.5.2)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.5.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.5.0)
ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.5.2)
ggpubr * 0.2.3 2019-09-03 [1] CRAN (R 3.5.2)
ggsignif 0.6.0 2019-08-08 [1] CRAN (R 3.5.2)
glue 1.3.1.9000 2019-10-12 [1] Github (tidyverse/glue@71eeddf)
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.5.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.2)
haven 2.1.1 2019-07-04 [1] CRAN (R 3.5.2)
highr 0.8 2019-03-20 [1] CRAN (R 3.5.2)
hms 0.5.1 2019-08-23 [1] CRAN (R 3.5.2)
htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.5.2)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.5.2)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.5.0)
kableExtra * 1.1.0 2019-03-16 [1] CRAN (R 3.5.2)
km.ci 0.5-2 2009-08-30 [1] CRAN (R 3.5.0)
KMsurv 0.1-5 2012-12-03 [1] CRAN (R 3.5.0)
knitr * 1.25 2019-09-18 [1] CRAN (R 3.5.2)
labeling 0.3 2014-08-23 [1] CRAN (R 3.5.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.3)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.2)
lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.5.2)
lubridate * 1.7.4 2018-04-11 [1] CRAN (R 3.5.0)
magrittr * 1.5 2014-11-22 [1] CRAN (R 3.5.0)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.5.2)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0)
modelr 0.1.5 2019-08-08 [1] CRAN (R 3.5.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.0)
nlme 3.1-141 2019-08-01 [1] CRAN (R 3.5.2)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.5.2)
pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.5.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.5.2)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.5.2)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.0)
purrr * 0.3.2 2019-03-15 [1] CRAN (R 3.5.2)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.5.2)
readr * 1.3.1 2018-12-21 [1] CRAN (R 3.5.0)
readxl 1.3.1 2019-03-13 [1] CRAN (R 3.5.2)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.5.2)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.5.2)
rmarkdown 1.16 2019-10-01 [1] CRAN (R 3.5.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.5.2)
rvest 0.3.4 2019-05-15 [1] CRAN (R 3.5.2)
scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.2)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)
survival * 2.44-1.1 2019-04-01 [1] CRAN (R 3.5.2)
survminer * 0.4.6 2019-09-03 [1] CRAN (R 3.5.2)
survMisc 0.5.5 2018-07-05 [1] CRAN (R 3.5.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.5.2)
tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.5.2)
tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.5.2)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.0)
tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.5.0)
usethis * 1.5.1 2019-07-04 [1] CRAN (R 3.5.2)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.5.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.5.2)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.5.0)
webshot 0.5.1 2018-09-28 [1] CRAN (R 3.5.0)
wesanderson * 0.3.6 2018-04-20 [1] CRAN (R 3.5.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
xfun 0.10 2019-10-01 [1] CRAN (R 3.5.2)
xml2 1.2.2 2019-08-09 [1] CRAN (R 3.5.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 3.5.2)
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.5.0)
zoo 1.8-6 2019-05-28 [1] CRAN (R 3.5.2)
[1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
```

#### References

D.G. Kleinbaum and M. Klein.

*Survival analysis: A self-learning text*. (3rd edition), Springer, New York, 2012↩J. D. Kalbfleisch and R. L. Prentice.

*The Statistical Analysis of Failure Time Data*. Wiley, New York, 1980↩T. M. Therneau and P. M. Grambsch.

*Modeling Survival Data: Extending the Cox Model*. Springer, New York, 2000↩Aitkin, Murray,

*et al*. A Reanalysis of the Stanford Heart Transplant Data.*Journal of the American Statistical Association*, 1983;**78**(382): 264-274.↩Walraven C van, Davis D, Forster AJ,

*et al*. Time-dependent bias was common in survival analyses published in leading clinical journals.*Journal of Clinical Epidemiology*2004;**57**:672–82.↩