Considerations for the planning, conduct and reporting of clinical trials with interim analyses: code accompanying the paper

Author

Kaspar Rufibach, on behalf of the author team

Published

June 16, 2025

1 Background

Code for the paper Clinical trials with interim analyses: Standardizing Terminology to increase clarity, Asikanius et al. (2025+), written by an author team of statisticians from regulators, academia, and industry.

2 Purpose of this document

This R Quarto file provides easy accessible code to compute all the quantities in the paper. The github repository where this document is hosted is available here.

3 Setup

Load rpact Wassmer and Pahlke (2024):

# --------------------------------------------------------------
# packages
# --------------------------------------------------------------
library(rpact)

Installation qualification for rpact 4.2.0 has not yet been performed.

Please run testPackage() before using the package in GxP relevant environments.

packageVersion("rpact")

[1] '4.2.0'

# computations below are done with rpact Version >= 4.0.0

4 Specify hypothetical trial

alpha <- 0.05
beta <- 0.2

# median OS
m1 <- 6 * 12
m2 <- 8 * 12
hr <- m1 / m2

# accrual and dropout
accrualTime <- c(0, 12)
accrualIntensity <- 100
dropoutRate1 <- 0.025
dropoutRate2 <- 0.025
dropoutTime <- 12
maxNumberOfSubjects <- accrualIntensity * max(accrualTime)

First, compute number of required events without interim:

nevent <- getSampleSizeSurvival(lambda1 = getLambdaByMedian(m2), 
                                lambda2 = getLambdaByMedian(m1), 
                                sided = 2, alpha = alpha, beta = beta)
mdd_no_interim <- nevent$criticalValuesEffectScaleLower
nevent <- ceiling(nevent$maxNumberOfEvents)

nevent

[1] 380

mdd_no_interim

       [,1]
[1,] 0.8177

These are the total number of events needed without interim and the corresponding minimal detectable difference (MDD).

5 Add interim analyses

We start with adding an interim using an O’Brien-Fleming group-sequential boundary. For simplicity, we also add an efficacy interim using OBF at the first interim. This would strictly speaking not be needed but does not make a numerical difference. One could properly design the first interim as futility only through a hand-knitted alpha-spending function in rpact.

info <- c(1 / 3, 2 / 3, 1)
designEff <- getDesignGroupSequential(informationRates = info,
                                      typeOfDesign = "asOF", sided = 1, 
                                      alpha = alpha / 2)

samplesizeEFF <- getSampleSizeSurvival(design = designEff,
                                       lambda2 = log(2) / m1, hazardRatio = hr,
                                       dropoutRate1 = dropoutRate1, 
                                       dropoutRate2 = dropoutRate2,
                                       dropoutTime = dropoutTime,
                                       accrualTime = accrualTime, 
                                       accrualIntensity = accrualIntensity)

# number of events
nevents <- ceiling(as.vector(samplesizeEFF$eventsPerStage))

# add exploratory updated analysis
nevents <- c(nevents, 500)
nevents

[1] 129 257 385 500

# Local significance levels (two-sided)
alphas <- designEff$stageLevels * 2
alphas

[1] 0.0002070114 0.0120243988 0.0462562478

# MDD 
hrMDD <- as.vector(samplesizeEFF$criticalValuesEffectScale)
hrMDD

[1] 0.5190684 0.7306326 0.8159843

# analysis timepoints
time <- as.vector(round(samplesizeEFF$analysisTime, 1))

# predict analysis time of updated analysis
grid <- 76.3965
probEvent <- getEventProbabilities(time = grid, lambda1 = getLambdaByMedian(m2), 
                                   lambda2 = getLambdaByMedian(m1),
                                   dropoutRate1 = dropoutRate1, 
                                   dropoutRate2 = dropoutRate2, 
                                   dropoutTime = dropoutTime,
                                   accrualTime = accrualTime, 
                                   accrualIntensity = accrualIntensity, 
                                   maxNumberOfSubjects = maxNumberOfSubjects)
expEvent <- probEvent$overallEventProbabilities * maxNumberOfSubjects  
cbind(grid, expEvent)

        grid expEvent
[1,] 76.3965 500.0003

time <- c(time, grid)

# power loss through the interims
# Futility HR and information at interim
futilityHR <- c(1, 0.9)
eventsInterim <- nevents[1:2]

# Calculate Z-score corresponding to futilityHR 
# Note: Minus sign is added to be consistent with directionUpper = FALSE 
# in getPowerSurvival call below.
allocationRatioPlanned <- 1

# approximate info according to Schoenfeld's formula
infoInterim <- eventsInterim * allocationRatioPlanned/(1 + allocationRatioPlanned) ^ 2 
ZscoreInterim <- -log(futilityHR) / sqrt(1 / infoInterim)

designFutility <- getDesignGroupSequential(sided = 1, alpha = alpha / 2,
                                           informationRates = info,
                                           typeOfDesign = "noEarlyEfficacy",
                                           futilityBounds = ZscoreInterim,
                                           bindingFutility = FALSE)

# Calculate power and timeline under H1 with original sample size if design includes 
# the non-binding futility (and the futility boundary is adhered to)
power <- getPowerSurvival(designFutility,
                          allocationRatioPlanned = allocationRatioPlanned,
                          lambda2 = log(2) / m1, hazardRatio = hr,
                          dropoutRate1 = dropoutRate1, dropoutRate2 = dropoutRate2, 
                          dropoutTime = dropoutTime,
                          accrualTime = accrualTime, accrualIntensity = accrualIntensity,
                          maxNumberOfEvents = nevents[3], directionUpper = FALSE)
summary(power)

Power calculation for a survival endpoint

Sequential analysis with a maximum of 3 looks (group sequential design), one-sided overall significance level 2.5%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, power directed towards smaller values, H1: hazard ratio = 0.75, control lambda(2) = 0.01, maximum number of events = 385, accrual time = 12, accrual intensity = 100, dropout rate(1) = 0.025, dropout rate(2) = 0.025, dropout time = 12.

Stage	1	2	3
Planned information rate	33.3%	66.7%	100%
Cumulative alpha spent	0	0	0.0250
Stage levels (one-sided)	0	0	0.0250
Efficacy boundary (z-value scale)	Inf	Inf	1.960
Futility boundary (z-value scale)	0	0.845
Efficacy boundary (t)	0	0	0.819
Futility boundary (t)	1.000	0.900
Cumulative power	0	0	0.7800
Number of subjects	1200.0	1200.0	1200.0
Expected number of subjects under H1			1200.0
Cumulative number of events	128.3	256.7	385.0
Expected number of events under H1			365.7
Analysis time	19.71	35.69	54.96
Expected study duration under H1			52.24
Overall exit probability (under H0)	0.5000	0.3207
Overall exit probability (under H1)	0.0516	0.0469
Exit probability for efficacy (under H0)	0	0
Exit probability for efficacy (under H1)	0	0
Exit probability for futility (under H0)	0.5000	0.3207
Exit probability for futility (under H1)	0.0516	0.0469

Legend:

(t): treatment effect scale

Finally, we compute the actual dates of the analyses, relying on an assumed first patient in date:

suppressMessages(library(lubridate))
fpi <- ymd("2020-04-23")
d <- round(30 * (time - floor(time)))
dates <- c(fpi, fpi %m+% months(floor(time)) + days(d))
dates

[1] "2020-04-23" "2021-12-14" "2023-04-10" "2024-11-16" "2026-09-04"

6 Reporting of the trial

To illustrate trial simulation we first need to make a few own functions available:

# --------------------------------------------------------------
# functions
# --------------------------------------------------------------

# Quantile function for a Weibull with a potential cure proportion.
qWeibullCure <- function(p, p0, shape = 1, scale){
  res <- rep(NA, length(p))
  ind1 <- (p <= p0)
  ind2 <- (p > p0)
  res[ind1] <- Inf
  res[ind2] <- qweibull(1 - (p[ind2] - p0) / (1 - p0), shape = shape, scale = scale)
  return(res)  
}

# simulate Weibull distributed time-to-event data in one group, 
# where administrative censoring is applied after a pre-specified
# number of events. A cure proportion and censoring due to drop-out 
# can be specified in addition.
rWeibull1arm <- function(shape = 1, scale, cure = 0, recruit, dropout = 0, 
                         start.accrual = 0, cutoff, seed = NA){
  
  # shape             Weibull shape parameter. 
  # scale:            Weibull scale parameter.
  # cure:             Proportion of patients assumed to be cured, i.e. with an 
  #                   event at +infty.
  # recruit:          Recruitment.
  # dropout:          Drop-out rate, on same time scale as med.
  # start.accrual:    Time unit where accrual should start. Might be useful when 
  #                   simulating multi-stage trials.
  # cutoff:           Cutoff, #events the final censored data should have 
  #                   (can be a vector of multiple cutoffs).
  # seed:             If different from NA, seed used to generate random numbers.
  #
  # Kaspar Rufibach, June 2014
  
  if (is.na(seed) == FALSE){set.seed(seed)}
  
  n <- sum(recruit)
  
  # generate arrival times
  arrive <- rep(1:length(recruit), times = recruit)
  arrivetime <- NULL
  for (i in 1:n){arrivetime[i] <- runif(1, min = arrive[i] - 1, max = arrive[i])}
  arrivetime <- start.accrual + sort(arrivetime)
  
  # generate event times: Exp(lambda) = Weibull(shape = 1, scale = 1 / lambda)
  eventtime <- qWeibullCure(runif(n), p0 = cure, shape = shape, scale = scale)
  
  # Apply drop-out. Do this before applying the cutoff below, 
  # in order to correctly count necessary #events.
  dropouttime <- rep(Inf, n)
  if (dropout > 0){dropouttime <- rexp(n, rate = dropout)}
  event.dropout <- ifelse(eventtime > dropouttime, 0, 1)
  time.dropout <- ifelse(event.dropout == 1, eventtime, dropouttime)   
  
  # observed times, taking into account staggered entry
  tottime <- arrivetime + eventtime
  
  # find cutoff based on number of targeted events
  # only look among patients that are not considered dropped-out
  time <- data.frame(matrix(NA, ncol = length(cutoff), nrow = n))
  event <- time
  cutoff.time <- rep(NA, length(cutoff))
  
  for (j in 1:length(cutoff)){
    cutoff.time[j] <- sort(tottime[event.dropout == 1])[cutoff[j]]
    
    # apply administrative censoring at cutoff
    event[event.dropout == 1, j] <- ifelse(
      tottime[event.dropout == 1] > cutoff.time[j], 0, 1)
    event[event.dropout == 0, j] <- 0
    
    # define time to event, taking into account both types of censoring
    time[event.dropout == 1, j] <- ifelse(event[, j] == 1, 
                                          eventtime, cutoff.time[j] - 
                                            arrivetime)[event.dropout == 1]    
    time[event.dropout == 0, j] <- pmin(cutoff.time[j] - arrivetime, 
                                        time.dropout)[event.dropout == 0]
    
    # remove times for patients arriving after the cutoff
    rem <- (arrivetime > cutoff.time[j])
    if (TRUE %in% rem){time[rem, j] <- NA}
  }
  
  # generate output
  tab <- data.frame(cbind(1:n, arrivetime, eventtime, tottime, dropouttime, time, event))
  colnames(tab) <- c("pat", "arrivetime", "eventtime", "tottime", "dropouttime", 
                     paste("time cutoff = ", cutoff, sep = ""), 
                     paste("event cutoff = ", cutoff, sep = ""))
  
  res <- list("cutoff.time" = cutoff.time, "tab" = tab)
  return(res)
}

# simulate Weibull distributed time-to-event data in two arms, 
# where administrative censoring is applied after a pre-specified 
# number of events. A cure proportion and censoring due to drop-out 
# can be specified in addition.
rWeibull2arm <- function(shape = c(1, 1), scale, cure = c(0, 0), recruit, 
                         dropout = c(0, 0), start.accrual = c(0, 0), cutoff, seed = NA){
  
  # shape             2-d vector of Weibull shape parameter. 
  # scale             2-d vector of Weibull scale parameter.
  # cure:             2-d vector with cure proportion assumed in each arm.
  # recruit:          List with two elements, vector of 
  #                   recruitment in each arm.
  # dropout:          2-d vector with drop-out rate for each arm, 
  #                   on same time scale as med.
  # start.accrual:    2-d vector of time when accrual should start. 
  #                   Might be useful when simulating multi-stage trials.
  # cutoff:           Cutoff, #events the final censored data should have 
  #                   (can be a vector of multiple cutoffs).
  # seed:             If different from NA, seed used to generate random numbers.
  #
  # Kaspar Rufibach, June 2014
  
  if (is.na(seed) == FALSE){set.seed(seed)}
  
  dat1 <- rWeibull1arm(scale = scale[1], shape = shape[1], recruit = recruit[[1]], 
                       cutoff = 1, dropout = dropout[1], cure = cure[1], 
                       start.accrual = start.accrual[1], seed = NA)$tab
  dat2 <- rWeibull1arm(scale = scale[2], shape = shape[2], recruit = recruit[[2]], 
                       cutoff = 1, dropout = dropout[2], cure = cure[2], 
                       start.accrual = start.accrual[2], seed = NA)$tab
  
  n <- c(nrow(dat1), nrow(dat2))
  
  # treatment variable
  tmt <- factor(c(rep(0, n[1]), rep(1, n[2])), levels = 0:1, labels = c("A", "B"))
  
  arrivetime <- c(dat1[, "arrivetime"], dat2[, "arrivetime"])
  eventtime <- c(dat1[, "eventtime"], dat2[, "eventtime"])
  tottime <- c(dat1[, "tottime"], dat2[, "tottime"])
  dropouttime <- c(dat1[, "dropouttime"], dat2[, "dropouttime"])
  
  # Apply drop-out. Do this before applying the cutoff below, 
  #in order to correctly count necessary #events.
  event.dropout <- ifelse(eventtime > dropouttime, 0, 1)
  time.dropout <- ifelse(event.dropout == 1, eventtime, dropouttime)   
  
  # find cutoff based on number of targeted events
  # only look among patients that are not considered dropped-out
  time <- data.frame(matrix(NA, ncol = length(cutoff), nrow = sum(n)))
  event <- time
  cutoff.time <- rep(NA, length(cutoff))
  
  for (j in 1:length(cutoff)){
    cutoff.time[j] <- sort(tottime[event.dropout == 1])[cutoff[j]]
    
    # apply administrative censoring at cutoff
    event[event.dropout == 1, j] <- ifelse(tottime[event.dropout == 1] > 
                                             cutoff.time[j], 0, 1)
    event[event.dropout == 0, j] <- 0
    
    # define time to event, taking into account both types of censoring
    time[event.dropout == 1, j] <- ifelse(event[, j] == 1, eventtime, 
                                    cutoff.time[j] - arrivetime)[event.dropout == 1]    
    time[event.dropout == 0, j] <- pmin(cutoff.time[j] - arrivetime, 
                                    time.dropout)[event.dropout == 0]
    
    # remove times for patients arriving after the cutoff
    rem <- (arrivetime > cutoff.time[j])
    if (TRUE %in% rem){time[rem, j] <- NA}
  }
  
  # generate output
  tab <- data.frame(cbind(1:sum(n), tmt, arrivetime, eventtime, tottime, 
                          dropouttime, time, event))
  colnames(tab) <- c("pat", "tmt", "arrivetime", "eventtime", "tottime", 
                     "dropouttime", paste("time cutoff = ", cutoff, sep = ""), 
                     paste("event cutoff = ", cutoff, sep = ""))

  res <- list("cutoff.time" = cutoff.time, "tab" = tab)
  return(res)
}

Now, generate trial data to reproduce numbers in Section 6 of paper, using appropriate seeds:

library(survival)

# accrual
recruit1 <- rep(accrualIntensity / 2, max(accrualTime))
recruit2 <- recruit1
recruit <- list(recruit1, recruit2)
n <- sum(unlist(recruit))
start.accrual <- c(0, 0)

# drop-out (2.5% annually)
dropout.year <- 0.025
dropout.month <- -log(1 - dropout.year) / 12
dropout <- rep(dropout.month, 2)

# cure proportion 
cure <- c(0, 0)

# --------------------------------------------------
# simulate a trial stopped for futility
# --------------------------------------------------
med <- c(m1, m1)
lambda <- log(2) / med
scale <- 1 / lambda
j <- 1
cut1 <- nevents[j] + 3

trial1 <- rWeibull2arm(scale = scale, shape = c(1, 1), 
                        cure = cure, recruit = recruit, dropout = dropout, 
                        start.accrual = start.accrual, cutoff = cut1, 
                        seed = 8)
tmt <- trial1$tab["tmt"][, 1]
time <- trial1$tab[paste("time cutoff = ", cut1, sep = "")][, 1]
event <- trial1$tab[paste("event cutoff = ", cut1, sep = "")][, 1]

cox1 <- summary(coxph(Surv(time, event) ~ tmt))
HR1 <- exp(coef(cox1)[1])
p1 <- coef(cox1)[1, "Pr(>|z|)"]

HR1

[1] 1.071779

p1

[1] 0.6906199

cut1

[1] 132

# date of CCOD
trial1$cutoff.time

[1] 18.2883

d <- round(30 * (trial1$cutoff.time - floor(trial1$cutoff.time)))
c(fpi, fpi %m+% months(floor(trial1$cutoff.time)) + days(d))

[1] "2020-04-23" "2021-11-01"

# --------------------------------------------------
# simulate a trial stopped for efficacy
# --------------------------------------------------
med <- m1 / c(1, hr)
lambda <- log(2) / med
scale <- 1 / lambda
j <- 2
cut2 <- nevents[j] - 2

trial2 <- rWeibull2arm(scale = scale, shape = c(1, 1), 
                        cure = cure, recruit = recruit, dropout = dropout, 
                        start.accrual = start.accrual, cutoff = cut2, 
                        seed = 7)
tmt <- trial2$tab["tmt"][, 1]
time <- trial2$tab[paste("time cutoff = ", cut2, sep = "")][, 1]
event <- trial2$tab[paste("event cutoff = ", cut2, sep = "")][, 1]

cox2 <- summary(coxph(Surv(time, event) ~ tmt))
HR2 <- exp(coef(cox2)[1])
ci2 <- exp(confint(coxph(Surv(time, event) ~ tmt)))
p2 <- coef(cox2)[1, "Pr(>|z|)"]

hrMDD[j]

[1] 0.7306326

HR2

[1] 0.6868411

ci2

         2.5 %    97.5 %
tmtB 0.5354192 0.8810866

p2

[1] 0.003113957

cut2

[1] 255

trial2$cutoff.time

[1] 34.67596

# recalculate local significance level based on observed number of events
info2 <- c(info[1], cut2 / nevents[3], 1)
designUpdate2 <- getDesignGroupSequential(sided = 1, alpha = alpha / 2, 
                                          beta = beta,
                                          informationRates = info2, 
                                          typeOfDesign = "asOF")

samplesize2 <- getSampleSizeSurvival(design = designUpdate2,
                                     lambda2 = log(2) / m1, hazardRatio = hr,
                                     dropoutRate1 = dropoutRate1, 
                                     dropoutRate2 = dropoutRate2,
                                     dropoutTime = dropoutTime,
                                     accrualTime = accrualTime, 
                                     accrualIntensity = accrualIntensity)

alphas2 <- as.vector(samplesize2$criticalValuesPValueScale)
hrMDD2 <- as.vector(samplesize2$criticalValuesEffectScale)

# original and updated local significance levels
rbind(alphas, alphas2 * 2)

               [,1]       [,2]       [,3]
alphas 0.0002070114 0.01202440 0.04625625
       0.0002070114 0.01169829 0.04635103

rbind(hrMDD, hrMDD2)

            [,1]      [,2]      [,3]
hrMDD  0.5190684 0.7306326 0.8159843
hrMDD2 0.5190204 0.7289663 0.8160329

# median unbiased estimate, assuming the futility interim had happened 
# at the prespecified number of events with an observed HR of 0.9
results2 <- getDataset(
  overallEvents = c(nevents[1], cut2),
  overallLogRanks = c(log(0.9)/sqrt(4 / nevents[1]), coef(cox2)[1, "z"]),
  overallAllocationRatio = c(1, 1))
adj_result2 <- getAnalysisResults(design = designUpdate2,
                                  dataInput = results2,
                                  stage = 2, directionUpper = FALSE)
adj_result2$medianUnbiasedEstimates[2]

[1] 0.6908581

c(adj_result2$finalConfidenceIntervalLowerBounds[2], 
  adj_result2$finalConfidenceIntervalUpperBounds[2])

[1] 0.5403963 0.8834608

# date of CCOD
trial2$cutoff.time

[1] 34.67596

d <- round(30 * (trial2$cutoff.time - floor(trial2$cutoff.time)))
c(fpi, fpi %m+% months(floor(trial2$cutoff.time)) + days(d))

[1] "2020-04-23" "2023-03-15"

7 References

Asikanius, Elina, Benjamin Hofner, Lisa V. Hampson, Gernot Wassmer, Christopher Jennison, Tobias Mielke, Cornelia Ursula Kunz, and Kaspar Rufibach. 2025+. “Clinical Trials with Interim Analyses: Standardizing Terminology to Increase Clarity.” https://arxiv.org/abs/2410.01478.

Wassmer, Gernot, and Friedrich Pahlke. 2024. Rpact: Confirmatory Adaptive Clinical Trial Design and Analysis. https://CRAN.R-project.org/package=rpact.