{flowchart}: A Tidy R Package for Data Flowchart Generation
Pau Satorra
Biostatistics Support and Research Unit, IGTP
João Carmezim
Biostatistics Support and Research Unit, IGTP
Natàlia Pallarès
Biostatistics Support and Research Unit, IGTP
Kenneth A. Taylor
Komodo Health, University of South Florida
Cristian Tebé
Biostatistics Support and Research Unit, IGTP
April 24, 2025
Index
1. Introduction
2. The tidyverse
3. The {flowchart} package
4. Hands-on examples
5. Conclusions
Introduction
Flowcharts
In any study, a participant flowchart serves as a visual representation of the steps and decisions in the study workflow.
Typically, a series of decisions narrows the initial cohort of eligible or screened subjects down to the final number of subjects included in the analyses.
It is essential that the steps and numbers are clearly defined and that the process is transparent in order to ensure the reproducibility of the study and the quality of the reporting.
Participant flowcharts evolved from the broader concept of flowcharts introduced by industrial engineers.
Frank and Lillian Gilbreth introduced the idea in 1921, presenting “Flow Process Charts” to the American Society of Mechanical Engineers.
Flowcharts in clinical research
In clinical research, the CONSORT, STROBE and ICH guidelines strongly recommend the use of flowcharts.
The CONSORT guideline provides a flowchart template for a randomized trial with two groups.
Flowchart creation
The creation of these flowcharts is a joint task between the data management team and the statisticians.
It is time-consuming and labor-intensive, as every screened or recruited subject must be included, without exception.
Usually this process must be repeated many times until the database is closed for analysis.
flowchart packages
There are several R packages dedicated to building flowcharts: {Gmisc}, {DiagrammeR}, {consort}, {ggflowchart}.
They often require complex programming and manual parameterization.
Some are designed for building other kinds of diagrams.
The tidyverse
The tidyverse is a set of R packages ideal for data management. They will make your life a lot easier.
The philosophy of the tidyverse is to chain basic functions applied to a tibble (data frame) so that complex manipulations fit into a single tidy workflow.
The tidyverse workflow relies on the pipe operator, which can be the native pipe (|>) or the magrittr pipe (%>%).
Pipe operator
The pipe operator lets us chain multiple functions applied to the same object:
# Round π to 6 decimals
round(pi, 6)
[1] 3.141593
# Equivalent using pipes
pi |> round(6)
[1] 3.141593
# Exponential of the square root of π, rounded to 6 decimals
round(exp(sqrt(pi)), 6)
[1] 5.885277
# Equivalent using pipes
pi |> sqrt() |> exp() |> round(6)
[1] 5.885277
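The same idea carries over from single numbers to data frames. As a minimal sketch, assuming the built-in mtcars data set (not part of the original examples) and R 4.1 or later for the native pipe, the nested and piped forms below perform identical steps:

```r
# Base R, nested calls: read from the inside out
nested <- nrow(subset(mtcars, cyl == 4))

# Native pipe: read from left to right, one step per line
piped <- mtcars |>
  subset(cyl == 4) |>
  nrow()

# Both forms count the same rows
identical(nested, piped)
```

The piped version lists the steps in the order they happen, which is what makes longer data management workflows easier to read.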
The tidyverse
This is an example of what a tidyverse workflow looks like compared to base R:
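As a minimal sketch of such a comparison, assuming the built-in mtcars data set and the dplyr package (both chosen here for illustration only), the two blocks below compute the mean mpg by number of gears for four-cylinder cars:

```r
library(dplyr)

# Base R: an intermediate object plus a formula interface
cars4 <- mtcars[mtcars$cyl == 4, ]
base_result <- aggregate(mpg ~ gear, data = cars4, FUN = mean)

# tidyverse: the same steps chained with the pipe
tidy_result <- mtcars |>
  filter(cyl == 4) |>
  group_by(gear) |>
  summarise(mpg = mean(mpg))

# Both approaches give the same group means
all.equal(base_result$mpg, tidy_result$mpg)
```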
710 excluded:
136 did not meet inclusion criteria
134 declined to participate
440 met exclusion criteria:
- 133 chronic heart failure
- 70 clinical status with expected death in <24h
- 68 polymicrobial bacteremia
- 56 conditions expected to affect adherence to the protocol
- 53 suspicion of prosthetic valve endocarditis
- 33 severe liver cirrhosis
- 27 acute SARS-CoV-2 infection
- 28 beta-lactam or fosfomycin hypersensitivity
- 10 participation in another clinical trial
- 5 pregnancy or breastfeeding
- 4 previous participation in the SAFO trial
- 3 myasthenia gravis
Example 2
First, we have to build the text in the exclude boxes:
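# Assumes these packages are installed; the safo data set ships with {flowchart}
library(flowchart)
library(tidyverse)

# For the PP exclude box (cloxacillin plus fosfomycin):
# keep excluded subjects in this arm and drop unused reason levels
safo1 <- safo |>
  filter(group == "cloxacillin plus fosfomycin", !is.na(reason_pp)) |>
  mutate(reason_pp = droplevels(reason_pp))

# Header line with the total, then one line per exclusion reason
label_exc1 <- paste(
  c(
    str_glue("{nrow(safo1)} excluded:"),
    map_chr(levels(safo1$reason_pp), ~ str_glue(" - {sum(safo1$reason_pp == .x)} {.x}"))
  ),
  collapse = "\n"
)

# Insert line breaks after long words so the text fits in the box
label_exc1 <- str_replace_all(label_exc1, c("nosocomial" = "nosocomial\n", "treatment" = "treatment\n"))

cat(label_exc1)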
3 excluded:
- 2 fosfomycin-resistant strain in index blood cultures
- 1 protocol violation
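The label-building step can be tried without the trial data. The following is a minimal sketch in base R, assuming a hypothetical toy data frame (the `toy` object and its `reason` values are invented for illustration and are not part of the safo data set):

```r
# Hypothetical toy data: exclusion reason per subject (NA = not excluded)
toy <- data.frame(
  reason = factor(c("protocol violation", NA, "lost to follow-up",
                    "protocol violation", NA))
)

# Keep excluded subjects only and drop unused factor levels
excluded <- droplevels(toy[!is.na(toy$reason), , drop = FALSE])

# Count subjects per reason (levels are in alphabetical order)
counts <- table(excluded$reason)

# Header line with the total, then one line per reason
label <- paste(
  c(sprintf("%d excluded:", nrow(excluded)),
    sprintf(" - %d %s", as.integer(counts), names(counts))),
  collapse = "\n"
)

cat(label)
```

Because the label is computed from the data, it stays in sync with the counts whenever the database is updated, instead of being typed by hand.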
Example 2
First, we have to build the text in the exclude boxes: