Create a Table 1
Introduction
As a first step in most analyses, particularly in the exploratory
phase, the analyst may wish to create a standard “Table 1” for a quick
look at basic demographics of the cohort, such as sex (at birth),
race, ethnicity, and age at enrollment. This information is stored as
patient-level data (PLD), i.e., one row per patient, in the person
table, so the analyst can group by the variable of interest to obtain
the relevant counts.
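The grouping step above amounts to counting rows per level of the variable of interest. Below is a minimal base-R sketch on a toy person table. The column names person_id and sex and all values are illustrative assumptions, not the PicnicHealth schema.

```r
# Toy "person" table: one row per patient (PLD).
# Column names and values are hypothetical, for illustration only.
person <- data.frame(
  person_id = 1:6,
  sex = c("Female", "Female", "Male", "Female", "Male", "Female")
)

# Group by the variable of interest and count patients per level
sex_counts <- table(person$sex)

# Percentage of the cohort in each level, rounded to one decimal
sex_pct <- round(100 * prop.table(sex_counts), 1)

print(sex_counts)
print(sex_pct)
```

The same two lines work for race or ethnicity by swapping the column passed to table().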
Additionally, the analyst may wish to obtain some high-level statistics
on health care utilization (HCU) in the cohort, such as the median Total
Years of Clinical Documents, Total Years of Visits, Number of Providers,
Number of Care Sites, Number of Hospitalizations, and Total Hospital
Days. For example, one simplified approach to obtaining the median Total
Years of Visits is to find the earliest visit_start_date and the most
recent visit_end_date for each patient in the visit table (grouping by
person_id), compute each patient's span in years, and finally take the
median span across all patients. A similar approach gives the Total
Years of Clinical Documents using document_date in the document table,
and so on.
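The simplified Total Years of Visits approach can be sketched in base R as follows. The sketch assumes a visit table with the person_id, visit_start_date and visit_end_date columns named above. The rows are made up, and dividing days by 365.25 to get years is an assumption. table_one() may define the span differently.

```r
# Toy "visit" table (made-up rows); column names follow the text above.
visit <- data.frame(
  person_id        = c(1, 1, 2, 2, 2, 3),
  visit_start_date = as.Date(c("2010-01-01", "2015-06-01", "2012-03-15",
                               "2013-01-01", "2020-03-10", "2018-07-01")),
  visit_end_date   = as.Date(c("2010-01-02", "2016-01-01", "2012-03-20",
                               "2013-01-05", "2020-03-15", "2018-07-01"))
)

# Group rows by patient (person_id)
by_patient <- split(visit, visit$person_id)

# Span per patient: most recent visit_end_date minus earliest
# visit_start_date, converted from days to years (365.25 days/year assumed)
span_years <- vapply(by_patient, function(d) {
  as.numeric(max(d$visit_end_date) - min(d$visit_start_date),
             units = "days") / 365.25
}, numeric(1))

# Median span across all patients
median_span <- median(span_years)

print(round(span_years, 2))
print(median_span)
```

The same pattern applies to document_date in the document table. In that case, the earliest and latest document_date per patient define the span.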
See below for some hands-on examples using the PicnicHealth R package, which has several built-in functions to facilitate this. Note that we use a small, synthetic data set that is not necessarily clinically plausible in all cases and is used only to demonstrate functionality.
PicnicHealth R Package
Once a PicnicHealth data set is loaded into memory, we can use the
table_one() function in the PicnicHealth R package. See
help(table_one) for more information.
To demonstrate this, we will create a standard “Table 1” for:
- the whole data set
- the whole data set, stratified by sex
- the whole data set, with census region added
Loading Data
We can load a PicnicHealth data set into memory using the
load_data_set() function. A PicnicHealth data set is an unzipped
directory of CSV files, and we need to specify the full path to that
directory. See help(load_data_set) for more information.
```r
library(PicnicHealth)
library(tidyverse)
ds = load_data_set("path/to/data")
```
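Since a data set is described as an unzipped directory of CSV files, here is a rough base-R sketch of reading such a directory into a named list of data frames. The helper read_csv_dir() and the file names person.csv and visit.csv are hypothetical. They are not how load_data_set() is implemented, and they are not the actual file layout of a PicnicHealth data set.

```r
# Hypothetical helper, NOT the PicnicHealth implementation:
# read every CSV in a directory into a named list of data frames.
read_csv_dir <- function(path) {
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  # Name each table after its file, e.g. "person.csv" -> "person"
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}

# Usage: write two tiny CSVs to a temporary directory and read them back
demo_dir <- file.path(tempdir(), "demo_data_set")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(person_id = 1:2),
          file.path(demo_dir, "person.csv"), row.names = FALSE)
write.csv(data.frame(person_id = c(1, 1, 2)),
          file.path(demo_dir, "visit.csv"), row.names = FALSE)

ds_demo <- read_csv_dir(demo_dir)
str(ds_demo)
```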
Example 1: Entire Cohort
Here we can call the table_one() function directly, passing the
entire data set ds as the input argument.
```r
table_one(ds)
```
Characteristic | N = 50¹ |
---|---|
Sex | |
Female | 39 (78%) |
Male | 11 (22%) |
Race | |
Black or African American | 1 (2.0%) |
More than one race | 3 (6.0%) |
Unknown | 1 (2.0%) |
White | 45 (90%) |
Ethnicity | |
Hispanic or Latino | 2 (4.0%) |
Not Hispanic or Latino | 47 (94%) |
Prefer not to say | 1 (2.0%) |
Age | 44 (35, 56) |
Total Years of Clinical Documents | 16.6 (12.1, 19.2) |
Total Years of Visits | 15 (10, 19) |
Number of Providers | 0 (0, 0) |
Number of Care Sites | 0 (0, 0) |
Number of Hospitalizations | 1 (0, 3) |
Total Hospital Days | 1.00 (0.00, 3.00) |

¹ n (%); Median (Q1, Q3)
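The footnote says that categorical rows are reported as n (%) and continuous rows as Median (Q1, Q3). Below is a minimal base-R sketch of how such cells could be computed. The helpers n_pct() and median_q1_q3() are hypothetical and not part of the PicnicHealth package. The inputs are toy data, and it uses R's default quantile() definition (type 7), which may differ from the package's quantile and rounding rules.

```r
# Hypothetical helper: "level: n (%)" for a categorical variable
n_pct <- function(x) {
  counts <- table(x)
  sprintf("%s: %d (%.1f%%)", names(counts), as.integer(counts),
          100 * as.vector(counts) / sum(counts))
}

# Hypothetical helper: "Median (Q1, Q3)" for a continuous variable,
# using R's default quantile definition (type 7)
median_q1_q3 <- function(x) {
  q <- quantile(x, probs = c(0.5, 0.25, 0.75), na.rm = TRUE, names = FALSE)
  sprintf("%g (%g, %g)", q[1], q[2], q[3])
}

# Toy inputs: a 39/11 sex split and five made-up ages
sex <- c(rep("Female", 39), rep("Male", 11))
age <- c(30, 35, 44, 56, 60)

print(n_pct(sex))
print(median_q1_q3(age))
```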
Example 2: Stratified by Biological Sex
To produce a standard “Table 1” with columns stratified by biological
sex, we only need to specify the stratify_by_sex = TRUE argument when
calling the table_one() function. Because sex now defines the columns,
it no longer appears as a row.
```r
table_one(ds, stratify_by_sex = TRUE)
```
Characteristic | Female N = 39¹ | Male N = 11¹ |
---|---|---|
Race | | |
Black or African American | 1 (2.6%) | 0 (0%) |
More than one race | 2 (5.1%) | 1 (9.1%) |
Unknown | 1 (2.6%) | 0 (0%) |
White | 35 (90%) | 10 (91%) |
Ethnicity | | |
Hispanic or Latino | 1 (2.6%) | 1 (9.1%) |
Not Hispanic or Latino | 37 (95%) | 10 (91%) |
Prefer not to say | 1 (2.6%) | 0 (0%) |
Age | 45 (35, 58) | 43 (40, 52) |
Total Years of Clinical Documents | 17.0 (13.2, 19.3) | 12.2 (8.5, 18.5) |
Total Years of Visits | 14 (10, 18) | 17 (10, 19) |
Number of Providers | 0 (0, 0) | 0 (0, 0) |
Number of Care Sites | 0 (0, 0) | 0 (0, 0) |
Number of Hospitalizations | 2 (0, 3) | 1 (0, 3) |
Total Hospital Days | 2.00 (0.00, 3.00) | 1.00 (0.00, 3.00) |

¹ n (%); Median (Q1, Q3)
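In the stratified table, each percentage appears to be computed within its own column. For example, 35 of the 39 female participants (about 90%) are White, versus 10 of the 11 male participants (about 91%). Here is a base-R sketch of such column percentages using prop.table() with margin = 2. The toy vectors are made up and are not the example data set.

```r
# Toy vectors (made up); one entry per participant
sex  <- c("Female", "Female", "Female", "Female", "Male", "Male")
race <- c("White", "White", "White", "Black or African American",
          "White", "More than one race")

# Cross-tabulate race (rows) by sex (columns)
tab <- table(race, sex)

# margin = 2 divides each cell by its column total, so every
# column (sex) sums to 100%
col_pct <- round(100 * prop.table(tab, margin = 2), 1)

print(tab)
print(col_pct)
```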
Example 3: Adding Census Region
To add geographic information, such as census region or census
division, to the output table, specify the optional argument
geography = "region" or geography = "division" when calling the
table_one() function. Here we demonstrate with geography = "region":
```r
table_one(ds, geography = "region")
```
Characteristic | N = 50¹ |
---|---|
Sex | |
Female | 39 (78%) |
Male | 11 (22%) |
Race | |
Black or African American | 1 (2.0%) |
More than one race | 3 (6.0%) |
Unknown | 1 (2.0%) |
White | 45 (90%) |
Ethnicity | |
Hispanic or Latino | 2 (4.0%) |
Not Hispanic or Latino | 47 (94%) |
Prefer not to say | 1 (2.0%) |
Census Region | |
Midwest | 3 (9.7%) |
Northeast | 4 (13%) |
South | 15 (48%) |
West | 9 (29%) |
Unknown | 19 |
Age | 44 (35, 56) |
Total Years of Clinical Documents | 16.6 (12.1, 19.2) |
Total Years of Visits | 15 (10, 19) |
Number of Providers | 0 (0, 0) |
Number of Care Sites | 0 (0, 0) |
Number of Hospitalizations | 1 (0, 3) |
Total Hospital Days | 1.00 (0.00, 3.00) |

¹ n (%); Median (Q1, Q3)
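The Census Region percentages appear to use only participants with a known region as the denominator. The known counts sum to 3 + 4 + 15 + 9 = 31, giving 3/31 ≈ 9.7% and 15/31 ≈ 48%. The 19 participants with an unknown region are reported as a bare count. The sketch below rebuilds a vector from those counts to check this arithmetic in base R. Representing an unknown region as NA is an assumption about how the package handles missing geography.

```r
# Vector rebuilt from the counts in the table above:
# 31 participants with a known census region, 19 unknown (NA)
region <- c(rep("Midwest", 3), rep("Northeast", 4), rep("South", 15),
            rep("West", 9), rep(NA, 19))

# table() drops NA by default, so percentages use known values only
known <- table(region)
n_unknown <- sum(is.na(region))  # reported separately, without a percentage
pct_known <- round(100 * prop.table(known), 1)

print(known)
print(pct_known)
print(n_unknown)
```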