Create a Table 1 • PicnicHealth

Introduction

As a first step in most analyses, particularly in the exploratory phase, the analyst may wish to create a standard “Table 1” for a quick look at basic demographics of the cohort, such as sex (at birth), race, ethnicity, and age at enrollment. This patient-level data (PLD), with one row per patient, can be found in the person table, and the analyst can group by the variable of interest to obtain the relevant counts.
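As a minimal sketch of the group-by approach, the snippet below counts patients by a single demographic variable using dplyr. The person data frame is a hypothetical stand-in built inline, and its column names (person_id, sex) are assumptions for illustration; the actual person table may name its columns differently.

```r
library(dplyr)

# Hypothetical stand-in for the person table: one row per patient.
# Column names and values are assumed for illustration only.
person <- data.frame(
  person_id = 1:6,
  sex = c("Female", "Female", "Male", "Female", "Male", "Female")
)

# Count patients per level of the variable of interest,
# then express each count as a percentage of the cohort.
sex_counts <- person %>%
  count(sex, name = "n") %>%
  mutate(pct = round(100 * n / sum(n), 1))

print(sex_counts)
```

The same pattern applies to any other one-row-per-patient variable by swapping the column passed to count().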

Additionally, the analyst may wish to obtain some high-level statistics on health care utilization (HCU) in the cohort, such as the median Total Years of Clinical Documents, Total Years of Visits, Number of Providers, Number of Care Sites, Number of Hospitalizations, and Total Hospital Days. For example, one simplified approach to obtaining the median Total Years of Visits is to find the earliest visit_start_date and the most recent visit_end_date in the visit table for each patient (grouping by person_id), compute the span in years for each patient, and finally compute the median span across all patients. A similar approach can be followed to find the Total Years of Clinical Documents using document_date in the document table, and so on.
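The simplified approach for the median Total Years of Visits can be sketched as follows. The visit data frame here is a small hypothetical stand-in using only the columns named above (person_id, visit_start_date, visit_end_date), and dividing days by 365.25 is an assumed convention for converting to years. This is not necessarily how table_one() computes the statistic.

```r
library(dplyr)

# Hypothetical stand-in for the visit table (values made up for illustration).
visit <- data.frame(
  person_id = c(1, 1, 2, 2, 3),
  visit_start_date = as.Date(c("2010-01-01", "2015-06-01", "2012-03-15",
                               "2013-03-15", "2020-01-01")),
  visit_end_date = as.Date(c("2010-01-02", "2020-01-01", "2012-03-16",
                             "2016-03-15", "2020-07-01"))
)

# Per patient: earliest start date, latest end date, and the span in years.
# Dividing by 365.25 is a simplifying assumption that averages over leap years.
visit_span <- visit %>%
  group_by(person_id) %>%
  summarise(
    first_start = min(visit_start_date, na.rm = TRUE),
    last_end = max(visit_end_date, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    span_years = as.numeric(difftime(last_end, first_start, units = "days")) / 365.25
  )

# Median span across all patients.
median_years_of_visits <- median(visit_span$span_years)
print(median_years_of_visits)
```

Replacing the two visit date columns with a single document_date column (using its minimum and maximum per patient) would give the analogous span for clinical documents.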

See below for some hands-on implementation examples using the PicnicHealth R package, which has a number of built-in functions to facilitate this. Note that we use a small, synthetic data set that is not necessarily clinically plausible in all cases and serves only to demonstrate functionality.

PicnicHealth R Package

Once a PicnicHealth data set is loaded into memory, we can use the table_one() function in the PicnicHealth R package. See help(table_one) for more information.

To demonstrate this, we will create a standard “Table 1” for:

The whole data set
Stratified by sex
The whole data set with census region added

Loading Data

We can load a PicnicHealth data set into memory using the load_data_set() function. A PicnicHealth data set is an unzipped directory of CSV files and we need to specify the full path to that data set. See help(load_data_set) for more information.

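# Load the PicnicHealth package and the tidyverse
library(PicnicHealth)
library(tidyverse)
# Replace with the full path to the unzipped data set directory
ds <- load_data_set("path/to/data")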
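Because the data set is a directory of CSV files, it can help to sanity-check the path before loading. The sketch below uses only base R; check_data_set_dir() is a hypothetical helper, not part of the PicnicHealth package, and the person.csv file name in the demonstration is a placeholder.

```r
# Hypothetical helper (not part of the PicnicHealth package): confirm that a
# directory exists and contains at least one CSV file, then return its full path.
check_data_set_dir <- function(path) {
  if (!dir.exists(path)) {
    stop("Directory not found: ", path)
  }
  csv_files <- list.files(path, pattern = "\\.csv$", ignore.case = TRUE)
  if (length(csv_files) == 0) {
    stop("No CSV files found in: ", path)
  }
  # The absolute path is what would be passed on to load_data_set().
  normalizePath(path)
}

# Demonstration on a temporary directory holding one empty placeholder CSV.
demo_dir <- file.path(tempdir(), "demo_data_set")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "person.csv"))

full_path <- check_data_set_dir(demo_dir)
print(full_path)
```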

Example 1: Entire Cohort

Here we can call the table_one() function directly, passing in the entire data set ds as the input argument.

table_one(ds)

Characteristic	N = 50¹
Sex
Female	39 (78%)
Male	11 (22%)
Race
Black or African American	1 (2.0%)
More than one race	3 (6.0%)
Unknown	1 (2.0%)
White	45 (90%)
Ethnicity
Hispanic or Latino	2 (4.0%)
Not Hispanic or Latino	47 (94%)
Prefer not to say	1 (2.0%)
Age	44 (35, 56)
Total Years of Clinical Documents	16.6 (12.1, 19.2)
Total Years of Visits	15 (10, 19)
Number of Providers	0 (0, 0)
Number of Care Sites	0 (0, 0)
Number of Hospitalizations	1 (0, 3)
Total Hospital Days	1.00 (0.00, 3.00)
¹ n (%); Median (Q1, Q3)
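To make the footnote concrete, the base R sketch below reproduces the two summary formats. The sex counts are taken from the table above; the age values are made up purely to illustrate the "Median (Q1, Q3)" format and do not come from the data set. This only mirrors the displayed formatting and is not a description of the internals of table_one().

```r
# Categorical variable: "n (%)" using the Sex counts from the table above.
sex_n <- c(Female = 39, Male = 11)
sex_label <- sprintf("%d (%.0f%%)", as.integer(sex_n), 100 * sex_n / sum(sex_n))
names(sex_label) <- names(sex_n)
print(sex_label)

# Continuous variable: "Median (Q1, Q3)".
# These ages are invented for illustration only.
age <- c(22, 28, 31, 38, 40, 47, 52, 59, 66)
q <- quantile(age, probs = c(0.25, 0.5, 0.75))
age_label <- sprintf("%.0f (%.0f, %.0f)", q[2], q[1], q[3])
print(age_label)
```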

Example 2: Stratified by Biological Sex

To produce a standard “Table 1” with columns that stratify by biological sex, we only need to specify the stratify_by_sex = TRUE argument when calling the table_one() function.

table_one(ds, stratify_by_sex = TRUE)

Characteristic	Female N = 39¹	Male N = 11¹
Race
Black or African American	1 (2.6%)	0 (0%)
More than one race	2 (5.1%)	1 (9.1%)
Unknown	1 (2.6%)	0 (0%)
White	35 (90%)	10 (91%)
Ethnicity
Hispanic or Latino	1 (2.6%)	1 (9.1%)
Not Hispanic or Latino	37 (95%)	10 (91%)
Prefer not to say	1 (2.6%)	0 (0%)
Age	45 (35, 58)	43 (40, 52)
Total Years of Clinical Documents	17.0 (13.2, 19.3)	12.2 (8.5, 18.5)
Total Years of Visits	14 (10, 18)	17 (10, 19)
Number of Providers	0 (0, 0)	0 (0, 0)
Number of Care Sites	0 (0, 0)	0 (0, 0)
Number of Hospitalizations	2 (0, 3)	1 (0, 3)
Total Hospital Days	2.00 (0.00, 3.00)	1.00 (0.00, 3.00)
¹ n (%); Median (Q1, Q3)
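In the stratified table, each percentage is relative to its own column total (39 females or 11 males) rather than to the whole cohort. The base R sketch below checks this against the Race counts copied from the table above; it illustrates the arithmetic only and makes no claim about how table_one() is implemented.

```r
# Race counts by sex, copied from the stratified table above.
# matrix() fills column by column: the first four values are the Female column.
race_by_sex <- matrix(
  c(1, 2, 1, 35,
    0, 1, 0, 10),
  nrow = 4,
  dimnames = list(
    race = c("Black or African American", "More than one race", "Unknown", "White"),
    sex = c("Female", "Male")
  )
)

# Stratum sizes, matching the column headers of the table.
stratum_n <- colSums(race_by_sex)
print(stratum_n)

# Column percentages: each count divided by its own stratum total.
col_pct <- 100 * prop.table(race_by_sex, margin = 2)
print(round(col_pct, 1))
```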

Example 3: Adding Census Region

The optional geography argument adds geographic information to the output table: specify geography = "region" for census region or geography = "division" for census division when calling the table_one() function. Here we demonstrate with geography = "region":

table_one(ds, geography = "region")

Characteristic	N = 50¹
Sex
Female	39 (78%)
Male	11 (22%)
Race
Black or African American	1 (2.0%)
More than one race	3 (6.0%)
Unknown	1 (2.0%)
White	45 (90%)
Ethnicity
Hispanic or Latino	2 (4.0%)
Not Hispanic or Latino	47 (94%)
Prefer not to say	1 (2.0%)
Census Region
Midwest	3 (9.7%)
Northeast	4 (13%)
South	15 (48%)
West	9 (29%)
Unknown	19
Age	44 (35, 56)
Total Years of Clinical Documents	16.6 (12.1, 19.2)
Total Years of Visits	15 (10, 19)
Number of Providers	0 (0, 0)
Number of Care Sites	0 (0, 0)
Number of Hospitalizations	1 (0, 3)
Total Hospital Days	1.00 (0.00, 3.00)
¹ n (%); Median (Q1, Q3)
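The Census Region percentages in this output appear to be computed among the 31 patients with a known region, with the 19 unknown reported as a separate count rather than included in the denominator. The short check below verifies that reading against the numbers shown above. It is an observation about this output, not a statement about the internals of table_one().

```r
# Counts copied from the Census Region rows of the table above.
region_n <- c(Midwest = 3, Northeast = 4, South = 15, West = 9)
n_unknown <- 19

# Denominator: patients with a known census region only.
n_known <- sum(region_n)
pct_known <- 100 * region_n / n_known

print(n_known)
print(round(pct_known, 1))

# Known plus unknown should add up to the full cohort of 50.
print(n_known + n_unknown)
```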