
Introduction

As a first step in most analyses, particularly in the exploratory phase, the analyst may wish to create a standard “Table 1” for a quick look at some basic demographics of the cohort, such as sex (at birth), race, ethnicity, and age at enrollment. This patient-level data (PLD), i.e., one row per patient, may be found in the person table, and the analyst can group by the variable of interest to obtain the relevant counts.
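As a rough illustration of the group-and-count idea, the base-R sketch below tabulates a toy one-row-per-patient table. The column names (person_id, sex, race) and all values are assumptions made up for illustration, not the actual schema of the person table, and count_with_pct() is a hypothetical helper.

```r
# Toy patient-level data (PLD): one row per patient.
# ASSUMPTION: column names and values are hypothetical, chosen only to
# illustrate grouping; check the actual person table in your data set.
person <- data.frame(
  person_id = 1:6,
  sex       = c("Female", "Female", "Male", "Female", "Male", "Female"),
  race      = c("White", "White", "White", "Unknown", "White",
                "More than one race"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count patients in each category of x and express
# each count as a percentage of all patients, like an "n (%)" row.
count_with_pct <- function(x) {
  n <- table(x)
  data.frame(
    level = names(n),
    n     = as.integer(n),
    pct   = round(100 * as.integer(n) / sum(n), 1),
    stringsAsFactors = FALSE
  )
}

# One row per patient, so counting rows per category counts patients
sex_counts  <- count_with_pct(person$sex)
race_counts <- count_with_pct(person$race)
print(sex_counts)
print(race_counts)
```

With the tidyverse loaded, dplyr::count(person, sex) returns the same per-category counts in a column named n.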

Additionally, the analyst may wish to obtain some high-level statistics on health care utilization (HCU) in the cohort, such as the median Total Years of Clinical Documents, Total Years of Visits, Number of Providers, Number of Care Sites, Number of Hospitalizations, and Total Hospital Days. For example, one simplified approach to the median Total Years of Visits is to group the visit table by person_id, find each patient's earliest visit_start_date and most recent visit_end_date, compute the span between them in years, and finally take the median span across all patients. A similar approach gives the Total Years of Clinical Documents, using document_date in the document table, and so on.
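The span-then-median procedure can be sketched in base R as follows. The column names come from the text above, but the rows are made-up toy values, span_years_by_person() is a hypothetical helper, and dividing by 365.25 days per year is an assumed convention; the package's own definition of "Total Years" may differ.

```r
# ASSUMPTION: toy visit and document tables; column names follow the text
# (person_id, visit_start_date, visit_end_date, document_date), but the
# values are made up for illustration.
visit <- data.frame(
  person_id        = c(1, 1, 1, 2, 2, 3),
  visit_start_date = as.Date(c("2005-01-10", "2010-06-01", "2019-03-15",
                               "2012-02-01", "2015-08-20", "2020-01-01")),
  visit_end_date   = as.Date(c("2005-01-12", "2010-06-01", "2019-03-20",
                               "2012-02-03", "2015-08-21", "2020-01-05"))
)
document <- data.frame(
  person_id     = c(1, 1, 2, 2, 3),
  document_date = as.Date(c("2004-12-01", "2018-07-01",
                            "2011-05-05", "2016-05-05", "2020-02-02"))
)

# Hypothetical helper: per-patient span in years from the earliest start
# date to the latest end date, grouping by person_id. When only one date
# column exists (e.g. document_date), it serves as both start and end.
span_years_by_person <- function(person_id, start_date, end_date = start_date) {
  # Dates are converted to day counts so tapply() returns plain numbers
  first_day <- tapply(as.numeric(start_date), person_id, min)
  last_day  <- tapply(as.numeric(end_date),   person_id, max)
  (last_day - first_day) / 365.25  # ASSUMPTION: 365.25 days per year
}

visit_spans <- span_years_by_person(visit$person_id,
                                    visit$visit_start_date,
                                    visit$visit_end_date)
document_spans <- span_years_by_person(document$person_id,
                                       document$document_date)

median(visit_spans)     # median Total Years of Visits (toy data)
median(document_spans)  # median Total Years of Clinical Documents (toy data)
```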

See below for some hands-on implementation examples using the PicnicHealth R package, which has a number of built-in functions to help facilitate this. Note that we use a small, synthetic data set that is not necessarily clinically plausible in all cases and serves only to demonstrate functionality.

PicnicHealth R Package

Once a PicnicHealth data set is loaded into memory, we can use the table_one() function in the PicnicHealth R package. See help(table_one) for more information.

To demonstrate this, we will create a standard “Table 1” for:

  • The whole data set

  • Stratified by sex

  • The whole data set, with census region added

Loading Data

We can load a PicnicHealth data set into memory using the load_data_set() function. A PicnicHealth data set is an unzipped directory of CSV files, and we need to specify the full path to that directory. See help(load_data_set) for more information.

library(PicnicHealth)
library(tidyverse)
ds = load_data_set("path/to/data")
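load_data_set() handles the loading for you, and its return type is not described here. Purely to illustrate what "a directory of CSV files" means, the hypothetical base-R helper read_csv_dir() below reads every CSV in a folder into a named list with one data frame per table. It is not the package's implementation, and the file names person.csv and visit.csv are assumptions rather than the actual PicnicHealth file layout.

```r
# Hypothetical stand-in for load_data_set(): read every CSV in a directory
# into a named list of data frames, one element per table. This is NOT the
# package's implementation, only an illustration of the idea.
read_csv_dir <- function(path) {
  files  <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  # Name each element after its file, e.g. "person.csv" -> "person"
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}

# Demonstration with a throwaway directory holding two tiny CSV files
# (ASSUMPTION: file names and columns are made up for illustration)
demo_dir <- file.path(tempdir(), "demo_data_set")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(person_id = 1:2, sex = c("Female", "Male")),
          file.path(demo_dir, "person.csv"), row.names = FALSE)
write.csv(data.frame(person_id = c(1, 1, 2),
                     visit_start_date = c("2010-01-01", "2012-05-03",
                                          "2015-07-09")),
          file.path(demo_dir, "visit.csv"), row.names = FALSE)

demo_ds <- read_csv_dir(demo_dir)
names(demo_ds)
```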

Example 1: Entire Cohort

Here we can proceed directly to the table_one() function, passing the entire data set ds as the input argument:

table_one(ds)

Characteristic                        N = 50¹
Sex
    Female                            39 (78%)
    Male                              11 (22%)
Race
    Black or African American         1 (2.0%)
    More than one race                3 (6.0%)
    Unknown                           1 (2.0%)
    White                             45 (90%)
Ethnicity
    Hispanic or Latino                2 (4.0%)
    Not Hispanic or Latino            47 (94%)
    Prefer not to say                 1 (2.0%)
Age                                   44 (35, 56)
Total Years of Clinical Documents     16.6 (12.1, 19.2)
Total Years of Visits                 15 (10, 19)
Number of Providers                   0 (0, 0)
Number of Care Sites                  0 (0, 0)
Number of Hospitalizations            1 (0, 3)
Total Hospital Days                   1.00 (0.00, 3.00)
¹ n (%); Median (Q1, Q3)
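The footnote says categorical rows are counts with percentages, n (%), and continuous rows are medians with first and third quartiles, Median (Q1, Q3); for example, Female 39 (78%) is 39 of the 50 patients. The two hypothetical helpers below sketch that formatting in base R. They use R's default quantile definition (type 7), which is an assumption: table_one() may compute quartiles and round percentages differently.

```r
# Hypothetical helper: format a continuous variable as "Median (Q1, Q3)".
# ASSUMPTION: R's default quantile definition (type = 7); table_one() may
# use a different definition, so small differences are possible.
median_q1_q3 <- function(x, digits = 1) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  q <- round(q, digits)
  sprintf("%s (%s, %s)", q[2], q[1], q[3])
}

# Hypothetical helper: format a category count as "n (%)".
n_pct <- function(n, total, digits = 1) {
  sprintf("%d (%s%%)", as.integer(n), round(100 * n / total, digits))
}

ages <- c(28, 33, 41, 50, 67)  # toy ages, not from the data set
median_q1_q3(ages)  # "41 (33, 50)"
n_pct(39, 50)       # "39 (78%)"
```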

Example 2: Stratified by Biological Sex

To produce a standard “Table 1” with columns that stratify by biological sex, we only need to specify the stratify_by_sex = TRUE argument when calling the table_one() function.

table_one(ds, stratify_by_sex = TRUE)
Characteristic                        Female (N = 39)¹    Male (N = 11)¹
Race
    Black or African American         1 (2.6%)            0 (0%)
    More than one race                2 (5.1%)            1 (9.1%)
    Unknown                           1 (2.6%)            0 (0%)
    White                             35 (90%)            10 (91%)
Ethnicity
    Hispanic or Latino                1 (2.6%)            1 (9.1%)
    Not Hispanic or Latino            37 (95%)            10 (91%)
    Prefer not to say                 1 (2.6%)            0 (0%)
Age                                   45 (35, 58)         43 (40, 52)
Total Years of Clinical Documents     17.0 (13.2, 19.3)   12.2 (8.5, 18.5)
Total Years of Visits                 14 (10, 18)         17 (10, 19)
Number of Providers                   0 (0, 0)            0 (0, 0)
Number of Care Sites                  0 (0, 0)            0 (0, 0)
Number of Hospitalizations            2 (0, 3)            1 (0, 3)
Total Hospital Days                   2.00 (0.00, 3.00)   1.00 (0.00, 3.00)
¹ n (%); Median (Q1, Q3)
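Conceptually, stratifying splits the patients by sex and computes the same summaries within each group, with each group's size as its column header. The base-R sketch below shows that idea on a toy one-row-per-patient table; the columns sex and age and their values are assumptions for illustration, and median_q1_q3() is a hypothetical helper using R's default quantile type 7.

```r
# ASSUMPTION: toy one-row-per-patient data with hypothetical columns
# sex and age; the values are made up for illustration.
pld <- data.frame(
  sex = c("Female", "Male", "Female", "Female", "Male", "Female", "Male",
          "Female"),
  age = c(52, 40, 35, 61, 48, 44, 58, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "Median (Q1, Q3)" using R's default quantile type 7
median_q1_q3 <- function(x, digits = 1) {
  q <- round(quantile(x, c(0.25, 0.5, 0.75), names = FALSE), digits)
  sprintf("%s (%s, %s)", q[2], q[1], q[3])
}

# Stratify: split the ages by sex, then summarise each group separately
age_by_sex <- vapply(split(pld$age, pld$sex), median_q1_q3, character(1))
n_by_sex   <- table(pld$sex)  # group sizes, i.e. the "N = ..." headers

age_by_sex  # Female "52 (44, 61)", Male "48 (44, 53)"
n_by_sex    # Female 5, Male 3
```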

Example 3: Adding Census Region

Geographic information, such as census region or census division, can be added to the output table through the optional geography argument: pass geography = "region" or geography = "division" when calling the table_one() function. Here we demonstrate with geography = "region":

table_one(ds, geography = "region")
Characteristic                        N = 50¹
Sex
    Female                            39 (78%)
    Male                              11 (22%)
Race
    Black or African American         1 (2.0%)
    More than one race                3 (6.0%)
    Unknown                           1 (2.0%)
    White                             45 (90%)
Ethnicity
    Hispanic or Latino                2 (4.0%)
    Not Hispanic or Latino            47 (94%)
    Prefer not to say                 1 (2.0%)
Census Region
    Midwest                           3 (9.7%)
    Northeast                         4 (13%)
    South                             15 (48%)
    West                              9 (29%)
    Unknown                           19
Age                                   44 (35, 56)
Total Years of Clinical Documents     16.6 (12.1, 19.2)
Total Years of Visits                 15 (10, 19)
Number of Providers                   0 (0, 0)
Number of Care Sites                  0 (0, 0)
Number of Hospitalizations            1 (0, 3)
Total Hospital Days                   1.00 (0.00, 3.00)
¹ n (%); Median (Q1, Q3)
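In the Census Region block, the percentages appear to use only patients with a known region as the denominator: 3 + 4 + 15 + 9 = 31, and 3/31 ≈ 9.7%, 15/31 ≈ 48%, while the 19 Unknown patients are shown as a count without a percentage. The base-R sketch below reproduces that arithmetic from the counts in the table; it illustrates the calculation only and is not how table_one() is implemented.

```r
# Rebuild the region counts shown in the table (NA = Unknown region)
region <- c(rep("Midwest", 3), rep("Northeast", 4), rep("South", 15),
            rep("West", 9), rep(NA, 19))

# Percentages use only patients with a known region as the denominator;
# unknowns are reported separately as a bare count.
known     <- region[!is.na(region)]
n_region  <- table(known)
pct       <- round(100 * as.integer(n_region) / length(known), 1)
n_unknown <- sum(is.na(region))

region_summary <- data.frame(region = names(n_region),
                             n      = as.integer(n_region),
                             pct    = pct)
print(region_summary)  # pct: 9.7, 12.9, 48.4, 29.0
n_unknown              # 19
```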