Compute observation periods with a specified event density — get_observation_periods

For each patient, get_observation_periods() computes the time periods during which the patient meets a specified density of healthcare events.

Usage

get_observation_periods(
  table,
  dates,
  window_size,
  min_utilization,
  min_event_dates_per_period = 2,
  min_period_length = 30
)

Arguments

table: A table of events which includes a person_id column and at least one date column. Often this will be a table from a PicnicHealth data set such as visit, cohort_measurement_occurrence, measurement_occurrence, cohort_procedure_occurrence, procedure_occurrence, cohort_drug_era, document etc; these can be filtered for specific entities before passing to this function. Only one row per patient per event start date will be kept (see Details).
dates: A character vector of length 1 or 2 containing the name of the date variable(s) for the events in table. If passed two variable names, the first one must give the start of each event, and the second one must give the end of each event. It's ok if the end dates are sometimes NA. If only one date column is specified, events are assumed to begin and end on the same day.
window_size: Integer number of days in rolling window.
min_utilization: Minimum number of events a patient must have within window_size/2 days of a given date for that date to appear in an observation period. In other words, an observation period is a span of time over which the patient has at least min_utilization events in every window of window_size days.
min_event_dates_per_period: Minimum number of distinct days with events that must have occurred during a candidate period. Default is 2; changing this value is not generally necessary.
min_period_length: Minimum period length in days, inclusive of period start and end dates.

Value

A tibble with the following columns:

person_id: same as the person_id in the table argument
period_start: first day of the observation period
period_end: last day of the observation period
period_length: length of the period in days, inclusive of the period start and end dates
n_event_dates: number of distinct days in the period on which the patient had events

Details

This function takes a table of events, a specification of which of its columns are dates, and a window_size as input.

First, duplicate events on the same day are removed: the function keeps only one event per start date per patient (if end dates are provided, the latest end date is kept). From this point forward, a "count of events" for a patient is really the count of distinct event start dates for the patient.
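The deduplication step can be sketched in base R as follows. This is an illustration only, not the package's implementation: the toy table, the column names start_date and end_date, and the choice to prefer a non-missing end date over NA are all assumptions.

```r
# Hypothetical toy events table; only person_id is a column name taken from the docs.
events <- data.frame(
  person_id  = c(1, 1, 1, 2),
  start_date = as.Date(c("2020-01-01", "2020-01-01", "2020-02-15", "2020-03-01")),
  end_date   = as.Date(c("2020-01-02", "2020-01-05", NA, "2020-03-01"))
)

# Sort so that, within each (person_id, start_date), the latest end date comes
# first and missing end dates come last (assumption: NA loses to any real date).
ord <- order(events$person_id, events$start_date,
             -as.numeric(events$end_date), na.last = TRUE)
sorted <- events[ord, ]

# Keep the first row of each (person_id, start_date) pair.
dedup <- sorted[!duplicated(sorted[, c("person_id", "start_date")]), ]
dedup
```

After this step, patient 1 has two distinct event start dates, and the row for 2020-01-01 carries the later end date, 2020-01-05.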

For each patient, the function then computes the time periods when their "healthcare utilization" is at least min_utilization. A patient's healthcare utilization on date x is defined as the number of distinct days on which they had an event start date within a window of window_size days centered on x (i.e. window_size/2 days before x and window_size/2 days after x, inclusive).
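As a rough illustration of the utilization definition, the sketch below counts distinct event dates within window_size/2 days of each calendar day for a single hypothetical patient. The helper utilization() is not part of the package. The sketch treats the threshold as "at least min_utilization", following the min_utilization argument description; how the real function handles odd window sizes is not covered here.

```r
# Distinct event start dates for one hypothetical patient.
event_dates <- as.Date(c("2020-01-01", "2020-01-20", "2020-06-01"))
window_size <- 60
min_utilization <- 2

# Hypothetical helper: number of distinct event dates within
# window_size / 2 days of date x (inclusive on both sides).
utilization <- function(x, event_dates, window_size) {
  day_gap <- abs(as.numeric(unique(event_dates)) - as.numeric(x))
  sum(day_gap <= window_size / 2)
}

# Evaluate utilization on every day that could possibly reach the threshold.
days <- seq(min(event_dates) - window_size / 2,
            max(event_dates) + window_size / 2, by = "day")
u <- vapply(seq_along(days),
            function(i) utilization(days[i], event_dates, window_size),
            integer(1))

# Days on which the patient meets the density threshold.
dense_days <- days[u >= min_utilization]
range(dense_days)  # 2019-12-21 to 2020-01-31
```

Only the two January events fall close enough together to reach a utilization of 2, so the candidate period spans the days that lie within 30 days of both of them. The isolated June event never reaches the threshold.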

Next, periods with fewer than min_event_dates_per_period distinct event dates are removed. Furthermore, periods are clipped to the first and last events in them: the extra days before the first event and after the last event are stripped, such that the first (respectively, last) date of the period is the date of the first (resp., last) event within it.
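A minimal sketch of the filtering and clipping step for one candidate period is shown below. The candidate bounds and event dates are made-up values, and the variable names are illustrative rather than taken from the package.

```r
# Hypothetical candidate period produced by the utilization threshold.
candidate_start <- as.Date("2019-12-21")
candidate_end   <- as.Date("2020-01-31")
event_dates     <- as.Date(c("2020-01-01", "2020-01-20", "2020-06-01"))
min_event_dates_per_period <- 2

# Distinct event dates that fall inside the candidate period.
inside <- event_dates >= candidate_start & event_dates <= candidate_end
in_period <- unique(event_dates[inside])

if (length(in_period) >= min_event_dates_per_period) {
  # Clip the period to its first and last event dates.
  period <- data.frame(
    period_start  = min(in_period),
    period_end    = max(in_period),
    n_event_dates = length(in_period)
  )
} else {
  # Too few distinct event dates: the candidate period is dropped.
  period <- NULL
}
period
```

Here the candidate period shrinks from 2019-12-21 through 2020-01-31 to 2020-01-01 through 2020-01-20, with n_event_dates equal to 2.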

Finally, periods shorter than min_period_length are removed.
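The length filter can be illustrated with the sketch below, which assumes period_length counts both the start and end dates (as described for min_period_length). The two periods are invented for the example.

```r
# Hypothetical clipped periods for one patient.
periods <- data.frame(
  person_id    = c(1, 1),
  period_start = as.Date(c("2020-01-01", "2021-03-01")),
  period_end   = as.Date(c("2020-01-20", "2021-04-15"))
)
min_period_length <- 30

# Inclusive length in days: a period starting and ending on the same day has length 1.
day_diff <- difftime(periods$period_end, periods$period_start, units = "days")
periods$period_length <- as.integer(day_diff) + 1L

# Drop periods shorter than the minimum length.
kept <- periods[periods$period_length >= min_period_length, ]
kept
```

The first period is 20 days long and is removed under the default min_period_length of 30; the second is 46 days long and is kept.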

For more information and examples, refer to vignette("observation_periods").

Examples

if (FALSE) { # \dontrun{
# Periods when patients had at least one visit every 120 days
get_observation_periods(
  table = ds$visit,
  dates = c("visit_start_date", "visit_end_date"),
  window_size = 120,
  min_utilization = 1
)
# Periods when patients had at least three hemoglobin measurements every 365 days
# (dplyr provides %>% and filter())
library(dplyr)
ds$lab_result %>%
  # use the concept_id corresponding to hemoglobin
  filter(measurement_concept_id == "29124cfe-a4a8-5939-8d36-3f43f0017600") %>%
  get_observation_periods(
    dates = "collection_date",
    window_size = 365,
    min_utilization = 3
  )
} # }