When analyzing real-world data derived from medical records collected during routine patient care, there may be limitations in the precision or certainty of information about the timing of an event. Below, we discuss some considerations for the analysis of temporal data.

Datetime Precision

In medical records, dates may exist at varying levels of precision. For example, a provider note in a medical record may indicate only the year or month in which a diagnosis occurred - “Patient diagnosed with UC in 2008” - whereas for some data types (labs, vital signs) the full date is always available. To record this data consistently, PicnicHealth always stores temporal data as datetime; when relevant, tables also include a precision column that documents the precision to which the date value is known.

We make our best effort to estimate the date on which an event happened, and we also describe the precision of that estimate. For example, we may provide a precision of month for an event in our data set. This indicates that we are highly confident that the event in question occurred within the given month and year, but are not confident about the day. In these cases, in order to provide a valid date, we impute the middle of the relevant time period: if the precision is year, we impute July 2; if the precision is month, we impute the 16th (the 15th for February); and if the precision is day, we impute noon.
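The imputation rule above can be sketched in a few lines of Python. This is a minimal illustration, not PicnicHealth's implementation: the function names `impute_midpoint` and `to_dataset_string` and their arguments are hypothetical, and the noon UTC time of day for year and month precision is an assumption taken from the example values in this document.

```python
from datetime import datetime, timezone


def impute_midpoint(precision, year, month=None, day=None):
    """Return the imputed datetime (noon UTC) for a date known to `precision`.

    Hypothetical helper that mirrors the rule described above, nothing more.
    """
    if precision == "YEAR":
        month, day = 7, 2  # middle of the year
    elif precision == "MONTH":
        if month is None:
            raise ValueError("MONTH precision requires a month")
        day = 15 if month == 2 else 16  # middle of the month
    elif precision == "DAY":
        if month is None or day is None:
            raise ValueError("DAY precision requires a month and a day")
    else:
        raise ValueError(f"unsupported precision: {precision!r}")
    # Assumption: every imputed value carries a noon UTC time of day.
    return datetime(year, month, day, 12, 0, 0, tzinfo=timezone.utc)


def to_dataset_string(value):
    """Format a UTC datetime in the style of the dataset values."""
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, `to_dataset_string(impute_midpoint("MONTH", 2019, 2))` gives `2019-02-15T12:00:00Z`.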

If the datetime is not imputed to the middle of the precision time period, this is often due to relative time statements in the record. For example, if the record mentions a patient having a procedure “around 6 weeks ago”, the date will be exactly 6 weeks prior to the visit in which the statement was recorded, but the precision will only be month, because we do not know the exact day on which the procedure happened. Similarly, an event reported by a patient as occurring 6 months before September 4th will map to March 4th with precision MONTH (not to the mid-month value, March 16th).
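The handling of relative time statements can be sketched as follows. This is an illustration under stated assumptions: the helper names `subtract_months` and `resolve_relative` are hypothetical, returning MONTH precision for both units simply follows the two examples above, and clamping to the last day of a shorter month is our own choice for the sketch rather than documented behavior.

```python
import calendar
from datetime import date, timedelta


def subtract_months(anchor, months):
    """Return the date `months` calendar months before `anchor`."""
    total = anchor.year * 12 + (anchor.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Assumption: clamp the day when the target month is shorter (e.g. the 31st).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(anchor.day, last_day))


def resolve_relative(visit_date, amount, unit):
    """Map a statement like '6 weeks ago' to a (date, precision) pair.

    The offset is applied exactly, while the precision stays coarse (MONTH),
    as in the examples above.
    """
    if unit == "weeks":
        return visit_date - timedelta(weeks=amount), "MONTH"
    if unit == "months":
        return subtract_months(visit_date, amount), "MONTH"
    raise ValueError(f"unsupported unit: {unit!r}")
```

With a visit on September 4th of a hypothetical year 2019, `resolve_relative(date(2019, 9, 4), 6, "months")` returns March 4th, 2019 with precision `MONTH`.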

Examples of date imputation:

Medical record text              Precision   Dataset datetime value
7/2/2019                         DAY         2019-07-02T12:00:00Z
July 2019                        MONTH       2019-07-16T12:00:00Z
February 2019                    MONTH       2019-02-15T12:00:00Z
2018                             YEAR        2018-07-02T12:00:00Z
6 weeks ago (from 2019-11-05)    MONTH       2019-09-24T12:00:00Z
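When working with these values, it can be useful to flag datetimes that do not sit at the midpoint of their precision period, since those often come from relative time statements. The sketch below is one way to do that; the function names are hypothetical, and the check is only a heuristic, because a relative statement can land on the midpoint by coincidence.

```python
from datetime import datetime, timezone


def parse_dataset_datetime(text):
    """Parse a value such as '2019-07-16T12:00:00Z' into an aware UTC datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_period_midpoint(value, precision):
    """Return True if `value` equals the imputed midpoint for `precision`."""
    if (value.hour, value.minute, value.second) != (12, 0, 0):
        return False
    if precision == "DAY":
        return True  # noon is the midpoint of a day
    if precision == "MONTH":
        return value.day == (15 if value.month == 2 else 16)
    if precision == "YEAR":
        return (value.month, value.day) == (7, 2)
    raise ValueError(f"unsupported precision: {precision!r}")
```

For instance, `2019-07-16T12:00:00Z` with precision MONTH is a midpoint value, whereas a March 4th value with precision MONTH is not.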

Datetime Certainty

For some clinical variables in cohort tables (e.g. cohort_drug_era, cohort_condition), the precise timing of an event may be unavailable from a patient’s medical records. An indication of certainty is provided in the start_date_known and (when applicable) end_date_known fields.

When a given clinical event is stated with a specific onset date in records, start_date_known will be TRUE. When a specific onset date is not provided, the value in the start_date field represents the first visit where the event was noted; in this case, start_date_known will be FALSE. In the latter case, the start_date can be interpreted as the earliest affirmative mention of the event available in the medical record, but note that the event may have begun or occurred earlier.

Similarly, when the end_date_known field is FALSE, the value in end_date represents the most recent visit in which the clinical event or era was known to be present and/or ongoing. In some instances, eras with a FALSE end_date_known can be assumed to be ongoing to the present day, subject to clinical and analytical reasoning.
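One way to respect these flags in an analysis is to treat an uncertain start_date as the latest possible onset and an uncertain end_date as the earliest possible end. The sketch below shows that reading with three-valued answers (True, False, or None for "cannot be determined"). The `Era` class and both function names are hypothetical; only the four field names come from the tables described above, and date precision is ignored for simplicity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Era:
    """Hypothetical container for one row of a cohort table."""
    start_date: date
    start_date_known: bool
    end_date: Optional[date] = None
    end_date_known: Optional[bool] = None


def onset_on_or_before(era, cutoff):
    """Did the event begin on or before `cutoff`? None means undetermined."""
    if era.start_date_known:
        return era.start_date <= cutoff
    # start_date is only the first mention; the true onset may be earlier.
    return True if era.start_date <= cutoff else None


def ended_on_or_before(era, cutoff):
    """Did the era end on or before `cutoff`? None means undetermined."""
    if era.end_date is None:
        return None
    if era.end_date_known:
        return era.end_date <= cutoff
    # end_date is only the last visit where the era was present; it may continue.
    return False if era.end_date > cutoff else None
```

An era first mentioned on 2015-03-01 with start_date_known FALSE therefore began on or before any later cutoff, but whether it began before 2014 cannot be determined from the record.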

Onboard and Withdrawal Dates

The person table contains information about a patient’s entry into the cohort and, if applicable, the date of their voluntary withdrawal from the cohort. All patients should have an onboarding date. Data collected before this date is considered retrospective, whereas data from after this date is prospectively collected.
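The retrospective/prospective distinction can be expressed as a simple comparison against the onboarding date. In this sketch the function and parameter names are hypothetical (the actual column names in the person table are not given here), and treating a record dated exactly on the onboarding date as prospective is an assumption.

```python
from datetime import date


def collection_mode(record_date, onboard_date):
    """Label a record relative to the patient's onboarding date."""
    # Assumption: a record dated on the onboarding date itself counts as prospective.
    if record_date < onboard_date:
        return "retrospective"
    return "prospective"
```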

Death Dates

Death dates are available in the person table when a person is known to have passed away. These data are derived from a combination of sources, including the Social Security Administration Limited Access Death Master File, deaths reported in the EMR, obituary data, and direct family or caregiver reports. If the death date is null, we believe the patient was still alive at the time of dataset creation.
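For time-to-event analyses, a null death date is commonly handled by ending follow-up at the dataset creation date and marking death as not observed. The sketch below shows that convention; the function and parameter names are hypothetical, and it deliberately ignores withdrawal dates, which an analysis may also need to account for.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up_end(death_date: Optional[date], dataset_created: date) -> Tuple[date, bool]:
    """Return (end_of_follow_up, death_observed) for one patient."""
    if death_date is not None:
        return death_date, True
    # Null death date: the patient is believed alive at dataset creation.
    return dataset_created, False
```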