Timing and Date Data
When analyzing real-world data derived from medical records collected during routine patient care, there may be limitations in the precision or certainty of information about the timing of an event. Below, we discuss some considerations for the analysis of temporal data.
Datetime Precision
In medical records, dates may exist at varying levels of precision. For example, a provider note may indicate only the year or month in which a diagnosis occurred (“Patient diagnosed with UC in 2008”), whereas for some data types (labs, vital signs) the full date is always available. To record these data consistently, PicnicHealth always stores temporal data as a datetime; where relevant, tables also include a precision column that documents the precision to which the date value is known.
We make our best effort to estimate the date on which an event happened, and we also describe the precision of that estimate. For example, we may provide a precision of month for an event in our data set. That indicates that we are highly confident that the event occurred within the given month and year, but are not confident about the day. In these cases, in order to provide a valid date, we impute the middle of the relevant time period: if the precision is year, we impute July 2; if the precision is month, we impute the 16th (or the 15th for February); and if the precision is day, we impute noon.
If the datetime is not imputed to the middle of the precision time period, this is often due to relative time statements in the record. For example, if the record mentions a patient having a procedure “around 6 weeks ago”, the date will be exactly 6 weeks prior to the visit in which the statement was recorded, but the precision will only be MONTH, because we are not certain of the exact day on which the procedure happened. Similarly, an event reported by a patient as occurring 6 months before September 4th will map to March 4th with precision MONTH (not the mid-month default of March 16th).
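The relative-date behaviour described above can be illustrated with a minimal Python sketch. The function names, the 2019 and 2021 dates, and the choice to clamp the day when the target month is shorter are assumptions made for illustration; they are not part of the dataset or of PicnicHealth's pipeline.

```python
import calendar
from datetime import date, timedelta

def weeks_before(visit_date, weeks):
    """Shift a visit date back by a whole number of weeks."""
    return visit_date - timedelta(weeks=weeks)

def months_before(visit_date, months):
    """Shift a visit date back by whole calendar months.

    Assumption: if the target month is shorter, the day is clamped
    to the last day of that month.
    """
    total = visit_date.year * 12 + (visit_date.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(visit_date.day, last_day))

# "6 months before September 4th" keeps the day of month; precision stays MONTH.
print(months_before(date(2019, 9, 4), 6), "MONTH")   # 2019-03-04 MONTH

# "around 6 weeks ago", stated at a hypothetical visit on 2021-06-15.
print(weeks_before(date(2021, 6, 15), 6), "MONTH")   # 2021-05-04 MONTH
```

In both cases the computed date is exact arithmetic on the visit date, while the accompanying precision value signals that only the month should be trusted.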
Examples of date imputation:
Medical record text | Precision | Dataset datetime value |
---|---|---|
7/2/2019 | DAY | 2019-07-02T12:00:00Z |
July 2019 | MONTH | 2019-07-16T12:00:00Z |
February 2019 | MONTH | 2019-02-15T12:00:00Z |
2018 | YEAR | 2018-07-02T12:00:00Z |
6 weeks ago (from 2019-11-05) | MONTH | 2019-09-24T12:00:00Z |
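As a rough illustration of the midpoint rules, the following Python sketch reproduces the DAY, MONTH, and YEAR rows of the table. The function names are hypothetical, and UTC is assumed because the example values carry a Z suffix; this is not the actual imputation code.

```python
from datetime import datetime, timezone

def impute_midpoint(precision, year, month=None, day=None):
    """Return the midpoint datetime (UTC) for a date known to `precision`."""
    if precision == "YEAR":
        month, day = 7, 2                    # middle of the year: July 2
    elif precision == "MONTH":
        day = 15 if month == 2 else 16       # 16th, or 15th for February
    elif precision != "DAY":
        raise ValueError(f"unsupported precision: {precision}")
    return datetime(year, month, day, 12, 0, tzinfo=timezone.utc)  # noon

def to_iso_z(dt):
    """Format a UTC datetime in the style used in the table."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_iso_z(impute_midpoint("DAY", 2019, 7, 2)))   # 2019-07-02T12:00:00Z
print(to_iso_z(impute_midpoint("MONTH", 2019, 7)))    # 2019-07-16T12:00:00Z
print(to_iso_z(impute_midpoint("MONTH", 2019, 2)))    # 2019-02-15T12:00:00Z
print(to_iso_z(impute_midpoint("YEAR", 2018)))        # 2018-07-02T12:00:00Z
```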
Datetime Certainty
For some clinical variables in cohort tables (e.g. cohort_drug_era, cohort_condition), the precise timing of an event may be unavailable from a patient’s medical records. An indication of certainty is provided in the start_date_known and (when applicable) end_date_known fields.
When a given clinical event is stated with a specific onset date in records, start_date_known will be TRUE. When a specific onset date is not provided, the value in the start_date field represents the first visit where the event was noted; in this case, start_date_known will be FALSE. In the latter case, the start_date can be interpreted as the earliest affirmative mention of the event available in the medical record, but note that the event may have begun or occurred earlier.
Similarly, when the end_date_known field is FALSE, the value in end_date represents the most recent visit in which the clinical event or era was known to be present and/or ongoing. In some instances, eras with a FALSE end_date_known can be assumed to be ongoing to the present day, subject to clinical and analytical reasoning.
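One practical consequence is that a duration computed from start_date and end_date is only a lower bound whenever either date is not known. The Python sketch below illustrates this under stated assumptions: the Era class and function names are hypothetical, and only the four field names come from the text above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Era:
    """Hypothetical container mirroring the four date fields of a cohort row."""
    start_date: date
    end_date: date
    start_date_known: bool
    end_date_known: bool

def observed_duration_days(era):
    """Days between the recorded start and end dates."""
    return (era.end_date - era.start_date).days

def duration_is_lower_bound(era):
    """True when the true era may be longer than the recorded one.

    A FALSE start_date_known means the event may have begun earlier;
    a FALSE end_date_known means it may have continued later.
    """
    return not (era.start_date_known and era.end_date_known)

era = Era(date(2019, 1, 10), date(2019, 4, 10),
          start_date_known=False, end_date_known=True)
print(observed_duration_days(era))    # 90
print(duration_is_lower_bound(era))   # True
```

Whether to treat such eras as censored, extend them, or exclude them remains an analytical decision.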
Onboard and Withdrawal Dates
The person table contains information about a patient’s entry into the cohort and, if applicable, the date of their voluntary withdrawal from the cohort. All patients should have an onboarding date. Data collected before this date is considered retrospective, whereas data from after this date is prospectively collected.
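A minimal Python sketch of this split is shown below. The function name and the onboard_date argument are hypothetical, and treating records dated exactly on the onboarding date as prospective is an assumption, since the text does not specify that boundary.

```python
from datetime import date

def collection_mode(record_date, onboard_date):
    """Classify a record relative to the patient's onboarding date.

    Assumption: a record dated on the onboarding date itself is
    labelled prospective.
    """
    return "retrospective" if record_date < onboard_date else "prospective"

onboard = date(2020, 3, 1)  # hypothetical onboarding date
print(collection_mode(date(2018, 6, 15), onboard))  # retrospective
print(collection_mode(date(2021, 1, 20), onboard))  # prospective
```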
Death Dates
Death dates are available in the person table when a person is known to have passed away. These data are derived from a combination of sources, including the Social Security Administration Limited Access Death Master File, deaths reported in the EMR, obituary data, and direct family or caregiver reports. If the death date is null, we believe the patient to still be alive at the time of dataset creation.
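For time-to-event analyses, a null death date is commonly handled as a censored observation. The Python sketch below shows one way to do this; the function and argument names are hypothetical, and censoring at the dataset creation date (while ignoring withdrawal dates) is an assumption for illustration rather than a recommended method.

```python
from datetime import date

def survival_observation(index_date, death_date, dataset_date):
    """Return (days_observed, death_observed) for one patient.

    Assumption: patients with a null death date are censored at the
    dataset creation date.
    """
    if death_date is not None:
        return (death_date - index_date).days, True
    return (dataset_date - index_date).days, False

# Hypothetical dates for illustration only.
print(survival_observation(date(2018, 1, 1), date(2019, 1, 1), date(2020, 1, 1)))  # (365, True)
print(survival_observation(date(2018, 1, 1), None, date(2020, 1, 1)))              # (730, False)
```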