12th Apr 2019
Data linkage is an important tool for increasing the value of existing longitudinal studies. Linkage to multiple data sources can provide a low-cost, efficient means of collecting extensive and detailed data on cross-sectoral services, society, and the environment, as well as augmenting direct data collection through linkage with biological samples, social media and other digital sources. These data can be used to supplement traditional cohort studies, or to create population-level electronic cohorts generated from administrative data. Such administrative data cohorts offer the ability to answer questions that require large sample sizes or detailed data on hard-to-reach populations, and to generate evidence with a high level of external validity and applicability for policy-making. There is increasing interest in using these two models of data collection in conjunction, combining population-level administrative data with detailed attribute data collected directly from participants, in order to provide a deeper insight into what determines our health.
Lack of access to unique or accurate identifiers means that linkage is not always straightforward. Errors occurring during linkage (false matches and missed matches) can lead to substantial bias in results based on linked data. This issue is compounded by the separation of linkage from the analysis of linked data, which makes it difficult to evaluate linkage quality or to determine the potential impact of errors on results. This talk will give an overview of the opportunities, challenges and methods for using data linkage in cohort studies.