Open science is an important driver for the EHDEN project, which seeks to address a number of challenges facing health data research, such as the lack of reproducibility and transparency, poor statistical designs and publication bias, by striving for openness and accessibility of the scientific discourse at all levels and all stages of research. Since the start of the project, we have organised and participated in a number of open science events called study-a-thons. This started with the knee replacement study-a-thon, which resulted in a publication in The Lancet Rheumatology, and continued with the OHDSI COVID-19 study-a-thon, which resulted in a number of high-profile publications in, among others, The Lancet Rheumatology and The Lancet Digital Health.
Since there is still relatively little information available on study-a-thons and how they work in practice, the OHDSI community and members from EHDEN are developing documentation and publications on this topic. In the meantime, this blog aims to provide a quick overview of how OHDSI study-a-thons work in practice, using our collaboration with sister project IMI PIONEER as an example.
Architecture of a study-a-thon
A study-a-thon is a focused multi-day meeting to generate medical evidence on a specific topic across different countries and health care systems. In a study-a-thon, three groups work together to achieve the end result, which should be a faster and significant contribution to medical research and medical practice. The three groups participating in a study-a-thon are a literature review and clinical research group, a phenotype development group and a study execution group.
- The literature review and clinical research group focuses on defining a research question that is relevant, specific and timely, given the state of the art in medical research and the data that participants can potentially contribute to the study-a-thon. The question generally falls into one of three types (as further explained in the Book of OHDSI Part IV):
  - Characterisation: characterising populations through the use of descriptive statistics to generate hypotheses about the determinants of health and disease, and to understand clinical outcomes of specific groups in the population.
  - Population-Level Estimation: the estimation of average causal effects of exposures (e.g. medical interventions such as drug exposures or procedures) on health outcomes of interest (e.g. the safety or effectiveness of drugs or other treatments).
  - Patient-Level Prediction: the prediction of future health outcomes from existing patient-level data to support clinical decision making and risk evaluation, and the validation of such prediction models.
- The phenotype development group translates the research question into a number of specific cohort definitions, which can then be used to execute the study. This group needs to balance the (often elaborate) wishes of the clinical research group in terms of characterising populations and outcomes with the actual data available in the participating databases.
- The study execution group translates the research question into code, building on the OHDSI analytics tools and specifically the method library, and using the concept and cohort definitions developed by the phenotype development group. This task also requires recruitment and coordination of all the databases participating in the study-a-thon. Often, study-a-thons attract data partners outside the initial organising group who hold data in the OMOP format, are also interested in the question and are happy to participate with the data they have access to.
The prostate cancer study-a-thon
To provide a concrete example, let’s take a look at the recent PIONEER / EHDEN / OHDSI study-a-thon on prostate cancer:
- The decision was made to focus on two research questions: firstly, the clinical characterisation of prostate cancer patients managed with the treatment paradigm of watchful waiting; secondly, the prediction of outcomes for these patients, given their co-morbidities.
- Phenotype definitions were built to support execution of the characterisation study in practice, with cohort definitions such as ‘all patients with initial confirmed diagnosis of prostate cancer’, ‘patients treated under active surveillance’, ‘patients treated under watchful waiting’, etc.
- A study package was created to execute this study in a number of large American and European claims databases, but also in electronic health records (EHRs) of multiple participating hospitals (e.g. from New York and Bordeaux) and registries (such as the Netherlands Cancer Registry and the U.S. Veterans Administration).
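To make the idea of a cohort definition a little more tangible, here is a minimal sketch of what 'all patients with initial confirmed diagnosis of prostate cancer' boils down to logically: one entry per person, at their earliest qualifying diagnosis, followed by a simple descriptive summary. This is an illustration only and not the study package itself. The `Diagnosis` record, the function names and the toy data are all hypothetical, and the plain text label stands in for what would be a concept set in OMOP.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical, simplified diagnosis record (not the OMOP CDM schema)."""
    person_id: int
    condition: str          # plain label standing in for a concept set
    diagnosis_date: date
    age_at_diagnosis: int


def initial_diagnosis_cohort(records, condition):
    """Return {person_id: earliest Diagnosis of `condition`} as cohort entries."""
    cohort = {}
    for rec in records:
        if rec.condition != condition:
            continue
        current = cohort.get(rec.person_id)
        if current is None or rec.diagnosis_date < current.diagnosis_date:
            cohort[rec.person_id] = rec
    return cohort


def characterise(cohort):
    """Very small descriptive summary: cohort size and median age at entry."""
    ages = [rec.age_at_diagnosis for rec in cohort.values()]
    return {"n": len(ages), "median_age": median(ages) if ages else None}


# Toy data, invented purely for illustration.
records = [
    Diagnosis(1, "prostate cancer", date(2015, 3, 1), 67),
    Diagnosis(1, "prostate cancer", date(2017, 6, 9), 69),   # later record, ignored
    Diagnosis(2, "prostate cancer", date(2018, 1, 20), 72),
    Diagnosis(3, "hypertension", date(2016, 5, 5), 60),      # not in the cohort
]

cohort = initial_diagnosis_cohort(records, "prostate cancer")
summary = characterise(cohort)
print(summary)
```

In a real network study the same logic is expressed once as a shared cohort definition and executed locally at each data partner, so that only aggregate summaries like the one above need to leave the site.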
A study-a-thon is a great way to quickly build momentum within an open science environment and to work towards outcomes of concrete medical and/or scientific value. Because of its open and multidisciplinary nature, it is especially interesting for precompetitive and multidisciplinary projects such as PIONEER, one of the BD4BO (Big Data for Better Outcomes) sister projects of EHDEN. Furthermore, the study packages and mapped datasets are a persistent legacy for future research studies and analyses.
What we learned about the technical process of running network studies
Generating medical evidence is a key goal for EHDEN, and this study-a-thon is a prime example of the type of research that we want to foster and enable using the EHDEN network. So, what can we learn from this endeavour for future studies?
One of the key issues during the study-a-thon, but also in the aftermath, is that although the OMOP CDM and the OHDSI tools are very powerful, they also have a steep learning curve. Study package developers need to have knowledge about the OMOP CDM, know the R programming language and familiarise themselves with a number of key OHDSI packages such as CohortDiagnostics. They also need enough data science and epidemiology knowledge to ensure that the right computations are done, and that they are implemented in such a way that the whole package can finish in a couple of days even when run on a dataset with hundreds of millions of patients. This substantially narrows down the number of candidates for this role.
Part of the problem was also that the type of characterisation analysis we needed, with a rebased cohort index date, is non-standard and, for instance, not supported in ATLAS by default. Nevertheless, it’s clear that further refactoring and development of the analytics elements to a level where ‘standard’ studies can be coded and run by people who are not proficient in R would be a big plus.
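For readers unfamiliar with the term, the sketch below shows one possible reading of 'rebasing' a cohort index date: the reference date for each patient is moved from the original cohort entry (here, diagnosis) to a later event (here, the start of watchful waiting), and other events are then expressed in days relative to the new index. This interpretation, the function names and the toy data are assumptions made for illustration; the snippet is not taken from the study package or from any OHDSI tool.

```python
from datetime import date


def rebase_index(original_index, new_index):
    """Move each person's index date to a later anchor event.

    original_index: {person_id: original cohort entry date}
    new_index:      {person_id: date of the anchor event}
    Persons without an anchor event, or with one before the original
    entry date, are dropped from the rebased cohort.
    """
    rebased = {}
    for person_id, entry_date in original_index.items():
        anchor = new_index.get(person_id)
        if anchor is not None and anchor >= entry_date:
            rebased[person_id] = anchor
    return rebased


def days_from_index(index_dates, events):
    """Express (person_id, event_date) pairs as day offsets from the index."""
    return [
        (person_id, (event_date - index_dates[person_id]).days)
        for person_id, event_date in events
        if person_id in index_dates
    ]


# Toy data, invented purely for illustration.
diagnosis_dates = {1: date(2015, 3, 1), 2: date(2018, 1, 20)}
watchful_waiting_start = {1: date(2015, 6, 1)}  # person 2 never starts

rebased = rebase_index(diagnosis_dates, watchful_waiting_start)
events = [
    (1, date(2015, 4, 1)),   # before the new index: negative offset
    (1, date(2015, 7, 1)),   # after the new index: positive offset
    (2, date(2018, 2, 1)),   # person 2 is not in the rebased cohort
]
offsets = days_from_index(rebased, events)
print(offsets)
```

The point of the sketch is that every time window in the characterisation (for example 'the year before index') shifts along with the index date, which is why such an analysis cannot simply reuse a standard configuration.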
In addition, the workflow around study execution, communication with the data partners and submission of results could do with some automation and record-keeping. In EHDEN Work Package 4, we are currently working with several partners, including Odysseus, The Hyve, University of Aveiro, University of Tartu and Erasmus Medical Center, to define these workflows and extend the EHDEN Portal to be able to deal with some of these data management chores.
Overall, however, we are very excited to be working with a global community like OHDSI and are confident that study-a-thons can be used to further develop and promote open science, and generate medical evidence to inform and benefit the health of everyone!