The Methodology Core is a centralized resource in the Section that provides direction and support for study design, data management, programming, computing, forms development, statistical analysis, psychometric testing, training and supervision of interviewers, and the education of post-doctoral fellows and other trainees. The Core serves investigators, fellows and other trainees in residence within the Section.
The availability of big data calls for novel methods to harness these data for clinical research. As part of the Informatics for Integrating Biology and the Bedside (i2b2) project, investigators in our group are developing methods to use electronic medical records (EMR) effectively for research. In this project we collaborate with bioinformaticians, database programmers and biostatisticians to develop the tools needed to accomplish this goal. We have applied tools such as natural language processing (NLP) to extract information specific to rheumatoid arthritis (RA), such as bone erosions, from free-text notes. We have demonstrated proof of concept for this approach by defining an RA cohort with the aid of NLP and linking it to our institutional specimen biorepository, and many studies are now underway. Active areas of research include identifying genetic predictors of outcomes such as cardiovascular disease and TNF response in RA.
One approach to using EMR data for research is to create phenotype cohorts for clinical and genetic association studies. A recently published “how to” paper describes how to use EMR data to develop robust phenotype algorithms for creating EMR-based cohorts. These algorithms apply NLP, which enables the extraction of data that previously could be obtained only through manual medical record review. The study outlines the methods for phenotyping using EMR data and NLP developed as part of the i2b2 project. The paper, “Development of phenotype algorithms using EMR and incorporating NLP,” was published in BMJ.