LEXINGTON, Ky. (Dec. 1, 2021) — Work by a group of researchers at the University of Kentucky’s Sanders-Brown Center on Aging was recently published in the journal Genes. The article examines how feature correlation affects data mining and machine learning in Alzheimer’s disease research.
The Alzheimer’s Disease Neuroimaging Initiative (ADNI) contains extensive patient measurements, such as magnetic resonance imaging (MRI), biometrics and RNA expression, from Alzheimer’s disease cases and controls. These data have recently been used in machine learning analyses of Alzheimer’s disease onset and progression. While using a variety of biomarkers is essential to Alzheimer’s disease research, highly correlated input features can significantly reduce the generalizability and performance of machine learning models. Redundant features also unnecessarily increase the computational time and resources needed to train predictive models.
Justin Miller, Ph.D., assistant professor in the UK College of Medicine, directed this work in collaboration with Mark Ebbert, Ph.D., assistant professor in the UK College of Medicine, and staff scientists Erik Huckvale and Matthew Hodgman. Together, they used 49,288 biomarkers and 793,600 extracted MRI features to assess feature correlation within the ADNI dataset and to determine how much this issue might affect large-scale analyses of these data. Miller says they found that more than 90% of the biomarkers, gene expression values and MRI features in the ADNI dataset are very highly correlated with at least one other feature. This could pose unforeseen challenges when using machine learning to identify patterns across the diverse data available in the dataset.
In this publication, Miller and his colleagues provide mappings of the highly correlated features so that future studies can consider this feature correlation and improve machine learning accuracy and efficiency in Alzheimer’s disease research.
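For readers unfamiliar with the idea, the Python sketch below shows what a mapping of highly correlated features and a "correlated with at least one other feature" percentage can look like. It assumes correlation is measured as pairwise Pearson correlation with an arbitrary cutoff of |r| ≥ 0.9; the release does not state the study's actual criterion, and the function names are hypothetical. It is a toy illustration, not the researchers' method: a dense all-pairs matrix for hundreds of thousands of features would not fit in memory, so an analysis at ADNI scale would need to compute correlations in blocks.

```python
import numpy as np


def correlated_feature_map(data, names, threshold=0.9):
    """Map each feature to the other features it is highly correlated with.

    data: 2-D array, rows = samples (e.g. patients), columns = features.
    names: one label per column.
    threshold: |Pearson r| at or above which two features count as
        "highly correlated" (an illustrative assumption, not the study's criterion).
    """
    # rowvar=False treats each column as a variable; atleast_2d covers
    # the single-feature case, where corrcoef returns a 0-d array.
    corr = np.atleast_2d(np.corrcoef(data, rowvar=False))
    # Constant columns yield NaN correlations; treat them as uncorrelated.
    corr = np.nan_to_num(corr, nan=0.0)
    # Ignore each feature's perfect correlation with itself.
    np.fill_diagonal(corr, 0.0)

    mapping = {}
    for i, name in enumerate(names):
        partners = np.flatnonzero(np.abs(corr[i]) >= threshold)
        mapping[name] = [names[j] for j in partners]
    return mapping


def fraction_with_partner(mapping):
    """Share of features highly correlated with at least one other feature."""
    if not mapping:
        return 0.0
    return sum(1 for partners in mapping.values() if partners) / len(mapping)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    a = rng.normal(size=n)
    b = 2.0 * a + rng.normal(scale=0.05, size=n)  # nearly redundant with a
    c = rng.normal(size=n)                        # independent of a and b
    m = correlated_feature_map(np.column_stack([a, b, c]), ["a", "b", "c"])
    print(m)                         # {'a': ['b'], 'b': ['a'], 'c': []}
    print(fraction_with_partner(m))  # 0.6666666666666666
```

In this toy example, feature "b" adds almost no information beyond "a", so a modeler might keep only one of the pair; the mapping makes such redundancy easy to spot before training.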
“Feature correlation has always been an issue in large datasets, but it was previously unknown the extent to which this issue permeated the Alzheimer’s Disease Neuroimaging dataset,” said Miller. “This research will help improve data mining accuracy and efficiency in the ADNI dataset. Machine learning is a promising avenue of research to identify patterns that can one day improve patient care. This research lays the groundwork for those future analyses.”
This work was supported by the BrightFocus Foundation under Award Number A2020118F. Research reported in this publication was also supported by the National Institute on Aging of the National Institutes of Health under Award Numbers R01AG046171, RF1AG051550 and 3U01AG024904-09S4. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The University of Kentucky is increasingly the first choice for students, faculty and staff to pursue their passions and their professional goals. In the last two years, Forbes has named UK among the best employers for diversity, and INSIGHT into Diversity recognized us as a Diversity Champion four years running. UK is ranked among the top 30 campuses in the nation for LGBTQ* inclusion and safety. UK has been judged a “Great College to Work For” three years in a row, and UK is among only 22 universities in the country on Forbes’ list of “America’s Best Employers.” We are ranked among the top 10 percent of public institutions for research expenditures, a tangible symbol of our breadth and depth as a university focused on discovery that changes lives and communities. And our patients know and appreciate the fact that UK HealthCare has been named the state’s top hospital for five straight years. Accolades and honors are great. But they are more important for what they represent: the idea that creating a community of belonging and commitment to excellence is how we honor our mission to be not simply the University of Kentucky, but the University for Kentucky.