Integration of multi-omics datasets enables molecular classification of COPD
Chronic obstructive pulmonary disease (COPD) is an umbrella diagnosis for disease arising from a multitude of underlying mechanisms, and molecular sub-phenotyping is needed to develop molecular diagnostic and prognostic tools and efficacious treatments.
The objective of these studies was to investigate whether multi-omics integration improves the accuracy of molecular classification of COPD in small cohorts.
Nine omics data blocks (comprising mRNA, microRNA, proteomes and metabolomes) collected at several anatomical locations from 52 female subjects were integrated by similarity network fusion (SNF). Multi-omics integration significantly improved the accuracy of classification separating COPD patients from healthy never-smokers and from smokers with normal spirometry, reducing the required group size from n=30 to n=6 at 95% power. Seven different combinations of four to seven omics platforms achieved >95% accuracy.
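As an illustration of how SNF combines several data blocks measured on the same subjects, the following is a minimal NumPy sketch of the general technique as introduced by Wang et al. (Nature Methods, 2014), not the pipeline used in this study. Each block gets a locally scaled exponential affinity kernel, a full (normalised) kernel and a sparse K-nearest-neighbour kernel; the per-block networks are then iteratively updated towards the average of the other blocks and finally averaged. The function names (`affinity_matrix`, `snf`, `spectral_two_groups`), parameter values (`K`, `mu`, `t`), the z-scoring step, and the clustering step are all assumptions: the abstract does not state how the fused network was clustered, so a spectral embedding with a simplified two-group assignment (standing in for k-means) is used here. The toy data are synthetic and unrelated to the study cohort.

```python
import numpy as np


def affinity_matrix(X, K=20, mu=0.5):
    """Scaled exponential similarity kernel for one block (rows = subjects)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)           # z-score each feature (assumed)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))             # pairwise Euclidean distances
    mean_knn = np.sort(dist, axis=1)[:, 1:K + 1].mean(axis=1)  # skip self at index 0
    eps = (mean_knn[:, None] + mean_knn[None, :] + dist) / 3.0  # local scaling
    return np.exp(-dist ** 2 / (mu * eps))


def full_kernel(W):
    """P(i,j) = W(i,j) / (2 * sum_{k != i} W(i,k)) off the diagonal, P(i,i) = 1/2."""
    off = W - np.diag(np.diag(W))
    P = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def sparse_kernel(W, K):
    """Keep only each subject's K most similar other subjects, row-normalised."""
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        order = np.argsort(W[i])[::-1]
        nn = order[order != i][:K]
        S[i, nn] = W[i, nn] / W[i, nn].sum()
    return S


def snf(blocks, K=20, mu=0.5, t=20):
    """Fuse per-block similarity networks by cross-diffusion (hypothetical helper)."""
    Ws = [affinity_matrix(X, K, mu) for X in blocks]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [sparse_kernel(W, K) for W in Ws]
    m = len(blocks)
    for _ in range(t):
        new_Ps = []
        for v in range(m):
            # Average of all *other* blocks' networks from the previous iteration
            others = sum(Ps[k] for k in range(m) if k != v) / (m - 1)
            P = Ss[v] @ others @ Ss[v].T
            new_Ps.append((P + P.T) / 2.0)               # keep each network symmetric
        Ps = new_Ps
    return sum(Ps) / m


def spectral_two_groups(P):
    """Two-group split of a fused network via a normalised-Laplacian embedding."""
    d_inv_sqrt = 1.0 / np.sqrt(P.sum(axis=1))
    L = np.eye(len(P)) - d_inv_sqrt[:, None] * P * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                          # eigenvalues ascending
    U = vecs[:, :2]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # row-normalise the embedding
    # Simplified stand-in for k-means: same group as subject 0 if the rows align
    return (U @ U[0] < 0.5).astype(int)


# Synthetic example: 12 subjects in two groups of 6, three blocks of different width
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 6)
blocks = [rng.normal(size=(12, p)) + 3.0 * labels[:, None] for p in (50, 30, 20)]
fused = snf(blocks, K=5, t=10)
pred = spectral_two_groups(fused)
```

The neighbourhood size `K` must be smaller than the number of subjects; with groups as small as n=6, a small `K` keeps each subject's sparse kernel mostly within its own group, which is what lets the cross-diffusion reinforce group structure shared across blocks.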
For the first time, a quantitative relationship between multi-omics data integration and the accuracy of data-driven classification has been demonstrated across nine omics data blocks. Integrating five to seven omics data blocks enabled 100% correct classification of COPD diagnosis with groups as small as n=6 individuals, despite strong confounding effects of current smoking. These results can serve as guidelines for the design of future systems-based multi-omics investigations, and indicate that integrating five to six data blocks from several molecular levels and anatomical locations suffices for unsupervised molecular classification in small cohorts.