Principal Biostatistician, Translational Immunology Institute/SingHealth Duke-NUS Academic Medical Centre, Singapore, Singapore
Abstract Text: In autoimmune diseases such as Systemic Lupus Erythematosus (SLE), the immune system erroneously attacks and damages healthy tissue. High-dimensional mass cytometry (MC) measures over 50 proteins in single cells and is a powerful technology for profiling perturbations of the immune system and their impact on patients’ health. The Extended Polydimensional Immunome Characterization (EPIC) data mining platform was introduced to discover patterns in MC data that enable biomarker discovery and lay the foundation for diagnostic and prognostic applications. Here, we present two extensions. First, a module dubbed group similarity analysis (GSA) facilitates pattern discovery in clustering outputs. After the aggregated cytometry data are segmented into cell clusters, the proportions of these clusters in each sample form immune signatures that are visualised by Uniform Manifold Approximation and Projection (UMAP). Subsequent silhouette analysis of the two-dimensional UMAP projections enables a quantitative comparison of batch effects and biological stratification. Second, we apply supervised learning to classify biological samples. Classification accuracy is estimated with a leave-one-batch-out training strategy, in which new data are first mapped to a trained self-organising map (SOM) that assigns them to annotated cell populations. The frequencies of these populations are then used to predict the health status of the corresponding sample. To illustrate the analytics pipeline, we used a dataset of 133 samples from SLE patients and healthy controls acquired in 10 independent experimental batches. Unsupervised GSA and supervised learning demonstrate that immune signatures can stratify blood samples into four distinct groups. This work paves the way for future applications in which cytometry data mining provides decision support for the medical practitioner.
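The two analysis steps described above can be illustrated with a minimal Python sketch. It is not the EPIC implementation. It assumes that per-sample cell-population frequencies are already available, so the clustering and SOM mapping are omitted. It uses two of the frequency columns as a stand-in for the two-dimensional UMAP coordinates. It substitutes a nearest-centroid classifier for the unspecified supervised model. The toy samples and the names `silhouette_score`, `centroid` and `leave_one_batch_out_accuracy` are hypothetical.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient of 2-D points under the given labels."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        by_label = defaultdict(list)  # distances from p, grouped by label
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_label[other].append(math.dist(p, q))
        same = by_label.pop(own, [])
        if not same or not by_label:  # singleton group or only one label
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the own group
        b = min(sum(d) / len(d) for d in by_label.values())  # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def centroid(rows):
    """Component-wise mean of equal-length frequency vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def leave_one_batch_out_accuracy(samples):
    """Hold out each batch once and classify it with a model fit on the rest.

    `samples` is a list of (batch, label, frequencies) tuples. The
    nearest-centroid classifier is a placeholder, not the authors' model.
    """
    correct = 0
    for held_out in sorted({batch for batch, _, _ in samples}):
        train = defaultdict(list)
        for batch, label, freqs in samples:
            if batch != held_out:
                train[label].append(freqs)
        centroids = {label: centroid(rows) for label, rows in train.items()}
        for batch, label, freqs in samples:
            if batch == held_out:
                guess = min(centroids, key=lambda k: math.dist(freqs, centroids[k]))
                correct += guess == label
    return correct / len(samples)


# Hypothetical toy data: (batch, health status, cell-population frequencies).
samples = [
    (1, "healthy", [0.60, 0.30, 0.10]),
    (1, "SLE", [0.30, 0.30, 0.40]),
    (2, "healthy", [0.55, 0.35, 0.10]),
    (2, "SLE", [0.25, 0.35, 0.40]),
    (3, "healthy", [0.65, 0.25, 0.10]),
    (3, "SLE", [0.35, 0.25, 0.40]),
]

# Stand-in for the 2-D UMAP projection: two of the frequency columns.
embedding = [(freqs[0], freqs[2]) for _, _, freqs in samples]
status_silhouette = silhouette_score(embedding, [label for _, label, _ in samples])
batch_silhouette = silhouette_score(embedding, [batch for batch, _, _ in samples])
accuracy = leave_one_batch_out_accuracy(samples)
```

Scoring the same projection once with health-status labels and once with batch labels gives two numbers that can be compared directly: a higher silhouette for health status than for batch suggests that biological stratification dominates batch effects. Holding out whole batches, rather than random samples, keeps the accuracy estimate from benefiting from batch-specific artefacts shared between training and test data.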