Data Scientist
MD Anderson Cancer Center
Ph.D. SUNY at Buffalo
Born September, 1988

S. Mostafa Sarayi

(Fardad)

I am a Data Scientist at MD Anderson Cancer Center, formerly a Senior Statistical Analyst, with a background in Statistics, Data Science, Applied Mathematics (as a lecturer), Health Sciences, Bioengineering, and Mechanical Engineering. My passion lies in developing and deploying data-driven solutions for real-world problems, particularly in the tech industry, healthcare analytics, computer vision, and natural language and large language modeling. I specialize in machine learning, statistical analysis, image processing, computer modeling, numerical simulation, and optimization.

A key focus of my work involves handling complex medical data challenges, including highly imbalanced datasets in the cancer domain and across various modalities such as imaging, clinical records, and multi-source health data. I work extensively with cancer-related datasets, including national surveys, health claims data, cancer registries, clinical trials, and observational studies, ensuring that predictive models remain robust even in data-scarce or highly skewed scenarios.

At MD Anderson, I develop predictive models for cancer incidence, mortality, and risk assessment using machine learning and deep learning techniques. My work also involves generating microsimulation models to evaluate the cost-effectiveness and efficacy of cancer screening and treatment strategies. By integrating insights from diverse data sources, I contribute to evidence-based decision-making in oncology research and patient care.

Previously, I served as a part-time technical consultant for Medtronic Co., where I led the development of numerical simulations and data-driven optimization models to enhance the efficacy of newly developed endovascular intervention devices.

My experience in medical image analysis includes working with various imaging modalities such as digital images, CTA, nCCT, MRI, and Time-of-Flight MRA. I have developed automated segmentation, registration, feature extraction, statistical shape analysis, and machine learning models, contributing to improved diagnostic accuracy and patient outcomes. In addition to imaging, I have worked extensively with multi-institutional and nationwide patient datasets, including hospital records, insurance claims, clinical trials, and observational studies. My work in these areas integrates NLP, statistical modeling, feature engineering, and machine learning to extract meaningful insights that drive innovation in healthcare.

My research interests span healthcare AI, medical imaging, computational medicine, and big data in healthcare. I am particularly interested in improving model generalizability when working with imbalanced datasets, integrating multi-modal data sources, and advancing AI-driven solutions for oncology and medical decision-making.

Some of my projects


Listed below are some of my projects, most of which have publications available (please click on the titles or pictures).

3D image processing: Segmentation, registration, and I/O program for feature engineering

We analyzed CTA and nCCT images of the vasculature of stroke patients. Segmentation was performed to generate the 3D geometry of the vessels. Rigid and non-rigid registration were used to transform all cases into the same coordinate system. A symmetry plane and projection algorithm was developed. An in-house I/O program for feature engineering was developed to extract various features.

Outcome prediction of stroke treatment: Statistics, feature selection, GMM, ML predictions

Statistical analysis and geometric morphometrics were used to explore differences in vascular features between successful and unsuccessful outcomes. Feature reduction and selection were performed on more than 5,200 features, and 14 features were selected as predictors. Machine learning was used to train predictive models of outcome, and the best model achieved high accuracy (accuracy = 0.98).
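To illustrate the idea of reducing a large feature set to a few predictors, here is a minimal sketch of one common filter-style approach: rank each feature by the absolute value of Welch's t-statistic between the two outcome groups and keep the top k. This is an assumption for illustration only; the project description does not state which selection method was used, and the function names `welch_t` and `top_k_features` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: no separation, or perfect separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_k_features(rows, labels, k):
    """Return indices of the k features with the largest |t| between groups.

    rows   -- list of feature vectors (one per case)
    labels -- list of 0/1 outcomes, same length as rows
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    scores.sort(reverse=True)  # strongest group difference first
    return [j for _, j in scores[:k]]


# Toy data (made up for this example): feature 0 separates the groups well.
rows = [
    [1.0, 5.0, 0.2], [1.1, 5.2, 0.9], [0.9, 4.9, 0.1],
    [3.0, 5.1, 0.8], [3.2, 5.0, 0.3], [2.9, 5.3, 0.7],
]
labels = [0, 0, 0, 1, 1, 1]
selected = top_k_features(rows, labels, k=1)
```

A univariate filter like this ignores correlations between features, so in practice it is usually combined with other reduction steps and validated on held-out cases.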

A Structured and Sparse Neural Network and Its Matrix Calculations Algorithm

A new sparse and structured Neural Network is presented. We introduce a nonsymmetric, tridiagonal matrix with offset sub- and super-diagonal entries, along with new algorithms for calculating its [pseudo]inverse and determinant. A decomposition for lower triangular matrices is developed to factorize a matrix into a set of matrices whose inverses are straightforward to calculate.
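For background, the determinant of an ordinary tridiagonal matrix (nonzero entries only on the main diagonal and the diagonals directly above and below it) can be computed in linear time with the classic three-term recurrence f_k = a_k f_{k-1} - b_{k-1} c_{k-1} f_{k-2}. The sketch below shows that textbook recurrence only; it is not the algorithm from this project and does not handle the offset sub- and super-diagonals described above. The function name `tridiagonal_determinant` is hypothetical.

```python
def tridiagonal_determinant(diag, sub, sup):
    """Determinant of a standard tridiagonal matrix via the three-term recurrence.

    diag -- main diagonal, length n
    sub  -- sub-diagonal (entries A[i+1][i]), length n - 1
    sup  -- super-diagonal (entries A[i][i+1]), length n - 1
    """
    n = len(diag)
    if n == 0:
        return 1  # determinant of the empty matrix
    if len(sub) != n - 1 or len(sup) != n - 1:
        raise ValueError("sub and sup must each have length len(diag) - 1")
    prev, curr = 1, diag[0]  # f_0 and f_1
    for k in range(1, n):
        # f_{k+1} = a_k * f_k - b_{k-1} * c_{k-1} * f_{k-1}
        prev, curr = curr, diag[k] * curr - sub[k - 1] * sup[k - 1] * prev
    return curr


# Nonsymmetric example: [[1, 6, 0], [4, 2, 7], [0, 5, 3]]
det = tridiagonal_determinant([1, 2, 3], [4, 5], [6, 7])  # -101
```

The recurrence uses O(n) operations, compared with O(n^3) for a general dense determinant, which is the kind of saving that motivates structured sparse matrices.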

Predicting ICU length of stay: ML predictions, statistical analysis, feature importance and reduction

Data from ~30,000 patients were analyzed. Initial predictors were demographics, admission information, Medicare entitlement, and diagnoses (quantified as Charlson Comorbidity Index, overall and Present on Admission). First, statistical analysis and feature selection were completed. Next, 7 ML models, including a DL algorithm, were developed, and high prediction accuracy was achieved.

Automated Cerebral vessel segmentation from MR Imaging using CNN

Completed a comparative study between two context-based 3D CNNs for brain vessel segmentation from MRA. We trained and tested the two models on a dataset of 51 retrospectively collected TOF-MRA images from patients with ICAD. Furthermore, for accurate ground truth generation in small arteries, the carotid siphon, and stenotic regions, we employed high-resolution black-blood MRI.

3D virtual wire guided and wireless catheter deployment in modeled real patients

Microcatheter deployment with and without a guidewire was simulated using a hybrid FEA-SPH technique. Microcatheters were given a multilayer structure, and a 3D model was reconstructed from CT images. The fragmentation percentage (FP) of the clot was quantified based on the ratio of converted to non-converted elements in the FEA-SPH method under catheter loading.

Virtual coiling in modeled patient-specific intracranial aneurysms and post-treatment hemodynamics

To aid in predicting and improving the treatment outcome of endovascular coiling of intracranial aneurysms, simulation of patient-specific coil deployment should be both accurate and fast. We developed a fast virtual coiling algorithm and compared it to FEA and preshape-ignoring models. Pre- and post-treatment CFD of blood flow was modeled to investigate the performance of the device.

3D virtual simulation of stent retriever thrombectomy in modeled real stroke patients

3D simulations of mechanical thrombectomy in modeled patient-specific vasculature, in the presence of the patient's occluding clot, were completed. The virtual interventions were able to predict the outcome of treatment with high accuracy for both successful and unsuccessful cases. This is the first time that the interaction between the blood clot, stent, vessel, and microcatheter has been modeled.

3D modeling of blood clot-stent interaction during mechanical thrombectomy

I introduced and tested a novel method for modeling blood clots and stent retriever thrombectomy devices that is both practical and accurate in simulating and predicting their behavior. This method couples grid-based FEA and meshless SPH to accurately capture the interaction behavior, including dissection and fragmentation of clots during mechanical thrombectomy.

Education

June 2018 - August 2022

Ph.D.

Mechanical Engineering, Neurosurgery, Bio-Engineering, and Biology Departments

Data Science, Medical Image Processing and Analysis, Numerical Simulation, Mechanical Design and Analysis, Biomechanics

September 2011 - September 2014

M.Sc.

Mechanical Engineering, Applied Mathematics

Algorithm development, Applied Mathematics, Programming, Computation and Numerical Methods, Time Series Analysis, Biomechanics, Control and Robotics

September 2006 - August 2011

B.S.

Department of Mechanical Engineering, Applied Mechanics

Learned different programming languages, commercial software, and principles of Applied Mathematics and Mechanical Engineering. Familiarity with Bioengineering.