Sarvar Khamidov

Hey, I'm Sarvar, an AI engineer and data scientist based in NYC (open to relocating).

Most of my career has been in pharma and biotech. At AbbVie I built agentic systems for statistical workflow automation — LLM pipelines, document Q&A, and a full-stack analysis platform used by biostatisticians day-to-day.

Before that, at NYU Langone I built modeling pipelines for wearable physiological time-series and developed early-signal classifiers for cognitive decline. At Regeneron I built diagnostic software for clinical data quality review.

I've also done ML research at the Feng Lab, evaluating CNNs and multimodal models like CLIP and Qwen-VL on image classification benchmarks.

Outside of those roles, I founded AcadyLearn — an AI app that turns uploaded documents into quizzes and flashcards, built end-to-end with LLM workflows and AWS infrastructure.

I also built an agentic workspace for public health data exploration at a hackathon — LLM-driven loops for profiling, cleaning, and analysis with CDC/Socrata integration. More projects on GitHub.

AI Data Scientist

AbbVie

Analyzed longitudinal clinical data from a Phase 1 program to characterize treatment-response trajectories. Built agentic systems for statistical workflow automation, including a full-stack data-analysis platform with LLM tool-use loops and an internal AI tool that automated multi-step statistical support tasks. Partnered with statisticians to turn recurring support requests into reusable AI workflows for retrieval, document Q&A, and code generation.

Data Scientist

NYU Langone

Built end-to-end modeling pipelines for wearable physiological time-series data. Developed early-signal classification models of cognitive decline from biomarkers and improved study-group comparability through matching and multivariable adjustment.

Founder

AcadyLearn

Built and shipped a production AI application that transforms uploaded documents into quizzes and flashcards. Designed end-to-end LLM workflows with asynchronous job orchestration, fault-tolerant background processing, and cloud infrastructure on AWS.

ML Research Assistant

Feng Lab

Evaluated lightweight CNNs and ResNet architectures against multimodal models including CLIP and Qwen-VL on TinyImageNet. Fine-tuned CLIP ViT-B/32 for a 10% accuracy gain over zero-shot baselines.

Data Science Intern

Regeneron

Developed diagnostic and validation software to support quality review of clinical study data. Automated checks across database builds and improved efficiency of pre-lock validation workflows.

MS, Biostatistics

New York University

Graduate study in biostatistics with a focus on machine learning and statistical modeling for health data.

BS, Data Science

Pennsylvania State University

Undergraduate study in data science covering algorithms, applied statistics, and computational methods.

Visual Science QA via QLoRA Fine-Tuning of SmolVLM-500M

Can a 500M-parameter vision-language model be meaningfully fine-tuned for K–8 science MCQ under strict hardware constraints (5M trainable params, free-tier T4 GPU)? Applied QLoRA with four staged ablations covering rank, alpha, target modules, and scoring method. Reached 0.875 public leaderboard accuracy, up from a 0.819 baseline — with a key finding that LoRA alpha interacts with module scope: raising alpha hurts attention-only adapters but helps when MLP projections are also included.

Python QLoRA Vision-Language Fine-Tuning

Paper → GitHub →
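
The rank and alpha ablations in the SmolVLM project above refer to two standard LoRA quantities: the adapter update is scaled by alpha / rank, and an adapter on a linear layer of shape d_in x d_out adds rank * (d_in + d_out) trainable parameters. A minimal sketch of that arithmetic follows. The layer names and dimensions are hypothetical placeholders, not SmolVLM's actual shapes, and the function names are illustrative.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling factor applied to the low-rank update (alpha / rank)."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer shapes, chosen only to illustrate the budget calculation.
attention_only = {"q_proj": (1024, 1024), "v_proj": (1024, 1024)}
with_mlp = {**attention_only, "up_proj": (1024, 4096), "down_proj": (4096, 1024)}

rank = 8
for name, layers in [("attention-only", attention_only), ("attention + MLP", with_mlp)]:
    per_block = sum(lora_param_count(d_in, d_out, rank) for d_in, d_out in layers.values())
    print(f"{name}: {per_block} trainable params per block, scaling {lora_scaling(16, rank)}")
```

Widening the target modules raises the parameter count at a fixed rank, which is one reason a strict trainable-parameter cap forces a trade-off between rank and module scope.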

HealthLab Agent

An agentic workspace for public health data exploration built at a hackathon. Users upload CSVs or pull datasets directly from the CDC/Socrata catalog, then run LLM-driven loops for profiling, cleaning, and analysis — generating charts, statistics, and Markdown reports. Integrates PubMed to surface relevant literature alongside the data findings.

Python FastAPI Next.js Pydantic-AI Agentic

GitHub →

Automated Fetal Health Classification from Cardiotocography

Can ML models reliably flag high-risk pregnancies from CTG signals — reducing reliance on operator interpretation? Compared logistic regression, Lasso, and random forest on 2,126 CTG records (UCI) classifying fetal health as Normal, Suspect, or Pathological. Random forest achieved 94.6% accuracy and 96.9% balanced accuracy on the critical Pathological class, with abnormal short-term heart rate variability as the top predictive feature.

R Classification Random Forest Lasso

View Report →
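
The per-class balanced accuracy quoted for the Pathological class in the fetal health project can be read as a one-vs-rest metric: the mean of sensitivity and specificity when one class is treated as positive and the rest as negative. The sketch below assumes that definition (the report may have computed it differently) and uses made-up toy labels, not the CTG data.

```python
def one_vs_rest_balanced_accuracy(y_true, y_pred, positive):
    """Mean of sensitivity and specificity, treating `positive` as the target class.

    Assumes y_true contains at least one positive and one non-positive label.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy labels for illustration only.
y_true = ["Normal", "Normal", "Suspect", "Pathological", "Pathological", "Normal"]
y_pred = ["Normal", "Suspect", "Suspect", "Pathological", "Normal", "Normal"]
print(one_vs_rest_balanced_accuracy(y_true, y_pred, positive="Pathological"))
```

Because the Pathological class is rare, this metric is more informative than overall accuracy, which a model could inflate by predicting Normal most of the time.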

Depression, Antihypertensives, and Uncontrolled Hypertension

Does antihypertensive medication use change how depression affects blood pressure control? Modeled the interaction using multivariable logistic regression and random forest on 39,467 NHANES participants (2005–2020) with survey-weighted analyses. Among medicated patients, each one-unit increase in depression score was associated with higher odds of uncontrolled hypertension (adjusted estimate: +0.04, 95% CI: 0.02–0.06); logistic regression slightly outperformed random forest on AUC (0.79 vs. 0.78).

R Epidemiology Logistic Regression NHANES

View Manuscript →

Physical Inactivity and Obesity Across U.S. States: A Longitudinal Analysis

Do states with higher inactivity rates see faster obesity growth over time? Applied linear mixed-effects models to 709 state-year observations from CDC BRFSS and Census ACS data (2011–2024). Obesity rose ~0.58 pp/year on average; 67% of total variance was attributable to stable between-state differences (ICC = 0.67). Physical inactivity and poverty both independently predicted higher obesity prevalence after controlling for time trends and clustering.

R Mixed-Effects Models Longitudinal Public Health

View Report →
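
The ICC reported in the obesity analysis above is, for a random-intercept model, the share of total variance that sits between groups: between-state variance divided by the sum of between-state and residual variance. The sketch below assumes a random-intercept specification, which the summary does not state explicitly, and the variance components are illustrative numbers rather than the fitted values.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """ICC for a random-intercept model: between-group share of total variance."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Illustrative variance components only (not taken from the report).
print(round(intraclass_correlation(var_between=2.0, var_within=1.0), 2))
```

An ICC near 1 means states differ persistently from one another, while an ICC near 0 means most variation is year-to-year noise within states.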

Large-Scale Psychometric Evaluation of the Big Five Personality Measure

Does the open-source 50-item Big Five test hold up at scale? Evaluated reliability and validity using a 100,000-observation random sample from 1M+ international respondents (2016–2018). Cronbach's alpha ranged from 0.79 to 0.89 across the five traits, exploratory factor analysis with parallel analysis reproduced the expected five-factor structure, and low inter-trait correlations supported divergent validity.

Stata Psychometrics Factor Analysis

View Report →
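
Cronbach's alpha, the reliability statistic in the Big Five evaluation above, compares the sum of item variances with the variance of the total score: alpha = k / (k - 1) * (1 - sum of item variances / variance of totals), where k is the number of items. The sketch below implements that formula with the standard library. The responses are toy values, not survey data, and the sketch assumes items are already reverse-scored where needed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose to get one column of scores per item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy responses: 5 respondents x 4 items on a 1-5 scale.
toy = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(toy), 3))
```

In a setup like the one described, the function would be applied once per trait, using only the ten items that belong to that trait.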

COVID-19 Case Fatality Rate Analysis

How did COVID-19 mortality risk vary across U.S. states over time? Calculated monthly CFR for all 50 states using NYT surveillance data, then visualized trends with faceted line plots and choropleth maps. CFR declined nationally across 2020–2023, with northeastern states peaking highest early in the pandemic.

R ggplot2 dplyr Maps

View Report →
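
The case fatality rate in the COVID-19 project above is deaths divided by confirmed cases. One way to compute a monthly CFR per state is sketched below. It assumes the inputs are new deaths and new cases within each month, and the report may instead have used cumulative counts. The records are invented placeholders, not NYT data.

```python
def monthly_cfr(deaths: int, cases: int):
    """Case fatality rate for one state-month, or None when there are no cases."""
    if cases == 0:
        return None
    return deaths / cases


# Placeholder records: (state, month, new_cases, new_deaths). Not real data.
records = [
    ("State A", "2020-04", 1000, 50),
    ("State A", "2021-04", 2000, 30),
    ("State B", "2020-04", 0, 0),
]

cfr_by_state_month = {
    (state, month): monthly_cfr(deaths, cases)
    for state, month, cases, deaths in records
}

for key, value in cfr_by_state_month.items():
    print(key, "n/a" if value is None else f"{value:.1%}")
```

Returning None for months with zero cases keeps those state-months out of trend lines and maps instead of plotting them as a misleading 0%.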

Music Listening Behaviors as Predictors of Depression

Do the genres you listen to predict how depressed you feel? Built a multiple linear regression model on 728 respondents from the MxMH Survey (Kaggle) using genre frequencies and demographics as predictors. The model explained 18.2% of variance in depression scores; age was protective, Classical listening was positively associated with depression, and Country showed a significant negative association.

R Linear Regression Mental Health

View Report →