Federated clinical twin finder for neurology. Matches patients by disease trajectory — not just snapshots — across institutions with privacy-preserving similarity. Validated on real ADNI data.
Neurologists make better decisions when they can find patients with similar disease trajectories across institutions. But privacy regulations, data fragmentation, and missing temporal context have kept patient similarity networks out of the clinic.
Patient records siloed across institutions with incompatible EHR systems and clinical vocabularies.
HIPAA and GDPR prohibit centralization of protected health information across institutions.
Existing approaches treat patients as frozen in time, losing critical trajectory and progression information.
Complete multi-modal profiles are rare. Most patients have incomplete data across clinical, imaging, and genomic modalities.
Every result below comes from real data, not synthetic benchmarks: patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), plus GTEx tissue data for the atlas results. Seven core assumptions were tested. Here's what held up and what didn't.
Held-out neighborhood coherence C̄₁₀ = 0.91 ± 0.02 across n = 1,125 subjects, and the projection captures 98.8% of PCA variance. Together these indicate that the embedding space preserves clinically meaningful structure.
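The page doesn't define C̄₁₀. A common reading, assumed here, is the mean fraction of each subject's 10 nearest neighbours in embedding space that share its diagnostic label. The sketch below computes that quantity with cosine similarity, excluding each subject from its own neighbourhood; `neighborhood_coherence` is a hypothetical name and the toy data is illustrative only.

```python
import numpy as np


def neighborhood_coherence(emb: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Mean fraction of each subject's k nearest neighbours sharing its label.

    Assumed definition of C̄_k: cosine similarity, the query subject excluded.
    """
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)            # a subject is not its own neighbour
    nn = np.argsort(-sim, axis=1)[:, :k]      # indices of the k most similar subjects
    return float((labels[nn] == labels[:, None]).mean())


# Toy usage: two well-separated groups of 30 subjects in a 128-d space.
rng = np.random.default_rng(0)
centres = np.zeros((2, 128))
centres[0, 0] = centres[1, 1] = 1.0
labels = np.repeat([0, 1], 30)
emb = centres[labels] + 0.02 * rng.normal(size=(60, 128))
print(neighborhood_coherence(emb, labels))
```

In a held-out evaluation, the queries would be subjects not used to fit the embedding; that split is omitted here for brevity.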
The standout result: an 8x separation in trajectory velocity between the Dementia and Cognitively Normal groups (Dem/CN = 8.05x, Dem/MCI = 3.15x; n = 1,125 subjects). Disease trajectory is a first-class signal, not noise.
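How velocity is computed isn't stated. This sketch assumes it is the speed of a subject's embedding across visits, estimated as the L2 norm of a per-dimension least-squares slope against visit time, and that the group ratios compare mean velocities. Function names and the toy speeds are hypothetical and chosen only to make the arithmetic easy to check.

```python
import numpy as np


def trajectory_velocity(times, embeddings) -> float:
    """Speed of one subject through embedding space (units per year).

    Fits a least-squares line to each embedding dimension against visit time
    and returns the L2 norm of the slope vector. Needs at least 2 visits.
    """
    t = np.asarray(times, dtype=float)
    e = np.asarray(embeddings, dtype=float)   # shape (visits, dims)
    slope = np.polyfit(t, e, deg=1)[0]        # per-dimension slope
    return float(np.linalg.norm(slope))


def velocity_ratio(velocities, groups, numerator, denominator) -> float:
    """Ratio of mean velocity between two diagnostic groups, e.g. Dem/CN."""
    v = np.asarray(velocities, dtype=float)
    g = np.asarray(groups)
    return float(v[g == numerator].mean() / v[g == denominator].mean())


# Toy usage: straight-line trajectories with arbitrary known speeds.
rng = np.random.default_rng(1)
times = [0.0, 1.0, 2.0, 3.0]                  # years since baseline
vels, groups = [], []
for group, speed in [("CN", 0.1), ("MCI", 0.2), ("Dem", 0.5)]:
    for _ in range(5):
        direction = rng.normal(size=128)
        direction /= np.linalg.norm(direction)
        start = rng.normal(size=128)
        traj = [start + speed * t * direction for t in times]
        vels.append(trajectory_velocity(times, traj))
        groups.append(group)
print(velocity_ratio(vels, groups, "Dem", "CN"))  # ≈ 5 for these toy speeds
```

A regression slope uses every visit rather than only the first and last, which makes the estimate less sensitive to a single noisy visit.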
Fused multi-modal ARI (Adjusted Rand Index) = 0.125, more than 2x the best single-modality baseline (0.062), across n = 3,700 subjects. No single modality resolves all patient subtypes; fusion recovers more subtype structure than any modality alone.
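ARI scores the agreement between a clustering and reference labels, corrected for chance: 0 is chance level and 1 is perfect agreement. A from-scratch sketch of the standard contingency-table formula follows; in practice `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand Index between two clusterings of the same subjects (n >= 2)."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))      # contingency table cells
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)               # chance-level agreement
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:                             # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7 ≈ 0.571
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: label names don't matter
```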
Within-domain reconstruction r = 0.936. Missing data within a modality can be recovered with high fidelity, enabling patient matching even with incomplete records.
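The reconstruction method isn't described. As a minimal sketch, assuming "within-domain" means predicting a modality's missing features from its observed ones, the block below fits a least-squares map on training subjects and reports the correlation r on held-out subjects. The helper names and the low-rank toy modality are hypothetical.

```python
import numpy as np


def fit_reconstructor(x_train, observed, missing):
    """Least-squares map (with intercept) from observed to missing features."""
    a = np.column_stack([x_train[:, observed], np.ones(len(x_train))])
    coef, *_ = np.linalg.lstsq(a, x_train[:, missing], rcond=None)
    return coef


def reconstruct(x, observed, coef):
    """Predict the missing features for new subjects."""
    a = np.column_stack([x[:, observed], np.ones(len(x))])
    return a @ coef


# Toy modality: 30 correlated features driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
x = latent @ rng.normal(size=(5, 30)) + 0.1 * rng.normal(size=(500, 30))
observed, missing = np.arange(20), np.arange(20, 30)   # last 10 features treated as missing

coef = fit_reconstructor(x[:400], observed, missing)
pred = reconstruct(x[400:], observed, coef)
r = np.corrcoef(pred.ravel(), x[400:, missing].ravel())[0, 1]
print(f"held-out reconstruction r = {r:.3f}")
```

Pooling all features into one correlation is a simplification; a per-feature r is a stricter check.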
Per-embedding differential privacy fails in this setting: it needs ε ≥ 20 before any utility survives. A secure aggregation protocol, however, delivers privacy-preserving similarity at practical noise levels, and the architecture was redesigned around it.
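One way to see why aggregation helps is sensitivity. With the Laplace mechanism, noise scales with how much one patient can change the released value. A single L1-clipped embedding can move by up to 2c, but a mean over n patients moves by at most 2c/n, so at the same ε the noise is n times smaller. The sketch illustrates only that argument, with arbitrary toy parameters and hypothetical helper names; it does not reproduce SOMA's protocol or the ε ≥ 20 finding.

```python
import numpy as np


def clip_l1(x: np.ndarray, c: float) -> np.ndarray:
    """Scale each row so its L1 norm is at most c (bounds sensitivity)."""
    norms = np.abs(x).sum(axis=1, keepdims=True)
    return x * np.minimum(1.0, c / norms)


def laplace_release(v: np.ndarray, sensitivity: float, epsilon: float, rng) -> np.ndarray:
    """Laplace mechanism: epsilon-DP release of v given its L1 sensitivity."""
    return v + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=v.shape)


def cosine(a, b) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n, d, c, eps = 1000, 128, 1.0, 1.0
shared = rng.normal(size=d)
emb = clip_l1(shared + 0.5 * rng.normal(size=(n, d)), c)

# Per-embedding release: replacing one patient moves the vector by up to 2c.
single = laplace_release(emb[0], 2 * c, eps, rng)
# Aggregate release: one patient moves the institution mean by at most 2c/n.
inst_mean = emb.mean(axis=0)
agg = laplace_release(inst_mean, 2 * c / n, eps, rng)

print(f"per-embedding cosine to truth: {cosine(single, emb[0]):.2f}")
print(f"aggregate cosine to truth:     {cosine(agg, inst_mean):.2f}")
```

At ε = 1 the single-embedding release is dominated by noise, while the institution-level mean stays close to its true direction.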
Brain tissue signatures work well (C̄₁₀ = 0.779 across 13 GTEx brain regions), but blood-derived signatures do not transfer (C̄₁₀ = 0.386). Atlas grounding is viable for research cohorts with brain tissue access, not for cohorts limited to peripheral blood.
Hospital nodes embed patients locally. The central brain fuses multi-modal similarity with temporal velocity. Blind Handshake onboards new institutions without exposing patient data.
Hospital-side agents embed patients locally into a 128-dimensional vector space spanning clinical, imaging, genomic, and biomarker modalities. Zero PHI egress: only embeddings leave the institution.
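The encoder itself isn't described. The sketch below only illustrates the boundary: raw records stay inside a hypothetical `HospitalNode`, and the single export path returns unit-norm 128-d vectors. It uses seeded random projections (32 dimensions per modality) as a stand-in for a learned encoder, assumes every site shares the seed so their spaces are compatible, and assumes a missing modality contributes a zero block. All of these are illustrative assumptions.

```python
import numpy as np

MODALITIES = ("clinical", "imaging", "genomic", "biomarker")
DIM_PER_MODALITY = 32                       # 4 modalities x 32 = 128-d output


class HospitalNode:
    """Hypothetical hospital-side agent: raw records stay local, only embeddings leave."""

    def __init__(self, feature_dims: dict, seed: int = 0):
        rng = np.random.default_rng(seed)   # assumption: seed shared by all sites
        self._proj = {m: rng.normal(size=(feature_dims[m], DIM_PER_MODALITY)) / np.sqrt(feature_dims[m])
                      for m in MODALITIES}
        self._records = []                  # raw PHI-bearing features, never exported

    def add_patient(self, features: dict) -> None:
        self._records.append(features)

    def _embed(self, features: dict) -> np.ndarray:
        parts = []
        for m in MODALITIES:
            x = features.get(m)
            # Assumption: a missing modality contributes a zero block.
            parts.append(np.zeros(DIM_PER_MODALITY) if x is None
                         else np.asarray(x, dtype=float) @ self._proj[m])
        v = np.concatenate(parts)
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    def export_embeddings(self) -> np.ndarray:
        """The only data leaving the institution: an (n_patients, 128) array."""
        return np.stack([self._embed(f) for f in self._records])


dims = {"clinical": 20, "imaging": 50, "genomic": 100, "biomarker": 5}
node = HospitalNode(dims)
rng = np.random.default_rng(1)
node.add_patient({m: rng.normal(size=d) for m, d in dims.items()})
node.add_patient({"clinical": rng.normal(size=20), "imaging": rng.normal(size=50)})  # incomplete record
print(node.export_embeddings().shape)       # (2, 128)
```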
Fuses multi-modal similarity via Similarity Network Fusion (SNF) with temporal velocity encoding, matching patients by where they're heading, not just where they are. Trajectory-aware clinical twin finding.
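A compact sketch of SNF, following the formulas in Wang et al. (2014): build a scaled exponential affinity per view, then repeatedly diffuse each view's kernel through the average of the others via its sparse k-nearest-neighbour kernel. Reference implementations differ in details such as normalisation. The source doesn't say how velocity enters the fusion; passing per-subject velocity vectors as one more view, as in the usage below, is a hypothetical choice.

```python
import numpy as np


def affinity(x, k=10, mu=0.5):
    """Scaled exponential similarity kernel (SNF paper formula)."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)   # skip the zero self-distance
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3
    return np.exp(-d ** 2 / (mu * eps))


def full_kernel(w):
    """Row-normalised kernel P with P[i, i] = 1/2."""
    p = w.copy()
    np.fill_diagonal(p, 0.0)
    p = p / (2 * p.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p


def local_kernel(w, k):
    """Sparse kernel S keeping only each row's k strongest affinities."""
    s = np.zeros_like(w)
    idx = np.argsort(-w, axis=1)[:, :k]
    rows = np.arange(len(w))[:, None]
    s[rows, idx] = w[rows, idx]
    return s / s.sum(axis=1, keepdims=True)


def snf(views, k=10, t=20):
    """Fuse >= 2 views (each an (n_subjects, n_features) array) into one similarity matrix."""
    ws = [affinity(v, k) for v in views]
    ps = [full_kernel(w) for w in ws]
    ss = [local_kernel(w, k) for w in ws]
    m = len(views)
    for _ in range(t):
        # Diffuse each view's kernel through the average of the other views (synchronous update).
        ps = [ss[v] @ (sum(ps[u] for u in range(m) if u != v) / (m - 1)) @ ss[v].T
              for v in range(m)]
    return sum(ps) / m


# Toy usage: two patient groups seen through two modalities plus a velocity view.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
clinical = labels[:, None] * 2.0 + rng.normal(size=(40, 5))
imaging = labels[:, None] * 2.0 + rng.normal(size=(40, 8))
velocity = labels[:, None] * 1.0 + 0.3 * rng.normal(size=(40, 3))   # hypothetical velocity view
fused = snf([clinical, imaging, velocity], k=10, t=10)
same = labels[:, None] == labels[None, :]
print(fused[same].mean() > fused[~same].mean())
```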
Four-stage onboarding for new institutions: calibrates embedding spaces and validates data quality without exposing patient data. A novel protocol; we have found no prior art.
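The four Blind Handshake stages aren't described, so this is not that protocol. It sketches one plausible calibration primitive under stated assumptions: both sites embed a shared, PHI-free anchor panel (for example, synthetic reference profiles), and the newcomer's space is aligned to the reference by orthogonal Procrustes. A large post-alignment residual could then serve as a data-quality flag. `procrustes_rotation` and the anchor panel are hypothetical.

```python
import numpy as np


def procrustes_rotation(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal R minimising ||source @ R - target||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
anchors_ref = rng.normal(size=(200, 128))              # reference site's embeddings of the anchor panel
q, _ = np.linalg.qr(rng.normal(size=(128, 128)))       # unknown rotation of the new site's space
anchors_new = anchors_ref @ q.T + 0.01 * rng.normal(size=(200, 128))

r = procrustes_rotation(anchors_new, anchors_ref)
residual = np.linalg.norm(anchors_new @ r - anchors_ref) / np.linalg.norm(anchors_ref)
print(f"relative residual after calibration: {residual:.3f}")
```

Restricting the map to a rotation preserves distances within the new site's space, so nearest-neighbour structure is unchanged by calibration.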
A two-tier aggregation protocol preserves privacy at practical noise levels. Because per-embedding DP fails, SOMA's architecture routes around it with institution-level aggregation.
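The protocol's tiers aren't specified. A standard building block for secure aggregation is pairwise additive masking, in the style of Bonawitz et al. (2017): each pair of sites shares a seed, one adds the derived mask and the other subtracts it, so the masks cancel in the sum while individual uploads look random. The sketch assumes fixed-point encoding modulo 2³² and simulates the pairwise seeds, which in practice would come from a key exchange. Dropout handling is omitted, and all names are hypothetical.

```python
import numpy as np

MOD = 2 ** 32          # all arithmetic is done modulo MOD
SCALE = 2 ** 16        # fixed-point scale for real-valued vectors


def encode(x):
    return np.round(np.asarray(x) * SCALE).astype(np.int64) % MOD


def decode(v):
    signed = np.where(v >= MOD // 2, v - MOD, v)   # map back to signed integers
    return signed / SCALE


def mask(i, x, n_sites, pair_seeds):
    """Site i hides its vector with pairwise masks that cancel in the sum."""
    y = encode(x)
    for j in range(n_sites):
        if j == i:
            continue
        seed = pair_seeds[(min(i, j), max(i, j))]  # simulated; in practice agreed via key exchange
        m = np.random.default_rng(seed).integers(0, MOD, size=y.shape, dtype=np.int64)
        y = (y + m) % MOD if i < j else (y - m) % MOD
    return y


def aggregate(masked):
    total = np.zeros_like(masked[0])
    for v in masked:
        total = (total + v) % MOD
    return total


rng = np.random.default_rng(0)
n_sites, dim = 4, 128
site_means = [rng.normal(size=dim) for _ in range(n_sites)]   # e.g. institution-level aggregates
pair_seeds = {(i, j): int(rng.integers(2 ** 31))
              for i in range(n_sites) for j in range(i + 1, n_sites)}

masked = [mask(i, v, n_sites, pair_seeds) for i, v in enumerate(site_means)]
recovered = decode(aggregate(masked))
print(np.max(np.abs(recovered - np.sum(site_means, axis=0))))  # small fixed-point rounding error
```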
The core embedding and trajectory matching pipeline is validated. Next: federated experiments across simulated multi-site deployments, differential privacy characterization at scale, and drift detection for longitudinal monitoring.
Mapping the privacy-utility frontier across institution counts, cohort sizes, and noise budgets to establish deployment guidelines.
Temporal calibration for longitudinal deployments — detecting when patient populations shift and model recalibration is needed.
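The drift detector isn't specified yet. A simple starting point, sketched here as an assumption, is a permutation test on the distance between the centroids of baseline and current embeddings: a small p-value suggests the population has shifted and recalibration may be needed. Centroid shift only catches mean changes; kernel tests such as MMD would also catch changes in spread or shape. Function names are hypothetical.

```python
import numpy as np


def centroid_shift(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


def drift_pvalue(baseline, current, n_perm=500, seed=0) -> float:
    """Permutation test: is the centroid shift larger than random relabelling gives?"""
    rng = np.random.default_rng(seed)
    observed = centroid_shift(baseline, current)
    pooled = np.vstack([baseline, current])
    n = len(baseline)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if centroid_shift(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
baseline = rng.normal(size=(200, 16))
shifted = rng.normal(loc=0.5, size=(200, 16))   # simulated population shift
print(drift_pvalue(baseline, shifted))          # a small p-value suggests recalibration
```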
Moving from ADNI validation to real multi-site deployment with clinical partners in neurology and memory care.
SOMA represents the research arm of Curadai's vision — applying federated, privacy-preserving AI to precision medicine. Validated on real neurological data, with the architectural primitives that inform everything we build.