SOMA

Federated clinical twin finder for neurology. Matches patients by disease trajectory — not just snapshots — across institutions with privacy-preserving similarity. Validated on real ADNI data.

Patent Pending · Sebastian T. Muah & Krutika Khinvasara, PhD

ADNI subjects validated: 3,700
Velocity separation: 8x
Embedding quality: C̄₁₀ = 0.91
Embedding space: 128-d
The Problem

Finding your patient's clinical twin shouldn't require centralizing data.

Neurologists make better decisions when they can find patients with similar disease trajectories across institutions. But privacy regulations, data fragmentation, and missing temporal context have kept patient similarity networks out of the clinic.

🏥

Data Fragmentation

Patient records siloed across institutions with incompatible EHR systems and clinical vocabularies.

🔒

Privacy Regulation

HIPAA and GDPR tightly restrict the centralization of protected health information across institutions.

⏱️

Static Snapshots

Existing approaches treat patients as frozen in time, losing critical trajectory and progression information.

📊

Missing Modalities

Complete multi-modal profiles are rare. Most patients have incomplete data across clinical, imaging, and genomic modalities.

Validated on Real Data

What we proved on 3,700 ADNI subjects.

Every claim below is backed by real patient data from the Alzheimer's Disease Neuroimaging Initiative — not synthetic benchmarks. Seven core assumptions tested. Here's what held up.

Embedding Quality (Validated)

Held-out neighborhood coherence C̄₁₀ = 0.91 ± 0.02 across n = 1,125 subjects. The projection captures 98.8% of PCA variance, indicating that the embedding space preserves clinically meaningful structure.
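This page does not spell out how C̄₁₀ is computed. One plausible reading is the mean fraction of each subject's 10 nearest neighbors in embedding space that share its diagnostic label. The sketch below illustrates that reading only; the function name, the toy data, and the definition itself are assumptions, not SOMA's implementation.

```python
import math

def neighborhood_coherence(embeddings, labels, k=10):
    """Mean fraction of each point's k nearest neighbors that share its label.

    Assumed definition for illustration; the source does not define the metric.
    """
    n = len(embeddings)
    if n <= k:
        raise ValueError("need more than k points")
    total = 0.0
    for i in range(n):
        # Distances from point i to every other point (self excluded).
        dists = sorted(
            (math.dist(embeddings[i], embeddings[j]), j)
            for j in range(n) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        total += sum(labels[j] == labels[i] for j in neighbors) / k
    return total / n

# Toy data: two well-separated clusters in 2-d (real embeddings would be 128-d).
points = [(0.0, 0.1 * i) for i in range(6)] + [(10.0, 0.1 * i) for i in range(6)]
tags = ["CN"] * 6 + ["Dementia"] * 6
score = neighborhood_coherence(points, tags, k=5)
```

On the toy clusters every neighbor shares its point's label, so the score is 1.0. Mixed neighborhoods pull the value toward the label base rates.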

Temporal Velocity (Validated)

The standout result. 8x velocity separation between Dementia and Cognitively Normal groups (Dem/CN = 8.05x, Dem/MCI = 3.15x). Disease trajectory is a first-class signal, not noise. n = 1,125 subjects.
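As a rough illustration of what a velocity separation ratio measures, the sketch below treats velocity as embedding displacement per year between consecutive visits and compares group means. That definition, the function names, and the toy trajectories are assumptions made for this example; the ADNI figures above come from the authors' own pipeline.

```python
import math
from statistics import mean

def subject_velocity(visits):
    """Average embedding speed for one subject.

    visits: list of (time_in_years, embedding) pairs sorted by strictly
    increasing time.
    """
    if len(visits) < 2:
        raise ValueError("need at least two visits")
    speeds = [
        math.dist(e0, e1) / (t1 - t0)
        for (t0, e0), (t1, e1) in zip(visits, visits[1:])
    ]
    return mean(speeds)

def velocity_separation(group_a, group_b):
    """Ratio of mean subject velocity in group_a to that in group_b."""
    a = mean([subject_velocity(v) for v in group_a])
    b = mean([subject_velocity(v) for v in group_b])
    return a / b

# Toy trajectories (2-d for readability; values are made up, not ADNI data).
fast_group = [[(0.0, (0.0, 0.0)), (1.0, (0.4, 0.0)), (2.0, (0.8, 0.0))]]
slow_group = [[(0.0, (0.0, 0.0)), (1.0, (0.1, 0.0)), (2.0, (0.2, 0.0))]]
ratio = velocity_separation(fast_group, slow_group)
```

Here the fast group moves 0.4 units per year and the slow group 0.1, giving a ratio of about 4.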

SNF Fusion (Validated)

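Fused multi-modal clustering reaches an Adjusted Rand Index (ARI) of 0.125, more than 2x the best single-modality baseline (0.062). No single modality resolves all patient subtypes; Similarity Network Fusion (SNF) recovers more of that structure than any one modality alone. n = 3,700 subjects.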
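For readers unfamiliar with the metric, ARI (Adjusted Rand Index) scores agreement between two partitions, with 1.0 for identical groupings and values near 0 for chance-level agreement. The sketch below implements the standard pair-counting formula in plain Python. Which reference labels the page's ARI was computed against is not stated, so the toy inputs are purely illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling on its own.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])    # relabeled but identical
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # worse than chance
```

The first call returns 1.0 because ARI ignores label names; the second returns -0.5 because the two groupings disagree on every pair.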

Within-Domain Imputation (Validated)

Within-domain reconstruction r = 0.936. Missing data within a modality can be recovered with high fidelity, enabling patient matching even with incomplete records.
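A reconstruction r of this kind is typically obtained by hiding known values, imputing them, and correlating the imputed values with the truth. The sketch below does this with a simple nearest-neighbor imputer on made-up numbers. The imputation method SOMA uses is not described on this page, so `knn_impute` is a stand-in chosen for brevity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def knn_impute(target, donors, missing_idx, k=2):
    """Fill target[missing_idx] with the mean of that feature over the k donors
    closest to the target on the remaining (observed) features."""
    observed = [i for i in range(len(target)) if i != missing_idx]

    def gap(donor):
        return math.dist([target[i] for i in observed], [donor[i] for i in observed])

    nearest = sorted(donors, key=gap)[:k]
    return sum(d[missing_idx] for d in nearest) / k

# Toy cohort: feature 2 is roughly the sum of features 0 and 1 (made-up values).
cohort = [
    [1.0, 2.0, 3.1], [2.0, 1.0, 2.9], [3.0, 3.0, 6.2],
    [4.0, 1.0, 5.1], [1.0, 1.0, 1.9], [3.0, 2.0, 4.8],
]
truth, recon = [], []
for idx, row in enumerate(cohort):
    donors = cohort[:idx] + cohort[idx + 1:]  # leave the current row out
    truth.append(row[2])
    recon.append(knn_impute(row, donors, missing_idx=2, k=2))
r = pearson_r(truth, recon)
```

Leave-one-out evaluation keeps each hidden value out of its own donor pool, which is what makes the correlation a fair estimate of recovery quality.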

Secure Aggregation (Partial)

Per-embedding differential privacy does not work in practice: it requires ε ≥ 20 for any utility. A secure aggregation protocol, however, recovers privacy-preserving similarity at practical noise levels. The architecture was redesigned accordingly.

Atlas Integration (Partial)

Brain tissue signatures work well (C̄₁₀ = 0.779 across 13 GTEx brain regions). Blood-derived signatures do not transfer (C̄₁₀ = 0.386). Atlas grounding is viable for research cohorts with tissue access, but not for cohorts limited to peripheral blood.

Architecture

Three-layer hub-and-spoke topology.

Hospital nodes embed patients locally. The central brain fuses multi-modal similarity with temporal velocity. Blind Handshake onboards new institutions without exposing patient data.

[Diagram: the SOMA Central Brain (SNF fusion, velocity encoding, trajectory matching) sits at the hub and connects via secure aggregation (embeddings only) to edge nodes at Hospital A, Hospital B, Hospital C, Research Lab A, and a Genomics Center. A new institution is shown joining through Blind Handshake onboarding. Legend: active connection, onboarding (forming); hospital, research.]
🏥

Edge Nodes

Hospital-side agents embed patients locally into a 128-d vector space spanning clinical, imaging, genomic, and biomarker modalities. Zero PHI egress: only embeddings leave the institution.

🧠

Central Brain

Fuses multi-modal similarity via SNF with temporal velocity encoding. Matches patients by where they're heading, not just where they are. Trajectory-aware clinical twin finding.
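One simple way to picture trajectory-aware matching is to score candidates on both where they are (embedding position) and where they are heading (embedding velocity). The sketch below blends two cosine similarities with a hypothetical weight `alpha`. The scoring rule, field names, and toy patients are assumptions for illustration, not the Central Brain's actual matching function.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0

def twin_score(query, candidate, alpha=0.5):
    """Blend position similarity with velocity similarity (alpha is hypothetical)."""
    pos = cosine(query["position"], candidate["position"])
    vel = cosine(query["velocity"], candidate["velocity"])
    return alpha * pos + (1 - alpha) * vel

def find_twins(query, candidates, top_n=3, alpha=0.5):
    """Return the ids of the top_n candidates ranked by twin_score."""
    ranked = sorted(candidates, key=lambda c: twin_score(query, c, alpha), reverse=True)
    return [c["id"] for c in ranked[:top_n]]

# Toy patients in 2-d (real embeddings would be 128-d).
query = {"id": "q", "position": (1.0, 0.0), "velocity": (0.0, 1.0)}
candidates = [
    {"id": "same_place_opposite_heading", "position": (1.0, 0.0), "velocity": (0.0, -1.0)},
    {"id": "nearby_same_heading", "position": (0.9, 0.1), "velocity": (0.0, 1.0)},
]
best = find_twins(query, candidates, top_n=1)
```

A snapshot-only matcher would pick the first candidate, which sits at exactly the query's position. With velocity included, the nearby patient moving in the same direction ranks first.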

🤝

Blind Handshake

Four-stage onboarding for new institutions. Calibrates embedding spaces and validates data quality without exposing patient data. Novel protocol — no prior art exists.

🔏

Secure Aggregation

A two-tier aggregation protocol preserves privacy at practical noise levels. Per-embedding DP fails, so SOMA's architecture routes around it with institution-level aggregation.
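The details of SOMA's two-tier protocol are not given here. To show the general idea behind secure aggregation, the sketch below uses pairwise additive masking, a common textbook construction: each pair of sites shares a random mask that one adds and the other subtracts, so individual uploads look random while their sum is exact. All names and constants are hypothetical, and `random.Random` stands in for proper cryptographic key agreement.

```python
import random

MODULUS = 2**32   # all arithmetic is done modulo this value
SCALE = 10_000    # fixed-point scale for turning floats into integers

def encode(vec):
    """Convert floats to fixed-point integers modulo MODULUS."""
    return [round(x * SCALE) % MODULUS for x in vec]

def pairwise_masks(n_sites, dim, seed=0):
    """Return one mask vector per site; the masks sum to zero modulo MODULUS."""
    rng = random.Random(seed)  # NOT cryptographically secure; illustration only
    masks = [[0] * dim for _ in range(n_sites)]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for d in range(dim):
                masks[i][d] = (masks[i][d] + shared[d]) % MODULUS
                masks[j][d] = (masks[j][d] - shared[d]) % MODULUS
    return masks

def mask_update(vec, mask):
    """What a site would upload: its encoded vector plus its mask."""
    return [(x + m) % MODULUS for x, m in zip(encode(vec), mask)]

def aggregate(masked_updates):
    """Sum masked uploads; the masks cancel, leaving the true sum."""
    dim = len(masked_updates[0])
    total = [sum(u[d] for u in masked_updates) % MODULUS for d in range(dim)]
    # Map back to signed fixed-point, then to floats.
    return [(t - MODULUS if t >= MODULUS // 2 else t) / SCALE for t in total]

site_vectors = [[0.25, -1.0], [0.5, 2.0], [-0.75, 0.5]]
masks = pairwise_masks(len(site_vectors), dim=2)
masked = [mask_update(v, m) for v, m in zip(site_vectors, masks)]
recovered_sum = aggregate(masked)
```

The aggregator recovers the sum [0.0, 1.5] without seeing any single site's vector in the clear. A production protocol would also need to handle site dropouts and collusion, which this sketch ignores.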

What's Next

From validation to clinical utility.

The core embedding and trajectory matching pipeline is validated. Next: federated experiments across simulated multi-site deployments, differential privacy characterization at scale, and drift detection for longitudinal monitoring.

Federated DP Characterization

Mapping the privacy-utility frontier across institution counts, cohort sizes, and noise budgets to establish deployment guidelines.

Drift Detection

Temporal calibration for longitudinal deployments — detecting when patient populations shift and model recalibration is needed.

Clinical Partner Pilots

Moving from ADNI validation to real multi-site deployment with clinical partners in neurology and memory care.

A Curadai Research Initiative

SOMA represents the research arm of Curadai's vision — applying federated, privacy-preserving AI to precision medicine. Validated on real neurological data, with the architectural primitives that inform everything we build.
