SOMA (Similarity-Orchestrated Multi-modal Architecture): a federated agentic system for privacy-preserving patient similarity networks in neurological disease.
Patient Similarity Networks (PSNs) offer a mathematically principled approach to precision medicine. Despite more than a decade of research and 1,800+ citations of the foundational Similarity Network Fusion (SNF) paper, no production clinical deployment exists. The barriers aren't algorithmic. They're infrastructural.
Complete multi-modal profiles are rare: most patients are missing data in at least one modality.
Patient records are siloed across institutions with incompatible EHR systems and clinical vocabularies.
HIPAA and GDPR prohibit centralization of protected health information.
Existing PSNs treat patients as static snapshots, losing critical progression information.
No mechanism bridges population-scale molecular atlases with patient-level clinical data.
No existing system addresses all five barriers simultaneously. SOMA's novelty is the specific integrated architecture designed to make PSNs clinically deployable for the first time.
Federated privacy + Blind Handshake + temporal velocity + atlas integration + LLM edge harmonization + missing-modality imputation + continual learning with human validation gates.
Disease trajectories as first-class graph properties. Not just where a patient is in disease space, but where they're going and how fast. Order-of-magnitude separation between progression archetypes.
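A minimal sketch of velocity as a node property, assuming each patient has embeddings at several timestamped visits and that a straight-line fit is an adequate summary between visits. It fits an ordinary least-squares slope per embedding dimension and stores position, velocity, and speed together; `trajectory_velocity` and `node_state` are hypothetical names, not SOMA's API.

```python
import numpy as np


def trajectory_velocity(times: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """Least-squares velocity of one patient's trajectory through embedding space.

    times:      shape (T,), visit times (units are an assumption, e.g. years)
    embeddings: shape (T, d), the patient's embedding at each visit
    Returns the per-dimension slope, shape (d,).
    """
    t = times - times.mean()                  # centre time so the slope is independent of the intercept
    x = embeddings - embeddings.mean(axis=0)
    return (t @ x) / (t @ t)                  # ordinary least-squares slope for every dimension at once


def node_state(times: np.ndarray, embeddings: np.ndarray) -> dict:
    """Hypothetical graph-node payload: where the patient is, where they are heading, and how fast."""
    v = trajectory_velocity(times, embeddings)
    return {
        "position": embeddings[np.argmax(times)],   # embedding at the latest visit
        "velocity": v,
        "speed": float(np.linalg.norm(v)),
    }
```

Using every visit rather than only the first and last makes the estimate less sensitive to a single noisy timepoint.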
Four-stage federated onboarding with model-extraction mitigations: fixed public anchors, bounded queries, and DP noise. Zero patient data leaves the new node. Novel: to our knowledge, no prior art exists.
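The three mitigations can be sketched as follows, under loudly stated assumptions: the network holds reference embeddings for a fixed set of public (non-patient) anchor records, the new node embeds the same anchors with its local encoder, and calibration is an orthogonal Procrustes fit between the two. Procrustes is a swapped-in alignment technique chosen for illustration, and the Laplace noise here is not calibrated to any formal privacy budget. `AnchorOracle` and `procrustes_rotation` are hypothetical.

```python
import numpy as np


class AnchorOracle:
    """Hypothetical network-side service holding reference embeddings of fixed public anchors.

    Only anchor IDs can be queried, the total number of queries is capped,
    and every answer is perturbed with Laplace noise.
    """

    def __init__(self, reference: np.ndarray, max_queries: int, noise_scale: float, seed: int = 0):
        self.reference = reference              # shape (n_anchors, d)
        self.remaining = max_queries
        self.noise_scale = noise_scale
        self.rng = np.random.default_rng(seed)

    def query(self, anchor_ids: np.ndarray) -> np.ndarray:
        if len(anchor_ids) > self.remaining:
            raise PermissionError("query budget exhausted")
        self.remaining -= len(anchor_ids)
        ref = self.reference[anchor_ids]
        return ref + self.rng.laplace(0.0, self.noise_scale, size=ref.shape)


def procrustes_rotation(local: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Orthogonal R minimising ||local @ R - reference||_F (orthogonal Procrustes via SVD)."""
    u, _, vt = np.linalg.svd(local.T @ reference)
    return u @ vt
```

Because only anchor IDs go in and noised anchor embeddings come out, the new node never transmits patient records in this sketch, and the query cap limits how many input-output pairs an adversary could collect.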
Two-tier aggregate protocol achieves clinically meaningful similarity (C̄₁₀ > 0.7) at a privacy budget of ε = 2, where per-embedding noise destroys all utility. Noise reduction >200× at N = 200.
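The size of the gap follows from how differential-privacy noise scales with sensitivity. A sketch, assuming (the document does not say) that the aggregate tier releases a cohort's mean embedding, that coordinates are clipped to [-1, 1], and that the Laplace mechanism is used: replacing one patient changes the mean by at most 1/N of what it can change a single vector, so the noise scale falls by a factor of N.

```python
import numpy as np


def laplace_scale(dim: int, epsilon: float, n: int = 1, bound: float = 1.0) -> float:
    """Laplace scale b = Δ1/ε for releasing the mean of n vectors with coordinates clipped to [-bound, bound].

    Replacing one patient moves each coordinate of the mean by at most 2*bound/n,
    so the L1 sensitivity of the mean is Δ1 = 2*bound*dim/n.
    """
    return 2.0 * bound * dim / (n * epsilon)


def release_mean(embeddings: np.ndarray, epsilon: float, rng: np.random.Generator) -> np.ndarray:
    """ε-DP release of a cohort's clipped mean embedding via the Laplace mechanism."""
    n, dim = embeddings.shape
    clipped = np.clip(embeddings, -1.0, 1.0)
    return clipped.mean(axis=0) + rng.laplace(0.0, laplace_scale(dim, epsilon, n=n), size=dim)


# Per-embedding release (n = 1) versus a cohort aggregate of n = 200 at ε = 2, d = 128.
per_embedding = laplace_scale(dim=128, epsilon=2.0, n=1)    # 128.0 for coordinates bounded by 1
aggregate = laplace_scale(dim=128, epsilon=2.0, n=200)
ratio = per_embedding / aggregate                            # equals n under these assumptions
```

At ε = 2 and d = 128, the per-embedding scale is 128 for coordinates bounded by 1, so individual vectors are swamped; the N = 200 cohort mean needs a scale of 0.64, 200 times smaller.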
Cell-type perturbation signatures computed via Pearson correlation against the 5,200-cluster taxonomy from the Allen Institute's BKP. A pluggable Reference Atlas Layer supports current and future atlases.
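A minimal sketch of the correlation step, assuming the atlas is summarized as one mean-expression centroid per cluster over a gene panel shared with the patient-side perturbation signature. The function names are hypothetical and this does not use any Allen Institute API; in this sketch, swapping in another atlas only changes the `centroids` matrix and `labels` list.

```python
import numpy as np


def pearson_to_clusters(signature: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Pearson r between one signature (genes,) and every cluster centroid (clusters, genes)."""
    s = signature - signature.mean()
    c = centroids - centroids.mean(axis=1, keepdims=True)      # centre each centroid row
    return (c @ s) / (np.linalg.norm(c, axis=1) * np.linalg.norm(s))


def top_cell_types(signature: np.ndarray, centroids: np.ndarray, labels: list, k: int = 5) -> list:
    """Return the k atlas clusters most positively correlated with the signature."""
    r = pearson_to_clusters(signature, centroids)
    order = np.argsort(r)[::-1][:k]
    return [(labels[i], float(r[i])) for i in order]
```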
Designed from five first principles: distribute the hardest problem; transmit representations, not data; encode velocity, not just position; bridge molecular atlases to patient data; learn continuously and validate rigorously.
Autonomous Clinical Edge Agents at each hospital perform LLM-powered data harmonization and embed patients into a 128-dimensional, privacy-protected vector space. Zero PHI egress.
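The edge pipeline can be sketched end to end under simplifying assumptions: a fixed field-name map stands in for the LLM harmonization step, a random linear projection stands in for the (unspecified) encoder, and the DP noise step is omitted. The point of the sketch is the data boundary: free-text and unmapped fields are dropped during harmonization, and the only outbound type carries a site-local pseudonym and a 128-d vector. All names are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical stand-in for LLM-powered harmonization: a fixed map from one site's
# local field names to a shared vocabulary. A real edge agent would infer this mapping.
LOCAL_TO_SHARED = {"Age_yrs": "age", "MMSE_TOT": "mmse", "updrs3": "updrs_part3"}
SHARED_FIELDS = ["age", "mmse", "updrs_part3"]


def harmonize(record: dict) -> np.ndarray:
    """Rename known local fields, drop everything else, order by the shared schema (NaN if absent)."""
    shared = {LOCAL_TO_SHARED[k]: v for k, v in record.items() if k in LOCAL_TO_SHARED}
    return np.array([shared.get(f, np.nan) for f in SHARED_FIELDS], dtype=float)


def embed(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map harmonized features to a unit-length 128-d vector (projection shape: (128, n_fields))."""
    z = projection @ np.nan_to_num(features)   # missing fields contribute zero in this sketch
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z


@dataclass(frozen=True)
class OutboundEmbedding:
    """The only object this sketch lets leave the hospital: a site-local pseudonym and a vector."""
    pseudonym: str
    vector: np.ndarray
```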
Cloud-based knowledge graph with temporal velocity encoding, cross-institutional SNF, matrix completion for missing modalities, and closed-loop continual learning.
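SNF's core step is cross-diffusion: each modality's patient network is repeatedly smoothed through its own sparse nearest-neighbour graph using the average of the other modalities' networks. The sketch below is a compact single-site NumPy version following the kernels described in the foundational SNF paper; published implementations differ slightly in how they re-normalize between iterations, and how SOMA distributes this computation across institutions is not shown.

```python
import numpy as np


def affinity(x: np.ndarray, k: int = 10, mu: float = 0.5) -> np.ndarray:
    """Scaled exponential similarity kernel on Euclidean distances, as in the SNF paper."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)      # mean distance to k nearest neighbours
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    return np.exp(-(d ** 2) / (mu * eps))


def full_kernel(w: np.ndarray) -> np.ndarray:
    """P: off-diagonal entries of each row sum to 1/2, diagonal set to 1/2."""
    p = w.copy()
    np.fill_diagonal(p, 0.0)
    p /= 2.0 * p.sum(axis=1, keepdims=True)
    np.fill_diagonal(p, 0.5)
    return p


def local_kernel(w: np.ndarray, k: int = 10) -> np.ndarray:
    """S: keep each row's k most similar other patients, then normalise rows to sum to 1."""
    off = w.copy()
    np.fill_diagonal(off, -np.inf)                 # exclude self from the neighbour set
    idx = np.argsort(-off, axis=1)[:, :k]
    rows = np.arange(w.shape[0])[:, None]
    s = np.zeros_like(w)
    s[rows, idx] = w[rows, idx]
    return s / s.sum(axis=1, keepdims=True)


def snf(modalities: list, k: int = 10, t: int = 10) -> np.ndarray:
    """Fuse per-modality patient networks by t rounds of cross-diffusion."""
    assert len(modalities) >= 2, "SNF needs at least two modalities"
    ws = [affinity(x, k) for x in modalities]
    ps = [full_kernel(w) for w in ws]
    ss = [local_kernel(w, k) for w in ws]
    m = len(ps)
    for _ in range(t):
        # Each network diffuses through its own sparse kNN graph using the average of the others.
        ps = [ss[v] @ (sum(ps[u] for u in range(m) if u != v) / (m - 1)) @ ss[v].T for v in range(m)]
        ps = [(p + p.T) / 2.0 for p in ps]         # keep every network symmetric
    return sum(ps) / m
```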
Synaptogenesis: a four-stage onboarding protocol for new nodes. It calibrates embedding spaces, validates data quality, and establishes trust, all without exposing patient data.
Encrypted, mTLS-authenticated channels. Only DP-noised embeddings cross institutional boundaries. An offline queue absorbs network interruptions, and payload-size limits are enforced.
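A minimal sketch of the outbound side, assuming a hypothetical byte cap sized for one float64 128-d vector plus a small header and a caller-supplied `send` function that reports failure when the network is down. Oversized payloads are rejected before queueing, and queued payloads are sent in order once connectivity returns. Transport security (the mTLS channel) is assumed to be handled by the TLS layer and is not shown.

```python
from collections import deque
from typing import Callable
import numpy as np

MAX_PAYLOAD_BYTES = 128 * 8 + 64   # hypothetical cap: one float64 128-d vector plus a small header


class OutboundQueue:
    """Hypothetical edge-side outbox: rejects oversized payloads, buffers while offline."""

    def __init__(self, send: Callable[[bytes], bool], max_bytes: int = MAX_PAYLOAD_BYTES):
        self.send = send                 # returns True on success, False if the network is down
        self.max_bytes = max_bytes
        self.pending = deque()           # queued byte payloads, oldest first

    def submit(self, vector: np.ndarray) -> None:
        payload = np.asarray(vector, dtype="<f8").tobytes()
        if len(payload) > self.max_bytes:
            raise ValueError(f"payload {len(payload)} B exceeds cap {self.max_bytes} B")
        self.pending.append(payload)
        self.flush()

    def flush(self) -> int:
        """Send queued payloads in order; stop at the first failure and keep the rest queued."""
        sent = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break
            self.pending.popleft()
            sent += 1
        return sent
```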
All results are on synthetic data. Clinical validation on real neurological patient data is the next phase.
5 hospitals, 830 patients, 3 subtypes. Hybrid protocol: C̄₁₀ = 0.75 at ε = 2 and 0.92 at ε = 5. Per-embedding: C̄₁₀ = 0.33 (chance level) at every ε.
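C̄₁₀ is not defined in this section. One reading consistent with a chance level of 0.33 for three roughly balanced subtypes is the mean fraction of each patient's ten nearest neighbours that share its subtype; the sketch below computes that quantity under this assumption, using cosine similarity as the assumed metric.

```python
import numpy as np


def mean_cohesion_at_k(embeddings: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Assumed definition of C̄_k: mean fraction of each patient's k nearest
    neighbours (cosine similarity, self excluded) that share the patient's subtype."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)                 # never count a patient as its own neighbour
    nn = np.argsort(-sim, axis=1)[:, :k]           # indices of the k most similar patients
    return float((labels[nn] == labels[:, None]).mean())
```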
60 patients, 5 timepoints, 3 archetypes. Velocity magnitudes: rapid = 0.915, slow = 0.237, stable = 0.071. Order-of-magnitude separation.
80 patients, 3 clusters, 3 partial modalities. No single modality resolves all clusters; after 10 iterations of SNF, all three are fully resolved.
100 patients, 31% with complete profiles, 30–40% missingness per modality. Cosine similarity between imputed and ground-truth values: ~0.955 across all modalities.
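The imputation method is not specified here. As an illustration only, the sketch below uses iterative rank-r SVD reconstruction (hard-impute), a standard matrix-completion technique, and scores it with the same cosine-similarity metric on the held-out entries. It treats missingness entry by entry; whole missing modalities (block missingness) are a harder case than this sketch exercises.

```python
import numpy as np


def low_rank_impute(x: np.ndarray, rank: int = 3, iters: int = 200, tol: float = 1e-6) -> np.ndarray:
    """Fill NaNs in a patients x features matrix by iterative rank-r SVD reconstruction.

    Observed entries stay fixed; missing entries are repeatedly replaced by
    the best rank-r approximation of the current filled-in matrix.
    """
    mask = ~np.isnan(x)
    filled = np.where(mask, x, np.nanmean(x, axis=0))      # start from column means
    for _ in range(iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
        updated = np.where(mask, x, approx)
        delta = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if delta < tol:                                     # stop once the fill-in has settled
            break
    return filled


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```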
50 patients, 3 subtypes, simulated 5,200-cell atlas. KMeans ARI = 1.000. Perfect subtype recovery.
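ARI = 1.000 means the recovered clusters match the true subtypes exactly up to relabeling, while chance-level agreement scores near 0. A from-scratch sketch of the metric follows; scikit-learn exposes the same quantity as `sklearn.metrics.adjusted_rand_score`.

```python
from math import comb
import numpy as np


def adjusted_rand_index(a, b) -> float:
    """Adjusted Rand index between two labelings (1.0 = identical up to relabeling)."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((len(a_vals), len(b_vals)), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)                      # contingency table
    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(a_idx), 2)     # pair agreements expected by chance
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)
```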
3 hospitals, sequential training. Cohesion 0.993 with replay vs. 0.986 without. Catastrophic forgetting mitigated.
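The replay policy behind these numbers is not described. A common choice, assumed here, is a fixed-size buffer filled by reservoir sampling, so examples from earlier hospitals keep a fair share of the buffer as later hospitals arrive, mixed into each new training batch. `ReservoirReplay` and `mixed_batch` are hypothetical names.

```python
import random
from typing import Any, List


class ReservoirReplay:
    """Fixed-capacity replay buffer filled by reservoir sampling (Algorithm R).

    Every example seen so far has equal probability capacity / seen of being stored,
    so earlier hospitals stay represented after later ones arrive.
    """

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: List[Any] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: Any) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)      # uniform in [0, seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, n: int) -> List[Any]:
        return self.rng.sample(self.items, min(n, len(self.items)))


def mixed_batch(new_batch, replay: ReservoirReplay, n_replay: int) -> list:
    """Combine a batch from the current hospital with replayed examples from earlier ones."""
    return list(new_batch) + replay.sample(n_replay)
```

With capacity 100 and two hospitals of 1,000 examples each, the buffer holds about half first-hospital examples in expectation after the second hospital has been processed.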
SOMA represents the research arm of Curadai's vision — applying federated, privacy-preserving AI architecture to precision medicine. The same architectural primitives that power SOMA inform everything we build.