SOMA

Similarity-Orchestrated Multi-modal Architecture — a federated agentic system for privacy-preserving patient similarity networks in neurological disease.

Patent Pending · Provisional Application Filed · Sebastian T. Muah & Krutika Khinvasara, PhD

Novel contributions: 5
Embedding space: 128-d
Privacy budget: ε = 2
Similarity preserved: C̄₁₀ > 0.7
The Problem

Why patient similarity networks haven't reached the clinic.

Patient Similarity Networks (PSNs) offer a mathematically principled approach to precision medicine. Despite more than a decade of research and 1,800+ citations of the foundational Similarity Network Fusion (SNF) paper, no production clinical deployment exists. The barriers aren't algorithmic. They're infrastructural.

📊

Missing Modalities

Complete multi-modal profiles are rare. Most patients have incomplete data across modalities.

🏥

Data Fragmentation

Patient records siloed across institutions with incompatible EHR systems and clinical vocabularies.

🔒

Privacy Regulation

HIPAA and GDPR tightly restrict the sharing and centralization of protected health information across institutions.

⏱️

Missing Temporal Modeling

Existing PSNs treat patients as static snapshots, losing critical progression information.

🧬

Missing Molecular Grounding

No mechanism bridges population-scale molecular atlases with patient-level clinical data.

Research

Five contributions. One integrated system.

No existing system simultaneously addresses all five barriers. SOMA's novelty is the specific integrated architecture that makes PSNs clinically deployable for the first time.

🔄

The Integrated System

Federated privacy + Blind Handshake + temporal velocity + atlas integration + LLM edge harmonization + missing modality imputation + continual learning with human validation gates.

Temporal Velocity Encoding

Disease trajectories as first-class graph properties. Not just where a patient is in disease space, but where they're going and how fast. On synthetic data, mean velocity magnitude differs by roughly an order of magnitude between rapid and stable progression archetypes.
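One plausible reading of "where they're going and how fast" is a finite-difference velocity over a patient's sequence of embeddings. The sketch below assumes that reading. The function name `velocity_features`, the evenly spaced visits, and the toy trajectories are illustrative assumptions, not SOMA's published method.

```python
import numpy as np

def velocity_features(embeddings, times):
    """Finite-difference velocity of one patient's embedding trajectory.

    embeddings: array of shape (T, d), one embedding per visit.
    times:      array of shape (T,), visit times in any consistent unit.
    Returns (mean_velocity_vector, mean_speed).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    times = np.asarray(times, dtype=float)
    dt = np.diff(times)[:, None]                 # (T-1, 1) gaps between visits
    steps = np.diff(embeddings, axis=0) / dt     # (T-1, d) velocity per interval
    mean_velocity = steps.mean(axis=0)           # average direction of travel
    mean_speed = np.linalg.norm(steps, axis=1).mean()  # average speed
    return mean_velocity, mean_speed

# Hypothetical toy data: 5 visits in a 128-d space.
rng = np.random.default_rng(0)
direction = rng.normal(size=128)
direction /= np.linalg.norm(direction)
times = np.arange(5.0)
start = rng.normal(size=128)
rapid = start + 0.9 * np.outer(times, direction)    # moves 0.9 units per visit
stable = start + 0.05 * np.outer(times, direction)  # barely moves

_, rapid_speed = velocity_features(rapid, times)
_, stable_speed = velocity_features(stable, times)
```

Storing the mean velocity vector and the scalar speed as node attributes would let a graph query filter on trajectory as well as position.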

🤝

Blind Handshake Protocol

Four-stage federated onboarding with model extraction mitigations. Fixed public anchors, bounded queries, DP noise. Zero patient data leaves the new node. Novel — no prior art exists.

🛡️

Federated Privacy on the Unit Hypersphere

A two-tier aggregate protocol achieves clinically meaningful similarity (C̄₁₀ > 0.7) at a privacy budget (ε = 2) where per-embedding noise destroys all utility. Noise reduction >200× at N=200.
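The gap between per-embedding and aggregate noise can be understood through sensitivity: replacing one unit-norm embedding moves a single released vector by up to 2 in L2 norm, but moves the mean of N embeddings by at most 2/N. The sketch below illustrates this with a plain Laplace mechanism. It is not SOMA's two-tier protocol, and `laplace_scale`, `release_individual`, and `release_mean` are hypothetical names.

```python
import numpy as np

def laplace_scale(l2_sensitivity, dim, epsilon):
    # L1 sensitivity is at most sqrt(dim) times the L2 sensitivity,
    # and the Laplace mechanism uses scale = L1 sensitivity / epsilon.
    return np.sqrt(dim) * l2_sensitivity / epsilon

def release_individual(x, epsilon, rng):
    """Release every unit-norm row of x with its own Laplace noise."""
    n, d = x.shape
    b = laplace_scale(2.0, d, epsilon)        # one row can change by up to 2
    return x + rng.laplace(scale=b, size=(n, d))

def release_mean(x, epsilon, rng):
    """Release only the mean of the rows, with Laplace noise."""
    n, d = x.shape
    b = laplace_scale(2.0 / n, d, epsilon)    # the mean changes by at most 2/n
    return x.mean(axis=0) + rng.laplace(scale=b, size=d)

rng = np.random.default_rng(0)
n, d, eps = 200, 128, 2.0
x = rng.normal(size=(n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)  # project onto the unit hypersphere

scale_ratio = laplace_scale(2.0, d, eps) / laplace_scale(2.0 / n, d, eps)
noisy_rows = release_individual(x, eps, rng)
noisy_mean = release_mean(x, eps, rng)
```

Under these assumptions the noise scale on the mean is N times smaller than on each row, so at N = 200 the aggregate carries 200 times less noise per coordinate. The source does not state how its own reduction figure was derived.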

🧬

Molecular Atlas Integration

Cell-type perturbation signatures via Pearson correlation against Allen Institute BKP's 5,200-cluster taxonomy. Pluggable Reference Atlas Layer for current and future atlases.
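As a rough illustration of scoring by Pearson correlation, the sketch below correlates one patient-level signature against every cluster profile in a reference matrix. The function name `pearson_against_atlas`, the array shapes, and the random toy atlas are assumptions for illustration. No real atlas data or Allen Institute API is used.

```python
import numpy as np

def pearson_against_atlas(signature, atlas):
    """Pearson correlation of one signature against every atlas cluster.

    signature: (g,) patient-level values over g genes.
    atlas:     (k, g) reference profile per cluster over the same genes.
    Returns (k,) correlations in [-1, 1]. Assumes no profile is constant.
    """
    s = signature - signature.mean()
    a = atlas - atlas.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(s) * np.linalg.norm(a, axis=1)
    return (a @ s) / denom

rng = np.random.default_rng(0)
atlas = rng.normal(size=(50, 300))   # 50 hypothetical clusters, 300 genes
# A noisy, rescaled and shifted copy of cluster 7; Pearson ignores scale and shift.
signature = 2.0 * atlas[7] + 1.0 + 0.1 * rng.normal(size=300)

scores = pearson_against_atlas(signature, atlas)
best = int(np.argmax(scores))        # index of the best-matching cluster
```

Because the scoring function only needs a clusters-by-genes matrix, swapping in a different reference atlas would not change this code, which is one way a pluggable atlas layer could work.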

Architecture

Three-layer hub-and-spoke topology.

Designed from five first principles: distribute the hardest problem, transmit representations not data, encode velocity not just position, bridge molecular atlases to patient data, learn continuously and validate rigorously.

[Figure: hub-and-spoke architecture diagram. Hub: SOMA Central Brain (Knowledge Graph · SNF Fusion · Continual Learning). Spokes: Dendrite Nodes at Hospital A, Hospital B, Hospital C, Research Lab A, and a Genomics Center, plus the Brain Atlas (BKP). Links: Secure Transport carrying DP-noised embeddings; Blind Handshake (synaptogenesis). Legend: Synapse (active), Synaptogenesis (forming), Hospital, Research.]
🏥

Dendrite Nodes

Autonomous Clinical Edge Agents at hospitals perform LLM-powered data harmonization and embed patients into a 128-d privacy-protected vector space. Zero PHI egress.

🧠

Soma (Central Brain)

Cloud-based knowledge graph with temporal velocity encoding. Cross-institutional SNF fusion, matrix completion for missing modalities, closed-loop continual learning.

🤝

Synapse (Blind Handshake)

Four-stage onboarding protocol for new nodes, referred to as synaptogenesis. Calibrates embedding spaces, validates data quality, and establishes trust, all without exposing patient data.

🔐

Secure Transport

Encrypted mTLS-authenticated channels. Only DP-noised embeddings cross boundaries. Offline queue for network interruptions. Payload-size enforcement.

Validation

Empirical characterization on synthetic data.

All results are on synthetic data. Clinical validation on real neurological patient data is the next phase.

Secure Aggregation

5 hospitals, 830 patients, 3 subtypes. Hybrid protocol: C̄₁₀ = 0.75 at ε = 2, 0.92 at ε = 5. Per-embedding: C̄₁₀ = 0.33 (random) at all ε.

Temporal Velocity

60 patients, 5 timepoints, 3 archetypes. Velocity magnitudes: rapid = 0.915, slow = 0.237, stable = 0.071. Rapid and stable archetypes differ by roughly 13×; rapid and slow by roughly 4×.

SNF Fusion

80 patients, 3 clusters, 3 partial modalities. No single modality resolves all clusters. After 10-iteration SNF, all three fully resolved.
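For readers unfamiliar with SNF, the sketch below shows the core cross-diffusion loop in simplified form: each modality's similarity matrix is repeatedly diffused through its own k-nearest-neighbour kernel using the average of the other modalities. It uses a plain Gaussian affinity rather than the locally scaled kernel of the original SNF paper. The function names, parameters, and toy data are assumptions, not SOMA's implementation.

```python
import numpy as np

def affinity(x, sigma=0.5):
    """Gaussian affinity from pairwise squared Euclidean distances."""
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2 * sq.mean()))

def full_kernel(w):
    """Row-normalised kernel: off-diagonal row mass 1/2, diagonal 1/2."""
    w = w.copy()
    np.fill_diagonal(w, 0.0)
    p = w / (2.0 * w.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p

def local_kernel(w, k):
    """Sparse kernel keeping only each row's k strongest neighbours."""
    w = w.copy()
    np.fill_diagonal(w, 0.0)
    idx = np.argsort(-w, axis=1)[:, :k]
    rows = np.arange(w.shape[0])[:, None]
    s = np.zeros_like(w)
    s[rows, idx] = w[rows, idx]
    return s / s.sum(axis=1, keepdims=True)

def snf(views, k=10, iters=10):
    """Fuse two or more (n, d_v) feature matrices into one (n, n) similarity."""
    m = len(views)
    ws = [affinity(v) for v in views]
    ps = [full_kernel(w) for w in ws]
    ss = [local_kernel(w, k) for w in ws]
    for _ in range(iters):
        updated = []
        for v in range(m):
            # Average of the other modalities' current similarity matrices.
            others = sum(ps[u] for u in range(m) if u != v) / (m - 1)
            # Diffuse through this modality's local structure, then renormalise.
            updated.append(full_kernel(ss[v] @ others @ ss[v].T))
        ps = updated
    fused = sum(ps) / m
    return (fused + fused.T) / 2.0   # symmetrise the final network

# Hypothetical toy data: 3 clusters, each view separates only one of them.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
views = [rng.normal(size=(60, 4)) + 5.0 * (labels == v)[:, None] for v in range(3)]
fused = snf(views, k=10, iters=10)
```

Spectral clustering on `fused` would be the usual next step for recovering subtypes. That step is omitted here to keep the sketch short.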

Matrix Completion

100 patients, 31% complete, 30–40% per-modality missingness. Cosine similarity of imputed vs. ground-truth: ~0.955 across all modalities.

Atlas Integration

50 patients, 3 subtypes, simulated 5,200-cluster atlas. KMeans ARI = 1.000. Perfect subtype recovery.

Generative Replay

3 hospitals, sequential training. Cohesion 0.993 with replay vs. 0.986 without. Catastrophic forgetting mitigated.

A Curadai Research Initiative

SOMA represents the research arm of Curadai's vision — applying federated, privacy-preserving AI architecture to precision medicine. The same architectural primitives that power SOMA inform everything we build.
