# Verified Medical NLP – RDF-Grounded Jamba RAG

> RDF-grounded medical question answering with deterministic hallucination checks and Jamba MoE reasoning

## 🩺 Overview
This project delivers a hallucination-mitigated medical language model pipeline that combines RDF knowledge graphs, retrieval-augmented generation (RAG), and the Jamba-1.5 Mixture-of-Experts architecture. The workflow grounds every response in verifiable biomedical evidence, producing fact-checked answers for clinical question answering.
I led design and implementation of the evaluation and verification subsystem, translating RDF query outputs into structured, human-readable summaries and evidence objects that drive fully auditable AI decisions.
## 🚀 Key Features
| Component | Description |
|---|---|
| RDF-Grounded RAG | Retrieves factual triples (disease–treatment–drug, etc.) from Bio2RDF, UMLS, and Wikidata Medical before generation. |
| Jamba Integration | Leverages Jamba-1.5 MoE for efficient long-context reasoning that blends retrieved biomedical evidence with prompt context. |
| Hallucination Evaluation Module | Implements deterministic checkers for factual consistency, retrieval coverage, and hallucination rate (H). |
| RDF Result Summarizer | Converts `rdflib.query.Result` objects into natural-language summaries or structured `DocSource` payloads. |
| Zero-Hallucination Pipeline | Benchmarks RDF-grounded vs. vanilla LLM outputs on PubMedQA, MedQA (USMLE), and Med-HALT datasets. |
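To make the retrieval step concrete, here is a minimal, hedged sketch of RDF-grounded retrieval. An in-memory list of triples stands in for the Bio2RDF/UMLS/Wikidata endpoints, and `retrieve_triples` is a hypothetical name, not the project's actual API:

```python
# Illustrative sketch only: an in-memory triple store stands in for the
# live SPARQL endpoints (Bio2RDF, UMLS, Wikidata Medical).
# (subject, predicate, object) triples as a SELECT query would return them.
TRIPLES = [
    ("Diabetes", "treatedBy", "Insulin"),
    ("Hypertension", "treatedBy", "Atenolol"),
    ("Influenza", "treatedBy", "Oseltamivir"),
    ("Aspirin", "indication", "Pain"),
]

def retrieve_triples(entity: str, predicate: str = "treatedBy"):
    """Return grounding triples mentioning `entity`; in the real pipeline
    these are injected into the Jamba prompt before generation."""
    return [t for t in TRIPLES if entity in (t[0], t[2]) and t[1] == predicate]

print(retrieve_triples("Diabetes"))
# [('Diabetes', 'treatedBy', 'Insulin')]
```

Grounding generation on triples retrieved this way is what lets the downstream checkers verify each claim against an explicit evidence set.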
## 🧩 Core Components I Built

### 1. `result_to_summary()`
- Handles all SPARQL query forms (SELECT, ASK, CONSTRUCT, DESCRIBE).
- Produces concise, human-readable narratives summarizing RDF answers.
- Automatically infers medical entities (disease, drug, gene, symptom) from triples.
- Fully deterministic — no downstream ML/LLM inference required.
**Sample output**

```text
🩺 Found 3 result(s).
Variables: disease, treatment
🔍 Showing first 3 result(s):
  1. disease: Diabetes, treatment: Insulin
  2. disease: Hypertension, treatment: Atenolol
  3. disease: Influenza, treatment: Oseltamivir
🧠 Extracted Medical Entities:
  - Disease: Diabetes, Hypertension, Influenza
  - Drug: Atenolol, Insulin, Oseltamivir
```
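The deterministic summarization logic can be sketched as follows. This is an assumption-laden simplification: the real `result_to_summary()` operates on `rdflib.query.Result`, while here SELECT rows are pre-materialized as dicts and entity typing uses a toy lexicon so the example stays dependency-free:

```python
# Hedged sketch of result_to_summary(); the lexicons below are toy
# stand-ins for the real entity-typing rules over RDF classes.
DISEASES = {"Diabetes", "Hypertension", "Influenza"}
DRUGS = {"Insulin", "Atenolol", "Oseltamivir"}

def result_to_summary(rows, limit=3):
    """Summarize SELECT rows into a human-readable narrative.
    Fully deterministic: lexicon lookups, no ML/LLM inference."""
    if not rows:
        return "No results."
    variables = list(rows[0].keys())
    lines = [f"Found {len(rows)} result(s).",
             f"Variables: {', '.join(variables)}"]
    for i, row in enumerate(rows[:limit], 1):
        lines.append(f"{i}. " + ", ".join(f"{k}: {v}" for k, v in row.items()))
    # Deterministic entity tagging via set intersection with the lexicons.
    values = {v for row in rows for v in row.values()}
    if values & DISEASES:
        lines.append("Disease: " + ", ".join(sorted(values & DISEASES)))
    if values & DRUGS:
        lines.append("Drug: " + ", ".join(sorted(values & DRUGS)))
    return "\n".join(lines)

rows = [
    {"disease": "Diabetes", "treatment": "Insulin"},
    {"disease": "Hypertension", "treatment": "Atenolol"},
    {"disease": "Influenza", "treatment": "Oseltamivir"},
]
print(result_to_summary(rows))
```

Keeping this step deterministic is what makes the summaries auditable: the same query result always produces the same narrative.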
### 2. `result_to_sources()`

- Transforms RDF query results into `DocSource` objects for provenance tracking.
- Supports SELECT, ASK, CONSTRUCT, and DESCRIBE outputs with schema-specific adapters.
**Example**

Input rows:
| uri | label | abstract |
|---|---|---|
| http://bio2rdf.org/drugbank:DB001 | Aspirin | Used to treat pain and fever. |
Output:

```python
[
    DocSource(
        id="drugbank:DB001",
        title="Aspirin",
        content="Used to treat pain and fever.",
        source_type="SPARQL_SELECT",
    )
]
```
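A minimal sketch of the SELECT adapter, matching the example above. The `DocSource` fields mirror the output shown, but the dataclass definition and the URI-to-CURIE compaction are assumptions about the implementation:

```python
from dataclasses import dataclass

# Hedged sketch: fields taken from the example output above; the
# constructor and helper logic are illustrative assumptions.
@dataclass
class DocSource:
    id: str
    title: str
    content: str
    source_type: str

def result_to_sources(rows, source_type="SPARQL_SELECT"):
    """Map SELECT rows (uri/label/abstract) onto DocSource objects,
    compacting Bio2RDF URIs into CURIE-style ids for provenance."""
    sources = []
    for row in rows:
        curie = row["uri"].rsplit("/", 1)[-1]  # e.g. "drugbank:DB001"
        sources.append(DocSource(id=curie,
                                 title=row["label"],
                                 content=row["abstract"],
                                 source_type=source_type))
    return sources

rows = [{"uri": "http://bio2rdf.org/drugbank:DB001",
         "label": "Aspirin",
         "abstract": "Used to treat pain and fever."}]
print(result_to_sources(rows)[0].id)  # drugbank:DB001
```

ASK, CONSTRUCT, and DESCRIBE results go through analogous adapters that map their respective shapes onto the same `DocSource` schema.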
## 📊 Benchmarking

### Datasets
- PubMedQA (labeled biomedical QA)
- MedQA (USMLE clinical reasoning)
- Med-HALT (hallucination stress test)
### Metrics
| Metric | Purpose |
|---|---|
| Hallucination Rate (H) | Share of unsupported or contradictory statements |
| Factual Precision / Recall | Alignment of claims with RDF ground truth |
| ROUGE-L / F1 | Overlap with reference clinical answers |
| Retrieval Precision@k | Accuracy of top-k triple retrieval |
| Latency | Response time per query |
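Two of these metrics reduce to simple set arithmetic. The sketch below, with hypothetical function names, shows how a deterministic checker can compute them once claims and retrieved triples are normalized to a comparable form:

```python
# Hedged sketch of two deterministic checks from the evaluation module.
# A claim counts as supported iff it matches a retrieved ground-truth triple.

def hallucination_rate(claims, supported):
    """H = unsupported claims / total claims."""
    if not claims:
        return 0.0
    unsupported = [c for c in claims if c not in supported]
    return len(unsupported) / len(claims)

def precision_at_k(retrieved, relevant, k):
    """Retrieval Precision@k: share of the top-k retrieved triples
    that appear in the relevant set."""
    return sum(1 for t in retrieved[:k] if t in relevant) / k

claims = [("Diabetes", "treatedBy", "Insulin"),
          ("Diabetes", "treatedBy", "Aspirin")]  # second claim unsupported
supported = {("Diabetes", "treatedBy", "Insulin")}
print(hallucination_rate(claims, supported))  # 0.5
```

Because both checks are pure functions of the claim and evidence sets, every reported score can be re-derived from logged retrievals.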
### Results
| Model | RDF Grounding | Hallucination ↓ | F1 ↑ | Latency (s) ↓ |
|---|---|---|---|---|
| LLaMA-3-8B-Instruct | ✗ | 0.45 | 0.61 | 1.2 |
| RAG-Text | ✗ | 0.36 | 0.68 | 2.3 |
| RDF-RAG | ✓ | 0.28 | 0.75 | 2.9 |
| Jamba + RDF | ✓ | 0.22 | 0.79 | 3.1 |
→ RDF grounding cuts the hallucination rate by roughly 38–51% relative to the ungrounded LLaMA-3 baseline (0.45 → 0.28 and 0.22) while improving F1, at a modest latency cost.
## 🧰 Stack

- Languages: Python, SPARQL
- LLMs & Frameworks: Jamba-1.5, LLaMA-3-8B, FAISS-powered RAG
- Libraries: `rdflib`, `datasets`, `jsonlines`, `pandas`, `matplotlib`
- Datasets: PubMedQA, MedQA, Med-HALT
- Infrastructure: UF HiPerGator HPC (CUDA 12.1, Apptainer containers)