The problem · FHIR + RAG
FHIR JSON is reference-heavy. Pre-resolved references and pre-chunked text fix that.
Nested references
In FHIR, a PractitionerRole references a Practitioner, an Organization, and one or more Locations, so building context for a single provider takes 4 round trips: one per resource.
Fonteum: All references are pre-resolved. The response you get is a flat, fully populated bundle ready for chunking.
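To make the round-trip cost concrete, here is a minimal sketch that chases references over an in-memory store and counts fetches. The `STORE` dictionary, the resource ids, and both helper functions are hypothetical stand-ins for HTTP reads against a FHIR server. This is an illustration of the problem, not Fonteum's implementation.

```python
from typing import Any

# Hypothetical in-memory stand-in for a FHIR server, keyed by "Type/id".
STORE: dict[str, dict[str, Any]] = {
    "PractitionerRole/role-1": {
        "resourceType": "PractitionerRole",
        "practitioner": {"reference": "Practitioner/prac-1"},
        "organization": {"reference": "Organization/org-1"},
        "location": [{"reference": "Location/loc-1"}],
    },
    "Practitioner/prac-1": {"resourceType": "Practitioner", "id": "prac-1"},
    "Organization/org-1": {"resourceType": "Organization", "id": "org-1"},
    "Location/loc-1": {"resourceType": "Location", "id": "loc-1"},
}


def find_references(node: Any) -> list[str]:
    """Collect every {"reference": "..."} value nested inside a resource."""
    if isinstance(node, dict):
        found = [node["reference"]] if isinstance(node.get("reference"), str) else []
        for value in node.values():
            found.extend(find_references(value))
        return found
    if isinstance(node, list):
        return [ref for item in node for ref in find_references(item)]
    return []


def resolve(start: str) -> tuple[dict[str, dict[str, Any]], int]:
    """Fetch `start` plus everything it references; return resources and fetch count."""
    resolved: dict[str, dict[str, Any]] = {}
    pending = [start]
    fetches = 0
    while pending:
        ref = pending.pop()
        if ref in resolved:
            continue
        resolved[ref] = STORE[ref]  # stands in for one HTTP round trip
        fetches += 1
        pending.extend(find_references(resolved[ref]))
    return resolved, fetches


resources, round_trips = resolve("PractitionerRole/role-1")
print(round_trips, "round trips for", sorted(resources))
```

With a pre-resolved bundle, the same four resources arrive in a single response, so the loop above collapses to one read.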
Token bloat
Raw FHIR JSON carries coding system URIs, meta fields, and extension blocks your model doesn't need. A single Practitioner can run 300+ tokens.
Fonteum: The /api/v1/rag/chunks export returns pre-chunked text with stable chunk IDs — same clinical data, fewer tokens per chunk than raw FHIR JSON.
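As a rough illustration of where the savings come from, the sketch below flattens a Practitioner resource into one line of text and drops the coding-system URIs. The `to_compact_text` helper and its output format are hypothetical and are not the actual chunk format returned by /api/v1/rag/chunks. Character counts are used only as a crude proxy for tokens.

```python
import json

# Sample Practitioner, trimmed to the fields the sketch uses.
practitioner = {
    "resourceType": "Practitioner",
    "id": "prac-1003894328",
    "identifier": [{"system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328"}],
    "name": [{"family": "Nguyen", "given": ["Emily"]}],
    "address": [{
        "line": ["400 Park Ave"],
        "city": "New York",
        "state": "NY",
        "postalCode": "10022",
    }],
    "qualification": [{"code": {"coding": [{
        "system": "http://nucc.org/provider-taxonomy",
        "code": "207RC0000X",
        "display": "Cardiovascular Disease",
    }]}}],
}


def to_compact_text(resource: dict) -> str:
    """Flatten a Practitioner into one line (hypothetical format, not Fonteum's)."""
    name = resource["name"][0]
    full_name = " ".join(name["given"] + [name["family"]])
    npi = resource["identifier"][0]["value"]
    addr = resource["address"][0]
    street = ", ".join(addr["line"])
    address = f"{street}, {addr['city']}, {addr['state']} {addr['postalCode']}"
    specialties = "; ".join(
        coding["display"]
        for qual in resource.get("qualification", [])
        for coding in qual["code"]["coding"]
    )
    return f"{full_name} (NPI {npi}). {specialties}. {address}."


compact = to_compact_text(practitioner)
raw = json.dumps(practitioner)
print(compact)
print(len(compact), "characters of text vs", len(raw), "characters of JSON")
```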
Missing citations
When an LLM cites a provider fact, you need a traceable source and date. Raw FHIR resources rarely carry both inline.
Fonteum: Every Fonteum resource carries meta.source and a provenance tag block: source name, last-checked date, and display rule.
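A minimal sketch of turning those tags into a footnote, assuming the `meta.tag` layout used in this page's sample Practitioner (`fonteum:provenance` and `fonteum:last-checked` as tag systems). The helper names and the citation wording are hypothetical.

```python
# Sample resource with the provenance tags used in this page's Practitioner example.
resource = {
    "resourceType": "Practitioner",
    "id": "prac-1003894328",
    "meta": {
        "tag": [
            {"system": "fonteum:provenance", "code": "cms-nppes"},
            {"system": "fonteum:last-checked", "code": "2026-05-24"},
        ]
    },
}


def provenance_tags(res: dict) -> dict[str, str]:
    """Map each meta.tag system to its code; tags missing either key are skipped."""
    tags = res.get("meta", {}).get("tag", [])
    return {t["system"]: t["code"] for t in tags if "system" in t and "code" in t}


def format_citation(res: dict) -> str:
    """Build a footnote string (hypothetical wording) from the provenance tags."""
    tags = provenance_tags(res)
    source = tags.get("fonteum:provenance", "unknown source")
    checked = tags.get("fonteum:last-checked", "unknown date")
    return f"{res['resourceType']}/{res['id']} (source: {source}, last checked {checked})"


print(format_citation(resource))
```

Falling back to "unknown source" keeps a missing tag visible in the footnote instead of hiding it.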
Token efficiency · FHIR JSON vs chunk text
The same provider data. Half the tokens.
Compare the raw FHIR JSON response below with the compact text that the /api/v1/rag/chunks export returns: same provenance, same clinical data, fewer tokens in your context window.
{
  "resourceType": "Practitioner",
  "id": "prac-1003894328",
  "meta": {
    "tag": [
      { "system": "fonteum:provenance", "code": "cms-nppes" },
      { "system": "fonteum:last-checked", "code": "2026-05-24" }
    ]
  },
  "identifier": [
    { "system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328" }
  ],
  "name": [{ "family": "Nguyen", "given": ["Emily"], "suffix": ["MD"] }],
  "address": [
    {
      "use": "work",
      "line": ["400 Park Ave"],
      "city": "New York",
      "state": "NY",
      "postalCode": "10022"
    }
  ],
  "qualification": [
    {
      "code": {
        "coding": [
          {
            "system": "http://nucc.org/provider-taxonomy",
            "code": "207RC0000X",
            "display": "Cardiovascular Disease"
          }
        ]
      }
    }
  ]
}

LangChain · integration walkthrough
Load Fonteum into LangChain.
The @fonteum/mcp package ships a createFonteumLangChainTools helper that wires all five Fonteum tools into LangChain. Pass the result as the tools array to any LangChain agent. Every tool response carries metadata.source from the Fonteum provenance block.
import { tool } from "@langchain/core/tools";
import { createFonteumLangChainTools } from "@fonteum/mcp/integrations/langchain";

// All five tools: search, resolve-by-NPI, exclusion check, dataset info, source list
const fonteumTools = createFonteumLangChainTools(tool, {
  apiKey: process.env.FONTEUM_API_KEY, // omit for free demo access
});

For MCP and Python paths, see docs/integrations.
LlamaIndex · VectorStoreIndex
Index provider data with LlamaIndex.
Load pre-chunked provider data from /api/v1/rag/chunks directly into a LlamaIndex VectorStoreIndex. Each node carries a stable chunk ID and provenance metadata for downstream citation generation.
import requests
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode

# Page through the export; the loop ends when next_cursor is None.
nodes, cursor = [], 0
while cursor is not None:
    resp = requests.get(
        "https://fonteum.com/api/v1/rag/chunks",
        params={"limit": 200, "cursor": cursor},
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly instead of indexing an error body
    page = resp.json()
    for c in page["chunks"]:
        nodes.append(TextNode(
            text=c["text"],
            id_=c["chunk_id"],  # stable chunk ID doubles as the node ID
            metadata={"cite": c["cite"], "source_url": c["source_url"], **c["provenance"]},
        ))
    cursor = page["next_cursor"]

# Embed the nodes, then query the index.
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
response = query_engine.query("cardiologists accepting Medicare in Manhattan")

FONTEUM FOR RAG
Every chunk arrives with 14 fields of provenance.
Export · /api/v1/rag/chunks
Pre-chunked, citation-ready text for your vector store.
Each GET returns one page of a deterministic, paginated feed of dataset chunks. No language model sits in the pipeline, and the output is byte-stable across pulls, so re-embedding never churns your index. Every chunk arrives with a stable chunk_id (use it as your vector primary key), a cite string for footnotes, the upstream source_url, and the full 14-field provenance contract. Walk the corpus with ?limit= and the returned next_cursor.
# Plain REST: page through the corpus
curl "https://fonteum.com/api/v1/rag/chunks?limit=50&cursor=0"
# → { "total": N, "next_cursor": 50, "chunks": [
#     { "chunk_id": "source:nppes#overview",
#       "text": "...", "cite": "...", "source_url": "https://...",
#       "provenance": { "_source": "...", "_dataset_id": "...", ...14 fields } } ] }

Load it straight into a LangChain vector store. Each chunk maps to a Document whose metadata carries the provenance block, so generated citations trace back to the federal source and snapshot date.
import requests
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Page through the export; the loop ends when next_cursor is None.
docs, cursor = [], 0
while cursor is not None:
    resp = requests.get(
        "https://fonteum.com/api/v1/rag/chunks",
        params={"limit": 200, "cursor": cursor},
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly instead of indexing an error body
    page = resp.json()
    for c in page["chunks"]:
        docs.append(Document(
            page_content=c["text"],
            id=c["chunk_id"],
            metadata={
                "cite": c["cite"],
                "source_url": c["source_url"],
                # The sample response nests the dataset id inside the provenance block.
                "dataset_id": c["provenance"]["_dataset_id"],
                **c["provenance"],
            },
        ))
    cursor = page["next_cursor"]

index = FAISS.from_documents(docs, OpenAIEmbeddings())

Public endpoint, rate limited per source IP. Deterministic ids mean an incremental re-pull upserts cleanly, with no duplicate vectors.
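The upsert behaviour can be sketched without a vector store: key everything by chunk_id and re-embed only when the text bytes change. The in-memory `store`, the `upsert_chunks` helper, and the second chunk id are hypothetical. A real pipeline would call its embedding model where the comment indicates.

```python
import hashlib


def text_digest(text: str) -> str:
    """Content hash used to detect whether a chunk's text changed between pulls."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert_chunks(store: dict[str, dict], chunks: list[dict]) -> dict[str, int]:
    """Upsert by chunk_id; count chunks that were new, changed, or unchanged."""
    counts = {"new": 0, "changed": 0, "unchanged": 0}
    for chunk in chunks:
        digest = text_digest(chunk["text"])
        existing = store.get(chunk["chunk_id"])
        if existing is not None and existing["digest"] == digest:
            counts["unchanged"] += 1
            continue  # same id, same bytes: keep the stored vector
        counts["new" if existing is None else "changed"] += 1
        # A real pipeline would (re-)embed chunk["text"] here before storing.
        store[chunk["chunk_id"]] = {"digest": digest, "text": chunk["text"]}
    return counts


store: dict[str, dict] = {}
first_pull = [
    {"chunk_id": "source:nppes#overview", "text": "NPPES overview text."},
    {"chunk_id": "source:leie#overview", "text": "LEIE overview text."},  # hypothetical id
]
print(upsert_chunks(store, first_pull))  # first pull: every chunk is new
print(upsert_chunks(store, first_pull))  # identical re-pull: nothing to re-embed
```

Because the feed is byte-stable, an identical re-pull should land entirely in the "unchanged" bucket.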
Cited answers · /api/v1/ask
Ask a structured question, get cited facts back.
A typed question (ownership, sanctions, payments) plus an identifier returns a fact list where every fact carries the full 14-field provenance contract. Deterministic, no language model — “0 records match” is an answer, never a silent empty. The contract is strict: 400 on an unknown question / scheme / malformed value, 502 when a source read fails (never masked as an empty answer), and a hard internal gate refuses any fact missing its provenance.
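On the client side, the contract above might be handled along these lines. This is a sketch under assumptions: the exception classes and `interpret_ask_response` are hypothetical, the top-level field names follow the example response for this endpoint, and the per-fact `provenance` key is a guess at where the provenance block lives.

```python
class AskRequestError(ValueError):
    """The request itself was rejected (HTTP 400)."""


class AskSourceError(RuntimeError):
    """An upstream source read failed (HTTP 502)."""


def interpret_ask_response(status_code: int, payload: dict) -> dict:
    """Turn a status code and decoded JSON body into an answer or a typed error."""
    if status_code == 400:
        raise AskRequestError("unknown question, unknown scheme, or malformed value")
    if status_code == 502:
        raise AskSourceError("source read failed; do not treat this as zero matches")
    if status_code != 200:
        raise RuntimeError(f"unexpected status {status_code}")
    facts = payload.get("facts", [])
    # Client-side mirror of the provenance gate (assumes a "provenance" key per fact).
    for fact in facts:
        if not fact.get("provenance"):
            raise ValueError("fact is missing its provenance block")
    return {
        "answer_text": payload["answer_text"],
        "facts": facts,  # an empty list is a valid answer, e.g. "0 records match"
        "citations": payload.get("citations", []),
    }


# Payload modelled on the example response for a sanctions lookup.
answer = interpret_ask_response(200, {
    "question": "sanctions",
    "answer_text": "0 OIG LEIE exclusion records match NPI 1003000118.",
    "facts": [],
    "citations": [{"source": "HHS OIG LEIE", "source_url": "..."}],
})
print(answer["answer_text"])
```

Keeping 502 as its own exception type keeps a failed source read from being mistaken for "no sanctions found".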
curl "https://fonteum.com/api/v1/ask?question=sanctions&scheme=npi&value=1003000118"
# → { "question": "sanctions",
#     "answer_text": "0 OIG LEIE exclusion records match NPI 1003000118.",
#     "facts": [ /* each with a 14-field provenance block */ ],
#     "citations": [ { "source": "HHS OIG LEIE", "source_url": "..." } ] }

Token counts · by resource type
Average token counts per resource.
| Resource | JSON tokens | Chunk tokens | Reduction |
|---|---|---|---|
| Practitioner | 312 | 148 | −53% |
| Organization | 298 | 131 | −56% |
| Location | 187 | 89 | −52% |
| PractitionerRole | 224 | 104 | −54% |
| HealthcareService | 341 | 162 | −52% |
Token counts measured with tiktoken cl100k_base on a representative sample of 500 records per resource type. Actual counts vary by record.
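The Reduction column follows from the two token columns as (JSON − chunk) / JSON, rounded to the nearest percent. A short check using the figures from the table (the function name is illustrative):

```python
# (JSON tokens, chunk tokens) per resource type, copied from the table above.
token_counts = {
    "Practitioner": (312, 148),
    "Organization": (298, 131),
    "Location": (187, 89),
    "PractitionerRole": (224, 104),
    "HealthcareService": (341, 162),
}


def reduction_percent(json_tokens: int, chunk_tokens: int) -> int:
    """Percentage of tokens saved by the chunk text, rounded to a whole percent."""
    return round(100 * (json_tokens - chunk_tokens) / json_tokens)


reductions = {name: reduction_percent(*pair) for name, pair in token_counts.items()}
for name, pct in reductions.items():
    print(f"{name}: -{pct}%")
```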
Latency benchmarks · under load
Sub-300 ms at p99.
| Percentile | FHIR endpoint |
|---|---|
| p50 | 38 ms |
| p95 | 142 ms |
| p99 | 290 ms |
| p99.9 | 480 ms |
Measured at the Vercel edge with 50 concurrent connections. Latency is measured from gateway to response complete. Source data is served from a warm CDN cache; a cold cache adds ~80 ms.
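To read the percentile rows, one common definition is nearest-rank: sort the samples and take the value at rank ceil(p × n / 100). The sketch below uses that definition on synthetic latencies. The benchmark's own percentile method is not stated here, and the sample values are for illustration only, not measurements.

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Smallest sample such that at least p percent of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies of 1..100 ms, for illustration only (not benchmark data).
latencies_ms = [float(ms) for ms in range(1, 101)]
for p in (50, 95, 99, 99.9):
    print(f"p{p}: {percentile_nearest_rank(latencies_ms, p)} ms")
```

With only 100 samples, p99.9 is just the slowest request, which is why tail percentiles need large sample counts to be meaningful.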