FOR · RAG DEVELOPERS

FHIR R4, retrieval-optimized for RAG.

Pre-resolved references, flat JSON, citation-ready provenance on every resource.

The problem · FHIR + RAG

FHIR JSON is reference-heavy. Pre-resolved references and pre-chunked text fix that.

Nested references

In FHIR, one provider's context is spread across linked resources: a PractitionerRole references a Practitioner, an Organization, and a Location, so building full context can take four round trips.

Fonteum: All references are pre-resolved. The response you get is a flat, fully populated bundle, ready for chunking.
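
To make "pre-resolved" concrete, here is a minimal sketch that inlines references from an in-memory lookup. It is an illustration only, not Fonteum's implementation: the resolve_references helper, the store dictionary, and the sample resources are all hypothetical.

```python
from typing import Any


def resolve_references(resource: Any, store: dict, _seen: frozenset = frozenset()) -> Any:
    """Return a copy of `resource` with each {"reference": "Type/id"} inlined.

    `store` maps "Type/id" strings to resources (hypothetical lookup).
    Unknown references and reference cycles are left untouched.
    """
    if isinstance(resource, list):
        return [resolve_references(item, store, _seen) for item in resource]
    if not isinstance(resource, dict):
        return resource
    ref = resource.get("reference")
    if isinstance(ref, str) and ref in store and ref not in _seen:
        # Replace the pointer with the target, resolving the target's own references too.
        return resolve_references(store[ref], store, _seen | {ref})
    return {key: resolve_references(value, store, _seen) for key, value in resource.items()}


# Hypothetical sample data: a role pointing at three other resources.
store = {
    "Practitioner/p1": {"resourceType": "Practitioner", "id": "p1"},
    "Organization/o1": {"resourceType": "Organization", "id": "o1", "name": "Example Clinic"},
    "Location/l1": {
        "resourceType": "Location",
        "id": "l1",
        "managingOrganization": {"reference": "Organization/o1"},
    },
}
role = {
    "resourceType": "PractitionerRole",
    "practitioner": {"reference": "Practitioner/p1"},
    "organization": {"reference": "Organization/o1"},
    "location": [{"reference": "Location/l1"}],
}
flat = resolve_references(role, store)
```

After the call, flat contains the Organization and Location bodies in place of the pointers, so a chunker can read it without further lookups.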

Token bloat

Raw FHIR JSON carries coding system URIs, meta fields, and extension blocks your model doesn't need. A single Practitioner can run 300+ tokens.

Fonteum: The /api/v1/rag/chunks export returns pre-chunked text with stable chunk IDs — same clinical data, fewer tokens per chunk than raw FHIR JSON.

Missing citations

When an LLM cites a provider fact, you need a traceable source and date. Standard FHIR resources don't carry that.

Fonteum: Every resource carries meta.source and a provenance tag block: source name, last-checked date, and display rule.
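
A small sketch of reading that tag block on the client side. The tag systems (fonteum:provenance, fonteum:last-checked) are taken from the sample response on this page; the provenance_tags helper is hypothetical, and the full set of tags may differ.

```python
def provenance_tags(resource: dict, prefix: str = "fonteum:") -> dict:
    """Collect meta.tag entries whose system starts with `prefix` into a flat dict."""
    tags = resource.get("meta", {}).get("tag", [])
    return {
        tag["system"][len(prefix):]: tag["code"]
        for tag in tags
        if tag.get("system", "").startswith(prefix) and "code" in tag
    }


practitioner = {
    "resourceType": "Practitioner",
    "meta": {
        "tag": [
            {"system": "fonteum:provenance", "code": "cms-nppes"},
            {"system": "fonteum:last-checked", "code": "2026-05-24"},
        ]
    },
}
tags = provenance_tags(practitioner)
# tags == {"provenance": "cms-nppes", "last-checked": "2026-05-24"}
```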

Token efficiency · FHIR JSON vs chunk text

The same provider data. Half the tokens.

Compare the raw FHIR JSON response below with the compact text that the /api/v1/rag/chunks export returns: same provenance, same clinical data, fewer tokens in your context window.

Raw FHIR JSON for one Practitioner (58% more tokens than the equivalent chunk text):

{
  "resourceType": "Practitioner",
  "id": "prac-1003894328",
  "meta": {
    "tag": [
      { "system": "fonteum:provenance", "code": "cms-nppes" },
      { "system": "fonteum:last-checked", "code": "2026-05-24" }
    ]
  },
  "identifier": [
    { "system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328" }
  ],
  "name": [{ "family": "Nguyen", "given": ["Emily"], "suffix": ["MD"] }],
  "address": [
    {
      "use": "work",
      "line": ["400 Park Ave"],
      "city": "New York",
      "state": "NY",
      "postalCode": "10022"
    }
  ],
  "qualification": [
    {
      "code": {
        "coding": [
          {
            "system": "http://nucc.org/provider-taxonomy",
            "code": "207RC0000X",
            "display": "Cardiovascular Disease"
          }
        ]
      }
    }
  ]
}
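
For intuition, here is a rough sketch of how a record like this collapses into a single sentence of text. The output format is an assumption for illustration; the actual chunk text returned by /api/v1/rag/chunks is not shown on this page and may look different. The practitioner_to_text helper is hypothetical, and the comparison uses character counts, not tokens.

```python
import json


def practitioner_to_text(p: dict) -> str:
    """Flatten a Practitioner resource into one compact sentence (assumed format)."""
    name = (p.get("name") or [{}])[0]
    full_name = " ".join([*name.get("given", []), name.get("family", "")]).strip()
    npi = next(
        (i["value"] for i in p.get("identifier", []) if i.get("system", "").endswith("/us-npi")),
        "unknown",
    )
    addr = (p.get("address") or [{}])[0]
    street = ", ".join(addr.get("line", []))
    place = f'{street}, {addr.get("city", "")}, {addr.get("state", "")} {addr.get("postalCode", "")}'
    specialties = [
        coding["display"]
        for qual in p.get("qualification", [])
        for coding in qual.get("code", {}).get("coding", [])
        if "display" in coding
    ]
    return f"{full_name} (NPI {npi}). Specialty: {', '.join(specialties) or 'unknown'}. Address: {place}."


practitioner = {
    "resourceType": "Practitioner",
    "id": "prac-1003894328",
    "identifier": [{"system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328"}],
    "name": [{"family": "Nguyen", "given": ["Emily"]}],
    "address": [{"use": "work", "line": ["400 Park Ave"], "city": "New York",
                 "state": "NY", "postalCode": "10022"}],
    "qualification": [{"code": {"coding": [{
        "system": "http://nucc.org/provider-taxonomy",
        "code": "207RC0000X",
        "display": "Cardiovascular Disease",
    }]}}],
}
text = practitioner_to_text(practitioner)
# The flattened sentence is much shorter than the serialized JSON.
shorter = len(text) < len(json.dumps(practitioner))
```

Most of the savings come from dropping structural keys and coding-system URIs while keeping the values a reader or a model would cite.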

LangChain · integration walkthrough

Load Fonteum into LangChain.

The @fonteum/mcp package ships a createFonteumLangChainTools helper that wires all five Fonteum tools into LangChain. Pass the result as the tools array to any LangChain agent. Every tool response carries metadata.source from the Fonteum provenance block.

import { tool } from "@langchain/core/tools";
import { createFonteumLangChainTools } from "@fonteum/mcp/integrations/langchain";

// All five tools: search, resolve-by-NPI, exclusion check, dataset info, source list
const fonteumTools = createFonteumLangChainTools(tool, {
  apiKey: process.env.FONTEUM_API_KEY, // omit for free demo access
});

For MCP and Python paths, see docs/integrations.

LlamaIndex · VectorStoreIndex

Index provider data with LlamaIndex.

Load pre-chunked provider data from /api/v1/rag/chunks directly into a LlamaIndex VectorStoreIndex. Each node carries a stable chunk ID and provenance metadata for downstream citation generation.

import requests
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode

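nodes, cursor = [], 0
while cursor is not None:  # the loop ends when next_cursor comes back null
    resp = requests.get(
        "https://fonteum.com/api/v1/rag/chunks",
        params={"limit": 200, "cursor": cursor},
        timeout=30,
    )
    resp.raise_for_status()  # stop on HTTP errors instead of indexing an error body
    page = resp.json()
    for c in page["chunks"]:
        nodes.append(TextNode(
            text=c["text"],
            id_=c["chunk_id"],  # stable chunk ID doubles as the node ID
            metadata={"cite": c["cite"], "source_url": c["source_url"], **c["provenance"]},
        ))
    cursor = page["next_cursor"]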

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
response = query_engine.query("cardiologists accepting Medicare in Manhattan")
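
One way to turn that metadata into footnotes is sketched below. The format_citations helper and the sample values are hypothetical; only the cite and source_url keys come from the chunk metadata used above.

```python
def format_citations(hits: list) -> str:
    """Number each distinct (cite, source_url) pair in first-seen order."""
    numbered = {}
    for meta in hits:
        key = (meta["cite"], meta["source_url"])
        # setdefault keeps the first number assigned to a repeated source.
        numbered.setdefault(key, len(numbered) + 1)
    return "\n".join(f"[{n}] {cite} <{url}>" for (cite, url), n in numbered.items())


# Hypothetical metadata from three retrieved chunks, two sharing a source.
hits = [
    {"cite": "NPPES overview", "source_url": "https://example.org/nppes"},
    {"cite": "LEIE overview", "source_url": "https://example.org/leie"},
    {"cite": "NPPES overview", "source_url": "https://example.org/nppes"},
]
footnotes = format_citations(hits)
```

With LlamaIndex, the input could be built from the query result, for example [n.node.metadata for n in response.source_nodes].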

FONTEUM FOR RAG

Every chunk arrives with 14 fields of provenance.

Export · /api/v1/rag/chunks

Pre-chunked, citation-ready text for your vector store.

One GET returns a deterministic, paginated feed of dataset chunks. There is no language model in the pipeline, and the output is byte-stable across pulls, so re-embedding never churns your index. Every chunk arrives with a stable chunk_id (use it as your vector primary key), a cite string for footnotes, the upstream source_url, and the full 14-field provenance contract. Walk the corpus by setting ?limit= and passing each response's next_cursor back as ?cursor=.

# Plain REST — page through the corpus
curl "https://fonteum.com/api/v1/rag/chunks?limit=50&cursor=0"

# → { "total": N, "next_cursor": 50, "chunks": [
#     { "chunk_id": "source:nppes#overview",
#       "text": "...", "cite": "...", "source_url": "https://...",
#       "provenance": { "_source": "...", "_dataset_id": "...", ...14 fields } } ] }

Load it straight into a LangChain vector store. Each chunk maps to a Document whose metadata carries the provenance block — so generated citations trace back to the federal source and snapshot date.

import requests
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

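docs, cursor = [], 0
while cursor is not None:  # the loop ends when next_cursor comes back null
    resp = requests.get(
        "https://fonteum.com/api/v1/rag/chunks",
        params={"limit": 200, "cursor": cursor},
        timeout=30,
    )
    resp.raise_for_status()  # stop on HTTP errors instead of indexing an error body
    page = resp.json()
    for c in page["chunks"]:
        docs.append(Document(
            page_content=c["text"],
            id=c["chunk_id"],  # stable chunk ID, usable as the vector primary key
            metadata={
                "cite": c["cite"],
                "source_url": c["source_url"],
                # The sample response shows the dataset ID inside the provenance
                # block as _dataset_id, not as a top-level chunk field.
                "dataset_id": c["provenance"]["_dataset_id"],
                **c["provenance"],
            },
        ))
    cursor = page["next_cursor"]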

index = FAISS.from_documents(docs, OpenAIEmbeddings())

Public endpoint, rate limited per source IP. Deterministic IDs mean an incremental re-pull upserts cleanly, with no duplicate vectors.
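
A minimal sketch of what an incremental re-pull can look like on the client side, assuming you keep a map of chunk_id to a hash of the text you last embedded. The plan_upsert helper and the sample chunks are hypothetical; the chunk_id and text fields are the ones shown in the response above.

```python
import hashlib


def plan_upsert(existing: dict, chunks: list) -> tuple:
    """Decide which chunks need (re-)embedding.

    `existing` maps chunk_id -> SHA-256 hex digest of the text already embedded.
    Returns (chunks to embed, updated digest map). The input map is not mutated.
    """
    updated = dict(existing)
    to_embed = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk["text"].encode("utf-8")).hexdigest()
        if updated.get(chunk["chunk_id"]) != digest:
            to_embed.append(chunk)
            updated[chunk["chunk_id"]] = digest
    return to_embed, updated


# Hypothetical pulls: the second is identical, the third changes one chunk.
pull_1 = [
    {"chunk_id": "source:nppes#overview", "text": "first version"},
    {"chunk_id": "source:leie#overview", "text": "unchanged text"},
]
first, state = plan_upsert({}, pull_1)
second, state = plan_upsert(state, pull_1)
pull_3 = [
    {"chunk_id": "source:nppes#overview", "text": "second version"},
    {"chunk_id": "source:leie#overview", "text": "unchanged text"},
]
third, state = plan_upsert(state, pull_3)
```

Because the key is the chunk_id, a changed chunk overwrites its old vector instead of adding a second one, and unchanged chunks skip the embedding call entirely.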

Cited answers · /api/v1/ask

Ask a structured question, get cited facts back.

A typed question (ownership, sanctions, payments) plus an identifier returns a fact list where every fact carries the full 14-field provenance contract. The endpoint is deterministic and uses no language model: “0 records match” is an explicit answer, never a silent empty response. The contract is strict: 400 on an unknown question or scheme or a malformed value, 502 when a source read fails (never masked as an empty answer), and a hard internal gate that refuses any fact missing its provenance.

curl "https://fonteum.com/api/v1/ask?question=sanctions&scheme=npi&value=1003000118"

# → { "question": "sanctions",
#     "answer_text": "0 OIG LEIE exclusion records match NPI 1003000118.",
#     "facts": [ /* each with a 14-field provenance block */ ],
#     "citations": [ { "source": "HHS OIG LEIE", "source_url": "..." } ] }
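
A client can mirror that contract so a failed source read is never mistaken for "no records". The sketch below works on a status code and parsed body, so it is independent of the HTTP library. The parse_ask_response helper is hypothetical, and it assumes each fact exposes its provenance under a "provenance" key, which the response sample does not spell out.

```python
def parse_ask_response(status: int, body: dict) -> dict:
    """Map an /api/v1/ask result to an answer summary or a specific error."""
    if status == 400:
        raise ValueError("bad request: unknown question or scheme, or malformed value")
    if status == 502:
        # A failed source read is not the same as "0 records match".
        raise ConnectionError("upstream source read failed; retry later")
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    facts = body.get("facts", [])
    # Assumption: provenance lives under a "provenance" key on each fact.
    missing = [i for i, fact in enumerate(facts) if not fact.get("provenance")]
    if missing:
        raise RuntimeError(f"facts without provenance at positions {missing}")
    return {
        "answer": body["answer_text"],
        "fact_count": len(facts),
        "sources": [c["source"] for c in body.get("citations", [])],
    }


result = parse_ask_response(200, {
    "question": "sanctions",
    "answer_text": "0 OIG LEIE exclusion records match NPI 1003000118.",
    "facts": [],
    "citations": [{"source": "HHS OIG LEIE", "source_url": "..."}],
})
```

An empty facts list with status 200 is returned as a normal answer, while 400 and 502 surface as distinct exceptions the caller can handle separately.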

Token counts · by resource type

Average token counts per resource.

Resource	JSON tokens	Chunk tokens	Reduction
Practitioner	312	148	−53%
Organization	298	131	−56%
Location	187	89	−52%
PractitionerRole	224	104	−54%
HealthcareService	341	162	−52%

Token counts measured with tiktoken cl100k_base on a representative sample of 500 records per resource type. Actual counts vary by record.
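
The Reduction column follows directly from the two token columns. As a quick check, this snippet recomputes it from the averages in the table; the reduction_pct helper name is ours.

```python
# (JSON tokens, chunk tokens) per resource type, copied from the table above.
counts = {
    "Practitioner": (312, 148),
    "Organization": (298, 131),
    "Location": (187, 89),
    "PractitionerRole": (224, 104),
    "HealthcareService": (341, 162),
}


def reduction_pct(json_tokens: int, chunk_tokens: int) -> int:
    """Percentage of tokens saved by chunk text, rounded to a whole number."""
    return round((json_tokens - chunk_tokens) / json_tokens * 100)


reductions = {name: reduction_pct(j, c) for name, (j, c) in counts.items()}
# reductions == {"Practitioner": 53, "Organization": 56, "Location": 52,
#                "PractitionerRole": 54, "HealthcareService": 52}
```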

Latency benchmarks · under load

Sub-300 ms at p99.

Percentile	FHIR endpoint
p50	38 ms
p95	142 ms
p99	290 ms
p99.9	480 ms

Measured at the Vercel edge with 50 concurrent connections. Latency is measured from the gateway to response complete. Source data is served from a warm CDN cache; a cold cache adds ~80 ms.

FONTEUM · PILOT

Run a 90-day pilot. Public data only. No PHI.
