
FOR · ACADEMIC RESEARCHERS

LEIE exclusions

Citation-ready. Analysis-ready.

Federal datasets joined on CCN and NPI. Methodology documented per version. CC BY 4.0.

Read the methodology →

Federal datasets

CMS, HHS-OIG, and other HHS agencies. All public domain upstream.

Total rows

Across every ingested dataset. Cross-joinable on CCN and NPI.

IRB requirements

Federal administrative data is public-use. No patient records. No de-identification required.

What's cite-able

with upstream source URLs.

Every dataset row carries the federal source URL at the time of ingest. The table below lists the upstream origin for each dataset: the URL a peer reviewer can independently retrieve. Fonteum's compilation layer adds cross-source joins, field typing, and methodology versioning; it does not alter or clean the source values.

Dataset | Federal source | Rows | Grain | Join key | License
--- | --- | --- | --- | --- | ---
CMS Provider of Services (POS) iQIES | data.cms.gov | 68,211 | Per certified facility | CCN | U.S. Government Works
CMS Care Compare: Home Health | data.cms.gov | 12,392 | Per CCN-keyed agency | CCN | U.S. Government Works
CMS Care Compare: Hospice | data.cms.gov | 6,943 | Per CCN-keyed facility | CCN | U.S. Government Works
CMS Care Compare: Nursing Home Penalties | data.cms.gov | 16,832 | Per enforcement action | CCN + Survey Event ID | U.S. Government Works
CMS NH Health Deficiencies | data.cms.gov | | Per citation | CCN + Survey Event ID | U.S. Government Works
OIG LEIE Exclusions | oig.hhs.gov | | Per excluded individual / entity | NPI (joined) | U.S. Government Works
CMS PECOS PPEF (Medical Enrichment) | data.cms.gov | Varies | Per enrolled provider | NPI | U.S. Government Works
CMS QPP MIPS Individual Scores | qpp.cms.gov | 477,137 | Per clinician, per performance year | NPI | U.S. Government Works
HCRIS Hospital Cost Reports | www.cms.gov | | Per facility, per cost report period | CCN | U.S. Government Works
CMS Open Payments | openpaymentsdata.cms.gov | Varies | Per payment record | NPI | U.S. Government Works
Federally Qualified Health Centers (HHS annual utilization data) | data.hrsa.gov | | Per FQHC site | FQHC ID (NPI joinable) | U.S. Government Works
NSA IDR + MRF Compliance | www.cms.gov | Derived | Per entity | NPI / Tax ID | U.S. Government Works (upstream); CC BY 4.0 (Fonteum scoring layer)
CMS NSA Surprise Billing IDR Filings | www.cms.gov | Derived | Per filing, per initiating party | NPI / Tax ID | U.S. Government Works

Methods-section boilerplate

Drop-in paragraph for journal submission.

Replace the bracketed tokens with the specific dataset, federal source name, methodology version string (e.g. snf-owners/v1), and snapshot date. The methodology version is pinned at export time and retrievable from /methodology indefinitely.

Data for this analysis were obtained from Fonteum
(fonteum.com), a federally sourced healthcare data
infrastructure layer. The dataset used, [DATASET], is
derived from [FEDERAL SOURCE] (Fonteum methodology
version [VERSION], snapshot date [DATE]). The upstream
federal data are in the public domain (U.S. Government
Works); the Fonteum compilation is available under
CC BY 4.0. Cross-source joins are performed on the CMS
Certification Number (CCN) or National Provider
Identifier (NPI), as documented in the methodology
version cited above. No patient-level data are
included. No IRB review was required for this analysis.

The methodology version in your data export matches the version page at fonteum.com/methodology/[dataset]. That page is durable: the same URL remains retrievable after publication, so peer reviewers and journal editors can independently verify the methods.
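Filling the tokens by hand is easy to get wrong when a paper cites several datasets. The sketch below is a minimal illustration, not a Fonteum tool: it substitutes the four bracketed tokens into a shortened version of the paragraph and refuses to return text with tokens left over. The `methodology_url` helper assumes the dataset slug is the part of the version string before the slash (as in the example `snf-owners/v1`); that mapping is a guess, so confirm it against your export.

```python
import re

# Shortened, illustrative version of the methods paragraph; paste the full
# paragraph here in practice. Tokens are the four the page asks you to replace.
TEMPLATE = (
    "The dataset used, [DATASET], is derived from [FEDERAL SOURCE] "
    "(Fonteum methodology version [VERSION], snapshot date [DATE])."
)

TOKEN_RE = re.compile(r"\[[A-Z ]+\]")


def fill_methods_paragraph(template: str, values: dict[str, str]) -> str:
    """Replace each [TOKEN] with its value and fail if any token is left."""
    text = template
    for token, value in values.items():
        text = text.replace(f"[{token}]", value)
    leftover = TOKEN_RE.findall(text)
    if leftover:
        raise ValueError(f"unreplaced tokens: {leftover}")
    return text


def methodology_url(version: str) -> str:
    """Build the methodology page URL from a version string.

    Assumption: the dataset slug is everything before the first "/",
    as in the page's example "snf-owners/v1".
    """
    slug = version.split("/", 1)[0]
    return f"https://fonteum.com/methodology/{slug}"
```

For example, `methodology_url("snf-owners/v1")` returns `"https://fonteum.com/methodology/snf-owners"`. Failing loudly on leftover tokens is deliberate: a bracketed placeholder that slips into a submitted manuscript is harder to catch later than an exception now.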


Reproducibility

Stata, R, and Python codebooks on request.

Statistical codebooks are available on request with pilot access. Each codebook includes variable descriptions, dtype contracts, and worked join examples that replicate the cross-source joins documented in the methodology.

Stata

Value labels, variable descriptions, and import scripts for every ingested dataset.

R

Tidyverse-compatible tibble import, column typing, and join vignettes.

Python

pandas / polars import scripts with dtype contracts and CCN ↔ NPI join examples.
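As an illustration of what a dtype contract and a CCN ↔ NPI join can look like in pandas, here is a minimal sketch. It is not the codebook itself: the column names (`ccn`, `npi`, `excl_type`), the separate CCN-to-NPI crosswalk table, and every row are made up for the example. The point it shows is that CCNs and NPIs are identifiers and should be read as strings, so a CCN with a leading zero keeps all six characters.

```python
import io

import pandas as pd

# Made-up stand-ins for three exports. Column names and the crosswalk
# table are hypothetical; real codebooks may name and shape them differently.
facilities_csv = io.StringIO("ccn,facility_name\n015009,Example SNF A\n105001,Example SNF B\n")
crosswalk_csv = io.StringIO("ccn,npi\n015009,1234567893\n105001,1111111112\n")
exclusions_csv = io.StringIO("npi,excl_type\n1234567893,1128a1\n")

# dtype contract: identifiers are strings, never integers, so "015009"
# does not silently become 15009.
facilities = pd.read_csv(facilities_csv, dtype={"ccn": "string", "facility_name": "string"})
crosswalk = pd.read_csv(crosswalk_csv, dtype={"ccn": "string", "npi": "string"})
exclusions = pd.read_csv(exclusions_csv, dtype={"npi": "string", "excl_type": "string"})

# CCN -> NPI through the crosswalk, then NPI -> exclusions.
# validate= makes pandas raise MergeError if a key is unexpectedly duplicated;
# indicator=True adds a "_merge" column recording which side each row matched.
linked = facilities.merge(crosswalk, on="ccn", how="left", validate="one_to_many").merge(
    exclusions, on="npi", how="left", validate="many_to_one", indicator=True
)
linked["excluded"] = linked["_merge"].eq("both")

print(linked[["ccn", "npi", "excluded"]])
```

The `validate` arguments are worth keeping in real analyses: a duplicated key in a crosswalk silently multiplies rows, and an explicit cardinality check turns that into an error.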

Why this works as a reproducibility reference

The methodology page is the audit artifact.

Every Fonteum dataset ships a public methodology page at /methodology/[dataset]. The page renders: source family and Tier classification, ingest cadence, field schema with per-field confidence levels, join logic, known limitations, and version history with change rationale.

A peer reviewer who questions your methods gets a URL, not a vendor statement. The methodology version in the URL matches the version in your data export, and that page does not change after you publish.

Browse all methodology pages →

Pre-publication data dictionary

Every field documented before you commit to a methodology.

Pilot access includes the full pre-publication data dictionary: field names, types, null rates, known edge cases, and the CMS/HHS source column each field maps to. The dictionary is delivered as a machine-readable JSON file alongside the CSV export, so your analysis scripts can validate dtypes at import time without manual inspection.
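Validating at import time can be as simple as turning the dictionary into a `dtype` map and checking the result. This sketch assumes a dictionary layout (a `fields` list of `name`/`type` objects, with pandas dtype names as types) that is invented for illustration; adapt the key names to the file you actually receive. It also reads the CMS suppression sentinels "*" and "DS" as missing values, which keeps them from breaking numeric columns.

```python
import io
import json

import pandas as pd

# Hypothetical dictionary layout; real key names may differ.
DICTIONARY = json.loads("""
{"version": "example/v1",
 "fields": [
   {"name": "ccn", "type": "string"},
   {"name": "total_beds", "type": "Int64"}
 ]}
""")

# CMS suppression sentinels, read as missing instead of breaking numeric columns.
SENTINELS = ["*", "DS"]


def load_with_contract(csv_file, dictionary: dict) -> pd.DataFrame:
    """Read a CSV using the dictionary's dtypes, then verify schema and types."""
    dtypes = {field["name"]: field["type"] for field in dictionary["fields"]}
    df = pd.read_csv(csv_file, dtype=dtypes, na_values=SENTINELS)

    # Columns must match the dictionary exactly, in both directions.
    missing = sorted(set(dtypes) - set(df.columns))
    extra = sorted(set(df.columns) - set(dtypes))
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")

    # Each column must have come through with the declared dtype.
    for name, expected in dtypes.items():
        actual = str(df[name].dtype)
        if actual != expected:
            raise TypeError(f"{name}: expected {expected}, got {actual}")
    return df


export = io.StringIO("ccn,total_beds\n015009,120\n105001,*\n")  # made-up rows
df = load_with_contract(export, DICTIONARY)
print(df.dtypes)
```

Using pandas' nullable `Int64` rather than `int64` matters here: a suppressed count becomes `<NA>` instead of forcing the whole column to float or failing the read.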

Field-level null rates

Per field, per dataset snapshot. Null rate changes flagged across versions.

Source column mapping

Every Fonteum field traces to the originating federal column name and file.

Known edge cases

CMS suppression sentinels ("*", "DS"), partial-year cost reports, facility closures mid-year.

Version diff

Field additions, removals, and type changes are documented between methodology versions.
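Because each dictionary version is machine-readable, the version diff and the null-rate flags can also be reproduced locally. A sketch, assuming a hypothetical layout with a `fields` list whose entries carry `name`, `type`, and `null_rate` (a fraction between 0 and 1). The five-percentage-point tolerance for flagging null-rate changes is an arbitrary choice for the example, not a Fonteum threshold.

```python
def diff_dictionaries(old: dict, new: dict, null_rate_tolerance: float = 0.05) -> dict:
    """Compare two data-dictionary versions field by field.

    Assumes each dictionary has a "fields" list of {"name", "type", "null_rate"}
    objects; that layout is hypothetical.
    """
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    shared = old_fields.keys() & new_fields.keys()

    return {
        "added": sorted(new_fields.keys() - old_fields.keys()),
        "removed": sorted(old_fields.keys() - new_fields.keys()),
        "type_changed": sorted(
            name for name in shared if old_fields[name]["type"] != new_fields[name]["type"]
        ),
        # Flag fields whose null rate moved by more than the tolerance.
        "null_rate_shift": sorted(
            name
            for name in shared
            if abs(new_fields[name]["null_rate"] - old_fields[name]["null_rate"]) > null_rate_tolerance
        ),
    }


# Two made-up dictionary versions.
v1 = {"fields": [
    {"name": "ccn", "type": "string", "null_rate": 0.0},
    {"name": "total_beds", "type": "Int64", "null_rate": 0.02},
    {"name": "owner_name", "type": "string", "null_rate": 0.10},
]}
v2 = {"fields": [
    {"name": "ccn", "type": "string", "null_rate": 0.0},
    {"name": "total_beds", "type": "Float64", "null_rate": 0.02},
    {"name": "owner_name", "type": "string", "null_rate": 0.30},
    {"name": "ownership_type", "type": "string", "null_rate": 0.01},
]}
print(diff_dictionaries(v1, v2))
# {'added': ['ownership_type'], 'removed': [], 'type_changed': ['total_beds'], 'null_rate_shift': ['owner_name']}
```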

How to cite Fonteum

Four citation formats. One citation block per data export.

Every data export from the Fonteum API includes a four-format citation block in the response envelope: APA, Chicago, plain text, and BibTeX. The citation pins the methodology version and snapshot date so the reference is reproducible regardless of when a reader retrieves it.

// APA
Fonteum LLC. ([year]). [Dataset name], methodology
version [VERSION], snapshot [DATE].
Fonteum. https://fonteum.com/methodology/[dataset]

// BibTeX
@dataset{fonteum_[dataset]_[year],
  author    = {{Fonteum LLC}},
  title     = {[Dataset name]},
  year      = {[year]},
  version   = {[VERSION]},
  publisher = {Fonteum},
  url       = {https://fonteum.com/methodology/[dataset]},
  note      = {Snapshot date: [DATE]. CC BY 4.0.}
}

The citation block in your API response is generated from the pinned methodology version, so it does not change when a new methodology version is released. Your published citation remains valid.
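The reason a pinned citation cannot drift is that it is a pure function of pinned values. The sketch below renders an APA-style string and a BibTeX entry, roughly following the templates above, from a frozen record of dataset name, URL slug, methodology version, snapshot date, and year. The class and field names are hypothetical and are not the API's response schema; the point is that nothing in the output depends on when it is generated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedCitation:
    """Hypothetical record of the values a citation is pinned to."""

    dataset_name: str
    slug: str           # the [dataset] segment of the methodology URL
    version: str        # methodology version string, e.g. "snf-owners/v1"
    snapshot_date: str  # snapshot date as an ISO string
    year: int

    @property
    def url(self) -> str:
        return f"https://fonteum.com/methodology/{self.slug}"

    def apa(self) -> str:
        return (
            f"Fonteum LLC. ({self.year}). {self.dataset_name}, methodology "
            f"version {self.version}, snapshot {self.snapshot_date}. "
            f"Fonteum. {self.url}"
        )

    def bibtex(self) -> str:
        key = f"fonteum_{self.slug}_{self.year}"
        return "\n".join([
            f"@dataset{{{key},",
            "  author    = {{Fonteum LLC}},",  # double braces keep the corporate author intact
            f"  title     = {{{self.dataset_name}}},",
            f"  year      = {{{self.year}}},",
            f"  version   = {{{self.version}}},",
            "  publisher = {Fonteum},",
            f"  url       = {{{self.url}}},",
            f"  note      = {{Snapshot date: {self.snapshot_date}. CC BY 4.0.}}",
            "}",
        ])


citation = PinnedCitation(
    dataset_name="Example dataset",  # placeholder values
    slug="snf-owners",
    version="snf-owners/v1",
    snapshot_date="2026-01-01",
    year=2026,
)
print(citation.apa())
print(citation.bibtex())
```

Note that `@dataset` is a biblatex entry type; classic BibTeX styles may not define it, so check what your journal's bibliography style expects.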

ORCID integration

Link your ORCID iD. Get attribution on every dataset citation.

When you link your ORCID iD to your Fonteum researcher account, dataset JSON-LD blocks on your citations carry your ORCID as the schema.org/Person creator. Peer reviewers and automated systems can verify attribution via your canonical public ORCID profile.
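For a sense of the shape involved, here is a sketch of a schema.org `Dataset` JSON-LD block with a `Person` creator identified by an ORCID URL. The exact properties Fonteum emits are not documented on this page, so the choice of `name`, `url`, `version`, `license`, and an `@id` of the form `https://orcid.org/<iD>` is an assumption based on common schema.org practice.

```python
import json


def dataset_jsonld(name: str, url: str, version: str, orcid: str, person_name: str) -> str:
    """Render a schema.org Dataset block whose creator is an ORCID-identified Person."""
    block = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": url,
        "version": version,
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "creator": {
            "@type": "Person",
            # The ORCID URL serves as the person's identifier.
            "@id": f"https://orcid.org/{orcid}",
            "name": person_name,
        },
    }
    return json.dumps(block, indent=2)


print(dataset_jsonld(
    name="Example dataset",  # placeholder values
    url="https://fonteum.com/methodology/snf-owners",
    version="snf-owners/v1",
    orcid="0000-0002-1825-0097",
    person_name="Example Researcher",
))
```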

Example ORCID iD: 0000-0002-1825-0097 (standard 16-character ORCID iD format). Fonteum requests only the OAuth /authenticate scope and never writes to your ORCID record.
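The last character of an ORCID iD is a check character computed with ISO 7064 MOD 11-2, as documented by ORCID, so a mistyped iD can be caught before you link it. This is a standalone sketch, not part of Fonteum's integration; it checks the hyphenated format and then the check character.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```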

Read the ORCID integration docs →

Data access

Request data access for your research.

Describe the study scope (datasets, analysis period, research question), and we send a scoped data access agreement within 2 business days. Academic research requests receive a reduced pilot rate. No procurement loop required for single-PI studies.

Request access → · Read the methodology →

FONTEUM · PILOT

Run a 90-day pilot. Public data only. No PHI.

Request access → · Read the methodology →

Built on the authoritative federal record

The primary sources, named on every page.

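These are the issuing authorities whose public datasets Fonteum ingests and attributes. All are federal agencies except NUCC, the national committee that maintains the Health Care Provider Taxonomy code set. None are customers or partners. Every figure on the site links back to one of them.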

  • CMS
  • HHS-OIG
  • HRSA
  • FDA
  • NLM
  • NUCC
  • Census
  • BLS
  • BEA

See the full source registry, with license and refresh cadence for each →

Reproducible by design

Every figure traces to its federal source.

14-tuple provenance

Every rendered fact ties to a 14-field provenance record that includes the source URL, dataset ID, snapshot date, row key, and SHA-256 hash: the full chain-of-custody record.

Reproducible SQL

Each study ships the exact query behind its figures, run against the cited federal snapshot. Re-run it yourself.

Daily reconciliation

Published counts are reconciled against the upstream federal datasets on a daily cadence, with drift logged.

Named medical review

Reviewed by Jennifer Montecillo, MD, a non-practicing medical reviewer.
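To make the hash in a provenance record concrete, here is a sketch that recomputes a row's SHA-256 and compares it with the stored value. Only five of the fourteen provenance fields are named on this page, so the record below carries just those five. The page also does not say which bytes are hashed, so the canonical-JSON encoding used here is an assumption.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    # The five fields named on this page; the remaining nine of the
    # 14-tuple are not listed there and are left out of this sketch.
    source_url: str
    dataset_id: str
    snapshot_date: str
    row_key: str
    sha256: str


def row_digest(row: dict) -> str:
    """SHA-256 of a canonical JSON encoding of the row.

    Assumption: sorted keys and compact separators. Fonteum's actual
    canonicalization is not described on this page.
    """
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(row: dict, record: ProvenanceRecord) -> bool:
    """True if the row still hashes to the value stored in its provenance record."""
    return row_digest(row) == record.sha256


row = {"ccn": "015009", "total_beds": 120}  # made-up row
record = ProvenanceRecord(
    source_url="https://data.cms.gov/",  # placeholder values
    dataset_id="example-dataset",
    snapshot_date="2026-01-01",
    row_key="015009",
    sha256=row_digest(row),
)
print(verify(row, record))                         # True
print(verify({**row, "total_beds": 121}, record))  # False
```

Sorting keys makes the digest independent of column order, which is one common way to keep a hash stable across tools that serialize the same row differently.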

Read the full provenance and attestation methodology →

Two doors

Use the free API and open data

Query providers, facilities, sanctions, and quality scores, with each field carrying its federal source. Self-serve, no call to start.

Explore the API → · Browse the data catalog →

Talk to us

Managed pilots, enterprise terms, and audit-ready, signed attestation packages for compliance, risk, and research teams.

Talk to us →


© 2026 Fonteum LLC. All rights reserved.

The U.S. healthcare graph AI can cite: every fact carries its source.


The substrate, by the numbers

9.2M graph entities: providers, organizations, owners, and facilities
12.5M linked identifiers: NPIs, CCNs, LEIs, and more, resolved to entities
4.7M graph edges: source-attested relationships between entities
44 federal source families: distinct CMS, OIG, HRSA, FDA, and peer datasets
33 dataset pages: citable, downloadable /data catalog pages
47 reproducible studies: each shipping the SQL behind its figures