Inpatient utilization · Reference

Per-CCN per-MS-DRG inpatient utilization, free + open. The dataset Definitive sells as Hospital Performance.

Fonteum ingests the CMS Medicare Inpatient Hospitals by Provider and Service file on the daily HEAD-probe pattern, with a full ingestion firing on each annual mid-June release. ~60M rows per data year. The 14-tuple provenance contract ships inline with every API response, so consumers can verify what they’re looking at without a second round-trip.

Try the API → Outpatient sister dataset → Data catalog →

1. What this dataset is

Per-CCN per-MS-DRG Medicare inpatient discharge + payment + length-of-stay aggregates.

CMS publishes the “Medicare Inpatient Hospitals by Provider and Service” file annually in mid-June. One row per (CCN, MS-DRG code, data year). Each row carries the count of Medicare fee-for-service discharges, the average covered/total/Medicare payment amounts, and the average length of stay. Coverage extends back to data year 2018 in the current schema (the legacy 2013-2017 schema differs and is not ingested in Phase 2).

For context: this is the dataset Definitive Healthcare sells as its flagship “Hospital Performance” module at $45,000-$95,000 per buyer per year (depending on facility count). Fonteum publishes it free and open, with full 14-tuple provenance, Dataset JSON-LD discoverability, and a free .edu/.gov researcher tier.

2. What this dataset is NOT

No PHI. No claims. No patient records.

CMS pre-aggregates this file to facility-level rows before public release. The dataset contains:

  • NO patient identifiers — no names, no addresses, no dates of birth.
  • NO claim-level rows — only annual rollups per (CCN, MS-DRG).
  • NO discharge dates — just the data year.
  • NO cells with discharge counts under 11 — CMS pre-suppresses these per its privacy policy. Our schema preserves the suppression by allowing NULL in the count fields.

We additionally drop the provider-name and provider-address columns CMS ships with the file: those facts already live in the CMS Provider of Services (POS) file (canonical name + address per CCN), and dual-storage would create drift. Joins back to POS happen at query time via the federated identity bridge.
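Because suppressed cells arrive as NULL rather than zero, downstream arithmetic has to treat them as unknown. The following is a minimal Python sketch of one way to do that, assuming rows have already been decoded into dicts that use the total_discharges field name from the API response; the helper name summarize_discharges is hypothetical and not part of Fonteum's API.

```python
from typing import Optional


def summarize_discharges(rows: list[dict]) -> dict:
    """Sum known discharge counts and count suppressed cells separately.

    Assumption: each row is a dict whose "total_discharges" is an int, or
    None when CMS suppressed the cell (fewer than 11 discharges).
    """
    known_total = 0
    suppressed = 0
    for row in rows:
        count: Optional[int] = row.get("total_discharges")
        if count is None:
            suppressed += 1  # unknown value below 11; never treat it as zero
        else:
            known_total += count
    return {
        "known_total": known_total,
        "suppressed_cells": suppressed,
        # Each suppressed cell hides at most 10 discharges, which bounds the true total.
        "max_possible_total": known_total + 10 * suppressed,
    }
```

Reporting the suppressed-cell count alongside the sum keeps the privacy suppression visible instead of silently undercounting.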

3. Refresh schedule

Daily HEAD probe at 06:00 UTC. Full ingest on annual mid-June release.

The Inngest cron runs daily on the schedule 0 6 * * *. The HEAD probe is cheap and short-circuits via the UNIQUE(source_id, snapshot_date) constraint when nothing has changed. A full ingest fires only when CMS publishes a new data year, typically once a year in mid-June.
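The short-circuit can be illustrated with a small Python sketch. It uses an in-memory SQLite database as a stand-in for the production store, and the table name snapshots, the function name record_snapshot, and the source_id value are assumptions made for illustration; only the UNIQUE(source_id, snapshot_date) constraint comes from the description above.

```python
import sqlite3


def record_snapshot(conn: sqlite3.Connection, source_id: str, snapshot_date: str) -> bool:
    """Return True if the snapshot is new, False if the UNIQUE constraint already holds it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO snapshots (source_id, snapshot_date) VALUES (?, ?)",
        (source_id, snapshot_date),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when the insert was ignored.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots ("
    " source_id TEXT NOT NULL,"
    " snapshot_date TEXT NOT NULL,"
    " UNIQUE (source_id, snapshot_date))"
)

# The first probe sees a new snapshot; the repeat probe is a no-op.
first = record_snapshot(conn, "cms-inpatient-utilization", "2022-12-31")
repeat = record_snapshot(conn, "cms-inpatient-utilization", "2022-12-31")
print(first, repeat)  # True False
```

In this sketch the caller would run the expensive full ingest only when record_snapshot returns True, so a daily probe costs one insert attempt on days when nothing has changed.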

4. How it joins to other sources

CCN is the bridge to POS, Care Compare, ownership chains.

Every utilization row carries a CMS Certification Number (CCN). The federated identity layer (/identity) joins the CMS Provider of Services (POS) file (canonical facility name + address + type), Care Compare quality ratings, NH ownership chains, and now utilization in a single query.

Format guard: every CCN is validated against the ^[A-Z0-9]{6}$ pattern at ingest time. Rows failing the check are dropped before they reach inpatient_utilization_summary.
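A format guard of this kind might look like the Python sketch below. The pattern is the one quoted above; the function names is_valid_ccn and drop_invalid_rows are hypothetical, and whether the production ingester trims whitespace or normalizes case before checking is not stated here, so this sketch does neither.

```python
import re

# Pattern quoted in the text: exactly six uppercase letters or digits.
CCN_PATTERN = re.compile(r"^[A-Z0-9]{6}$")


def is_valid_ccn(value: str) -> bool:
    """Check a CCN against the format guard.

    fullmatch is used so that a trailing newline, which a bare "$" would
    tolerate under re.match, is rejected as well.
    """
    return CCN_PATTERN.fullmatch(value) is not None


def drop_invalid_rows(rows: list[dict]) -> list[dict]:
    """Keep only rows whose "ccn" field passes the guard (field name assumed)."""
    return [row for row in rows if is_valid_ccn(str(row.get("ccn", "")))]
```

For example, "010001" passes, while "01000" (too short) and "01000a" (lowercase) are rejected.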

5. The API

GET /api/v1/utilization/inpatient/[ccn]

Returns the top-10 MS-DRGs by discharge count for the given CCN, with the full 14-tuple provenance contract attached inline. Auth flows through the standard withApi handler — bearer token, rate limit, tier resolution. The free .edu/.gov researcher tier gets the same envelope as the paid tiers.

{
  "data": {
    "ccn": "010001",
    "data_year": 2022,
    "top_ms_drgs": [
      {
        "ms_drg_code": "470",
        "ms_drg_description": "MAJOR JOINT REPLACEMENT OR REATTACHMENT...",
        "total_discharges": 342,
        "avg_covered_charges": 65000,
        "avg_total_payments": 13800,
        "avg_medicare_payments": 12100,
        "avg_length_of_stay": 2.5,
        "data_year": 2022
      }
    ],
    "provenance": {
      "_source": "CMS Medicare Inpatient Hospitals by Provider and Service",
      "_dataset_id": "cms-inpatient-utilization",
      "_snapshot": "2022-12-31",
      "_methodology": "v2026.05.0",
      "_license": "US-Government-Works",
      "_coverage_period_start": "2018-01-01",
      "_coverage_period_end": "ongoing"
    }
  },
  "meta": { "request_id": "req_...", "api_version": "v1", "...": "..." }
}
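
A client call against this endpoint could be sketched in Python as follows, using only the standard library. The base URL is an assumption (this page does not state the API host), the function names are hypothetical, and the parser relies only on the fields visible in the sample response above.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the API is served from this host; the page does not state a base URL.
BASE_URL = "https://fonteum.com"


def build_request(ccn: str, token: str) -> urllib.request.Request:
    """Build the GET request for one CCN, authenticated with a bearer token."""
    url = f"{BASE_URL}/api/v1/utilization/inpatient/{ccn}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def top_drgs(payload: dict) -> list[tuple[str, Optional[int]]]:
    """Extract (ms_drg_code, total_discharges) pairs from a decoded response.

    total_discharges may be None where CMS suppressed the count.
    """
    return [
        (row["ms_drg_code"], row["total_discharges"])
        for row in payload["data"]["top_ms_drgs"]
    ]


def fetch_top_drgs(ccn: str, token: str) -> list[tuple[str, Optional[int]]]:
    """Perform the network call and parse the JSON body."""
    with urllib.request.urlopen(build_request(ccn, token), timeout=30) as resp:
        return top_drgs(json.load(resp))
```

Keeping the request builder and the parser separate from the network call lets both be checked offline against the sample payload.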

6. License + redistribution

US-Government-Works. Anyone can redistribute.

CMS publishes this file as a federal-government work, public domain in the U.S. under 17 U.S.C. §105 and the OPEN Government Data Act. The license identifier US-Government-Works is what Fonteum surfaces in the provenance contract’s _license field for every row derived from this dataset.

7. How to cite

APA-ish, with the upstream CMS source named.

Fonteum. (2026). CMS Medicare Inpatient Hospitals by Provider and Service [data set].
https://fonteum.com/docs/utilization-inpatient. Retrieved [date].
Original source: Centers for Medicare & Medicaid Services. License: US-Government-Works.

Detailed researcher citation guidance lives at /cite; the researcher-api docs describe the citation TOS for the free tier.

8. Verify the snapshot

SHA-256 attestation + S3 cache mirror.

Every snapshot lands with a SHA-256 attestation written by writeAttestation (PR #135). When the source-cache mirror (PR #154) is provisioned, every snapshot also mirrors to S3 — verifiers can re-download the original CSV from the cache and recompute the hash to confirm byte-exact provenance. Use /verify to walk the chain for any snapshot.
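
Recomputing the hash locally might look like the Python sketch below. It assumes the original CSV has already been downloaded to disk and that the attested digest is available as a hex string; the function names are hypothetical and are not the writeAttestation implementation.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large CSVs never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_attestation(path: Path, attested_hex: str) -> bool:
    """Compare the local digest with the attested one, ignoring case and whitespace."""
    return sha256_of_file(path) == attested_hex.strip().lower()
```

A mismatch means the local bytes differ from the attested snapshot, which can be as simple as a re-encoded or re-compressed download, so the comparison should be run on the file exactly as served.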

Built on the authoritative federal record

The primary sources, named on every page.

These are the federal agencies whose public datasets Fonteum ingests and attributes — the issuing authorities, not customers or partners. Every figure on the site links back to one of them.

  • CMS
  • HHS-OIG
  • HRSA
  • FDA
  • NLM
  • NUCC
  • Census
  • BLS
  • BEA

See the full source registry, with license and refresh cadence for each →

Reproducible by design

Every figure traces to its federal source.

14-tuple provenance

Every rendered fact ties to a source URL, dataset ID, snapshot date, row key, and SHA-256 — the full chain-of-custody record.

Reproducible SQL

Each study ships the exact query behind its figures, run against the cited federal snapshot. Re-run it yourself.

Daily reconciliation

Published counts are reconciled against the upstream federal datasets on a daily cadence, with drift logged.

Named medical review

Reviewed by Jennifer Montecillo, MD, non-practicing medical reviewer.

Read the full provenance and attestation methodology →

Two doors

Use the free API and open data

Query providers, facilities, sanctions, and quality scores — each field carrying its federal source. Self-serve, no call to start.

Explore the API → Browse the data catalog →

Talk to us

Managed pilots, enterprise terms, and audit-ready, signed attestation packages for compliance, risk, and research teams.


© 2026 Fonteum LLC. All rights reserved.

The U.S. healthcare graph AI can cite — every fact carries its source.

The substrate, by the numbers

  • 44 federal source families: distinct CMS, OIG, HRSA, FDA and peer datasets
  • 35 dataset pages: citable, downloadable /data catalog pages
  • 52 reproducible studies: each shipping the SQL behind its figures