I was riding the train on Sunday afternoon and opened X. Something caught my eye.
On Friday, February 13, the HHS DOGE team announced via X the open-source release of what it called the largest Medicaid dataset in department history. The tweet included political framing, but what interests me is the data itself, which is available for download at opendata.hhs.gov.
Quoting HHS.gov:
This dataset contains provider-level Medicaid spending data aggregated from outpatient and professional claims with valid HCPCS codes, covering January 2018 through December 2024. It provides insights into how Medicaid dollars are distributed across providers and procedures nationwide.
That’s 227 million rows. 3.4 GB compressed. Seven years of claims. Free.
Why Medicaid Data Matters, and What It Doesn’t Show
To understand why this is interesting, it helps to know where Medicaid sits in the larger picture.
The United States spent approximately $4.9 trillion on healthcare in 2023, according to CMS. That breaks down roughly as follows: private insurance accounts for about 30% of total spending, Medicare about 21%, Medicaid about 18%, and out-of-pocket costs about 10%. The rest is a mix of VA, CHIP, and other government programs. In dollar terms, Medicaid spending was approximately $872 billion in 2023, covering roughly 80 million Americans (disproportionately children, low-income adults, pregnant women, and people with disabilities).
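The percentages and dollar figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the rounded numbers quoted in this post; the variable names are mine, and the inputs are the approximate figures given above, not exact CMS values.

```python
# Approximate figures quoted in the text (2023), in billions of dollars.
TOTAL_SPENDING_BILLIONS = 4_900  # roughly $4.9 trillion

# Rounded payer shares from the text, as whole percentages.
SHARE_PERCENT = {
    "Private insurance": 30,
    "Medicare": 21,
    "Medicaid": 18,
    "Out-of-pocket": 10,
}

# Whatever is left over is "the rest" (VA, CHIP, other programs).
other_percent = 100 - sum(SHARE_PERCENT.values())

# Dollar amounts implied by the rounded shares.
implied_billions = {
    payer: TOTAL_SPENDING_BILLIONS * pct / 100
    for payer, pct in SHARE_PERCENT.items()
}

# The text separately reports Medicaid spending of about $872 billion.
REPORTED_MEDICAID_BILLIONS = 872
reported_medicaid_share = round(
    REPORTED_MEDICAID_BILLIONS / TOTAL_SPENDING_BILLIONS * 100, 1
)

for payer, billions in implied_billions.items():
    print(f"{payer}: about ${billions:,.0f} billion")
print(f"Everything else: about {other_percent}% of spending")
print(f"Reported Medicaid share: about {reported_medicaid_share}%")
```

The rounded 18% share implies about $882 billion, while the reported $872 billion works out to about 17.8% of the total, so the two figures agree within rounding. The named payers sum to 79%, leaving about 21% for everything else.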
But Medicaid is only one payer. If a hospital bills $100 million in total revenue, the Medicaid slice might be 15-25% of that. Medicare, commercial insurance, and self-pay make up the rest. So while this dataset is enormous, it’s a partial view. The providers you see ranked highly here may rank differently in total revenue. The procedures that appear most costly under Medicaid may not be the most costly overall. And the absence of a provider from this data doesn’t mean they’re small; it may mean their patient mix skews toward commercial or Medicare populations.
This is Medicaid data. It tells you a lot about Medicaid. It tells you something about overall healthcare spending patterns. It doesn’t tell you everything.
The Raw Data Is Powerful but Sparse
The data at opendata.hhs.gov is powerful but sparse. Each row contains only seven fields: a billing provider NPI, a servicing provider NPI, an HCPCS procedure code, a claim month, and three aggregate measures (unique beneficiaries, total claims, and total paid). There are no provider names, no addresses, no procedure descriptions: just numbers and codes.
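To make the seven-field layout concrete, here is a minimal sketch of one row as a Python dataclass, plus a parser for CSV text. The column names, the `YYYY-MM` month format, and the sample values are all assumptions for illustration; the actual headers in the downloaded file may differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendingRow:
    """One aggregated spending row. Field names are illustrative assumptions."""
    billing_npi: str           # kept as text so the 10-digit ID is never treated as a number
    servicing_npi: str
    hcpcs_code: str            # CPT (Level I) or HCPCS Level II code
    claim_month: str           # assumed "YYYY-MM"; the real file's format may differ
    unique_beneficiaries: int
    total_claims: int
    total_paid: float


def parse_rows(csv_text: str) -> list[SpendingRow]:
    """Parse CSV text whose header uses the hypothetical column names above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        SpendingRow(
            billing_npi=row["billing_npi"],
            servicing_npi=row["servicing_npi"],
            hcpcs_code=row["hcpcs_code"],
            claim_month=row["claim_month"],
            unique_beneficiaries=int(row["unique_beneficiaries"]),
            total_claims=int(row["total_claims"]),
            total_paid=float(row["total_paid"]),
        )
        for row in reader
    ]


# Made-up sample data, not taken from the real file.
SAMPLE_CSV = (
    "billing_npi,servicing_npi,hcpcs_code,claim_month,"
    "unique_beneficiaries,total_claims,total_paid\n"
    "1000000001,1000000002,T1015,2018-01,12,15,1622.85\n"
)

rows = parse_rows(SAMPLE_CSV)
print(rows[0])
```

Storing NPIs and procedure codes as strings is a deliberate choice here: they are identifiers, not quantities, and some HCPCS codes contain letters.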
To make sense of it, we join the spending data with two public reference files.
The first is the NPPES National Provider Registry, maintained by CMS. Every healthcare provider that bills Medicare or Medicaid is assigned a unique 10-digit National Provider Identifier, or NPI. The NPPES registry maps each NPI to a provider name, practice address, taxonomy code, and other details. By joining on NPI, anonymous billing records resolve into named hospitals, clinics, and physicians in specific cities and states.
The second is the CMS HCPCS Level II code file. Procedures in the dataset are identified by HCPCS codes. Level I codes (CPT) are maintained by the AMA and cover most medical procedures; their descriptions are proprietary. Level II codes are maintained by CMS, freely available, and use letter prefixes (T, H, G, etc.) for services like clinic visits, durable medical equipment, and substance use treatment. By joining the HCPCS file, we get plain-language descriptions for every Level II code. For CPT codes, we rely on commonly known short titles.
Once these three datasets are linked (spending records, provider identities, and procedure descriptions), the 227 million rows of anonymous claims become something you can actually explore: Which hospitals receive the most Medicaid dollars? What are the most expensive procedures? How does imaging utilization vary across a state? The raw data answers none of these questions. The joins answer all of them.
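The shape of those joins can be sketched with Python's built-in `sqlite3` module and a few made-up rows. This is not the Athena setup described in this post: the table names, column names, provider names, and amounts are hypothetical, and joining on the billing NPI (not the servicing NPI) is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spending (
    billing_npi TEXT, servicing_npi TEXT, hcpcs_code TEXT, claim_month TEXT,
    unique_beneficiaries INTEGER, total_claims INTEGER, total_paid REAL
);
CREATE TABLE providers (npi TEXT PRIMARY KEY, name TEXT, city TEXT, state TEXT);
CREATE TABLE hcpcs (code TEXT PRIMARY KEY, description TEXT);
""")

# Made-up rows standing in for the three source files.
conn.executemany("INSERT INTO spending VALUES (?,?,?,?,?,?,?)", [
    ("1000000001", "1000000001", "T1015", "2018-01", 5, 10, 1000.0),
    ("1000000001", "1000000001", "T1015", "2018-02", 4, 8, 800.0),
    ("1000000002", "1000000002", "74177", "2018-01", 2, 3, 300.0),
    ("1000000003", "1000000003", "T1015", "2018-01", 9, 20, 5000.0),
])
conn.executemany("INSERT INTO providers VALUES (?,?,?,?)", [
    ("1000000001", "Example Clinic A", "Hartford", "CT"),
    ("1000000002", "Example Imaging B", "New Haven", "CT"),
    ("1000000003", "Example Clinic C", "Albany", "NY"),
])
# Only Level II codes appear in the CMS file, so the CPT code 74177 is absent.
conn.execute("INSERT INTO hcpcs VALUES (?,?)",
             ("T1015", "Clinic visit/encounter, all-inclusive"))

# Which providers in one state received the most dollars?
top_providers = conn.execute("""
    SELECT p.name, p.city, SUM(s.total_paid) AS total_paid,
           SUM(s.total_claims) AS claims
    FROM spending s
    JOIN providers p ON p.npi = s.billing_npi
    WHERE p.state = 'CT'
    GROUP BY p.npi, p.name, p.city
    ORDER BY total_paid DESC
    LIMIT 10
""").fetchall()

# Which procedures cost the most? LEFT JOIN keeps codes with no description.
top_procedures = conn.execute("""
    SELECT s.hcpcs_code, h.description, SUM(s.total_paid) AS total_paid
    FROM spending s
    JOIN providers p ON p.npi = s.billing_npi
    LEFT JOIN hcpcs h ON h.code = s.hcpcs_code
    WHERE p.state = 'CT'
    GROUP BY s.hcpcs_code, h.description
    ORDER BY total_paid DESC
""").fetchall()

print(top_providers)
print(top_procedures)
```

The `LEFT JOIN` on the code table matters: an inner join would silently drop every CPT code that has no row in the Level II file, which is where a separate source of short titles has to fill in.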
What It Looks Like: Connecticut
I work for a medical imaging provider in Connecticut, so that’s where I started. I loaded the data into AWS (S3, Glue, and Athena) and ran some queries.
Here are the ten providers that received the most Medicaid dollars in Connecticut between January 2018 and December 2024:
| Rank | Provider | City | Total Paid | Claims |
|---|---|---|---|---|
| 1 | Yale New Haven Hospital | New Haven | $1,080,566,063 | 23,557,851 |
| 2 | Community Health Center Inc | Middletown | $533,305,233 | 7,332,545 |
| 3 | New England Home Care Inc. | Rocky Hill | $400,057,183 | 6,765,929 |
| 4 | Hartford Hospital | Hartford | $316,278,903 | 7,407,519 |
| 5 | Cornell Scott Hill Health Corporation | New Haven | $294,234,309 | 3,878,634 |
| 6 | Bridgeport Hospital | Bridgeport | $287,340,554 | 7,011,413 |
| 7 | Saint Francis Hospital and Medical Center | Hartford | $266,391,325 | 7,411,868 |
| 8 | Connecticut Children’s Medical Center | Hartford | $248,942,412 | 2,468,547 |
| 9 | The Hospital of Central Connecticut | New Britain | $220,100,848 | 4,101,984 |
| 10 | State of Connecticut | Farmington | $205,921,864 | 6,309,714 |
A caveat: some providers and health systems bill under multiple NPI numbers. Yale New Haven Health, for example, operates Yale New Haven Hospital, Bridgeport Hospital, and several other facilities that each have their own NPI. Hartford HealthCare similarly operates Hartford Hospital, The Hospital of Central Connecticut, and others. A full picture of system-level Medicaid revenue would require rolling up multiple NPIs to their parent organizations, a meaningful analysis, but one that goes beyond what a simple query can show.
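As a rough illustration of what such a roll-up involves, the sketch below groups four rows from the top-ten table under the two parent systems named above. It is deliberately partial: it keys on provider name instead of NPI, and it only includes facilities that happen to appear in the top ten, so the totals understate each system's actual Medicaid revenue. The function and variable names are hypothetical.

```python
from collections import defaultdict

# Total paid, copied from the top-ten table above (January 2018 to December 2024).
TOP_TEN_PAID = {
    "Yale New Haven Hospital": 1_080_566_063,
    "Hartford Hospital": 316_278_903,
    "Bridgeport Hospital": 287_340_554,
    "The Hospital of Central Connecticut": 220_100_848,
}

# Parent systems as described in the text. A real roll-up would map NPIs,
# not names, and would need a complete facility list for each system.
PARENT_SYSTEM = {
    "Yale New Haven Hospital": "Yale New Haven Health",
    "Bridgeport Hospital": "Yale New Haven Health",
    "Hartford Hospital": "Hartford HealthCare",
    "The Hospital of Central Connecticut": "Hartford HealthCare",
}


def roll_up(paid_by_provider: dict[str, int], parent_of: dict[str, str]) -> dict[str, int]:
    """Sum payments by parent system; unmapped providers stand alone."""
    totals: dict[str, int] = defaultdict(int)
    for provider, paid in paid_by_provider.items():
        totals[parent_of.get(provider, provider)] += paid
    return dict(totals)


system_totals = roll_up(TOP_TEN_PAID, PARENT_SYSTEM)
for system, paid in sorted(system_totals.items(), key=lambda kv: -kv[1]):
    print(f"{system}: ${paid:,}")
```

Even with just these four rows, the two Yale New Haven Health hospitals together account for about $1.37 billion and the two Hartford HealthCare hospitals for about $536 million.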
Imaging in Connecticut
Narrowing the focus to imaging, I filtered the dataset to diagnostic imaging procedure codes to see what the Medicaid imaging landscape looks like in Connecticut.
The top ten imaging procedures by total Medicaid dollars paid:
| Rank | Code | Description | Total Paid | Claims |
|---|---|---|---|---|
| 1 | 74177 | CT abdomen & pelvis with contrast | $70,589,851 | 652,447 |
| 2 | 93306 | Echocardiogram, complete, with Doppler | $54,125,735 | 440,764 |
| 3 | 71046 | Chest X-ray, 2 views | $43,778,723 | 1,550,715 |
| 4 | 70450 | CT head/brain without contrast | $36,912,988 | 785,939 |
| 5 | 77067 | Screening mammography, bilateral | $30,851,726 | 450,589 |
| 6 | 78452 | Myocardial perfusion imaging, multiple studies | $24,481,755 | 65,256 |
| 7 | 70553 | MRI brain with and without contrast | $19,995,810 | 123,177 |
| 8 | 74176 | CT abdomen & pelvis without contrast | $19,903,868 | 209,276 |
| 9 | 71045 | Chest X-ray, single view | $18,509,702 | 1,345,127 |
| 10 | 76705 | Ultrasound, abdominal, limited | $15,828,097 | 239,404 |
No surprises here for anyone in the field. CT abdomen/pelvis with contrast is the workhorse of emergency and inpatient imaging. Chest X-rays generate enormous volume but modest per-exam revenue. And screening mammography, despite being a single procedure code, accounts for over $30 million in Medicaid payments across the state over seven years.
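The volume-versus-revenue point can be checked directly from the table by dividing total paid by claims. The sketch below does that for four of the ten rows. One caveat: a claim is not necessarily one exam, so "paid per claim" is only a proxy for per-exam revenue.

```python
# (code, description, total paid in dollars, claims), copied from the table above.
IMAGING_ROWS = [
    ("74177", "CT abdomen & pelvis with contrast", 70_589_851, 652_447),
    ("71046", "Chest X-ray, 2 views", 43_778_723, 1_550_715),
    ("78452", "Myocardial perfusion imaging, multiple studies", 24_481_755, 65_256),
    ("71045", "Chest X-ray, single view", 18_509_702, 1_345_127),
]


def paid_per_claim(total_paid: int, claims: int) -> float:
    """Average Medicaid payment per claim, in dollars."""
    return total_paid / claims


per_claim = {
    code: paid_per_claim(total_paid, claims)
    for code, _description, total_paid, claims in IMAGING_ROWS
}

for code, description, _total_paid, _claims in IMAGING_ROWS:
    print(f"{code} {description}: ${per_claim[code]:,.2f} per claim")
```

The two-view chest X-ray comes out at roughly $28 per claim against roughly $108 for CT abdomen and pelvis with contrast, and myocardial perfusion imaging lands near $375 per claim on a small fraction of the volume.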
Again, this is Medicaid only. The total imaging volume across all payers in Connecticut is substantially higher.
Is This Unprecedented?
Largely, yes.
CMS has published Medicare provider utilization and payment data since 2014, when a federal court ruling forced its release. That data, the Medicare Physician & Other Supplier Public Use File, provides similar provider-level, procedure-level detail, but only for Medicare Part B fee-for-service claims.
Medicaid has been a different story. The primary Medicaid claims repository, the Transformed Medicaid Statistical Information System (T-MSIS), contains beneficiary-level data but is not publicly available. Access requires CMS Privacy Board approval, a formal Data Use Agreement through the Research Data Assistance Center, and an approval process that can take up to a year. Even approved researchers face 18-24 month data lags and redaction of managed care payment details. CMS has published some aggregated Medicaid data on Data.Medicaid.gov (enrollment figures, drug utilization, quality measures), but nothing at this granularity.
This release is the Medicaid equivalent of what has existed for Medicare for over a decade, and it arguably goes further. It includes managed care claims, not just fee-for-service. It covers all states. And it’s free to download with no application or data use agreement required.
Who Cares?
Anyone in the competitive healthcare space should be paying attention.
Provider-level spending data at this granularity is the kind of thing that commercial data brokers such as IQVIA, Definitive Healthcare, Optum, and Merative (formerly Truven/IBM Watson Health) sell for six- and seven-figure annual contracts. Those vendors offer multi-payer data with richer fields and polished analytics platforms, so this free dataset isn’t going to replace a Definitive Healthcare subscription. But for the Medicaid slice specifically, a lot of what organizations have been paying for is now downloadable in a single CSV.
Health systems use this kind of data for market analysis: understanding referral patterns, identifying service line opportunities, benchmarking against competitors. Insurance companies use it for network adequacy and rate setting. Researchers use it to study access, utilization, and disparities. Consultants use it to advise on all of the above. The fact that you can now run these queries yourself, for free, against seven years of nationwide Medicaid claims is significant.
It’s not the whole picture. Medicaid represents roughly 18% of total healthcare spending, and you’d still need Medicare and commercial claims data for a complete view. But 18% of a $4.9 trillion market is still roughly $872 billion, and this dataset covers the outpatient and professional claims within it at the provider and procedure level.
From Identifiers to Insight
There’s something satisfying about taking 227 million rows of opaque identifiers and turning them into tables that actually mean something: connecting NPI numbers to hospital names, procedure codes to plain-English descriptions, and dollars to the institutions and services they flow to.
Whether this data leads to better policy, better research, or just better-informed conversations about where Medicaid dollars go, the fact that it’s now freely available is worth noting.
