Data Science and Biostatistics Resources

Frequently Utilized Databases

Healthcare Cost and Utilization Project (HCUP)

HCUP is a family of healthcare databases and related software tools and products developed through a Federal-State-Industry partnership and sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP databases bring together the data collection efforts of state data organizations, hospital associations, private data organizations, and the federal government to create a national information resource of encounter-level healthcare data.

HCUP includes the largest collection of longitudinal hospital care data in the United States, with all-payer, encounter-level information beginning in 1988. These databases enable research on a broad range of health policy issues, including cost and quality of health services, medical practice patterns, access to healthcare programs, and outcomes of treatments at the national, state, and local market levels.

IBM® MarketScan® Research Databases

IBM® MarketScan® Research Databases provide one of the longest-running and largest collections of proprietary de-identified claims data for privately and publicly insured people in the U.S. Currently, affiliates of the Center for Pediatric Clinical Effectiveness have access to MarketScan data, which include pediatric claims from 2013-2018 for over 8 million children.

Pediatric Health Information System Database (PHIS)

The Pediatric Health Information System®, a comparative pediatric database, includes clinical and resource utilization data for inpatient, ambulatory surgery, emergency department, and observation unit patient encounters from more than 49 children's hospitals. PHIS supports a wide range of improvement activities, including clinical effectiveness, resource utilization, care guideline development, readmission analysis, antimicrobial stewardship, physician profiling (OPPE), and more. Hospitals can use it to identify areas to improve clinical care, enhance financial outcomes, improve clinical documentation, and perform research.

Premier Healthcare Database

Premier is a large, U.S. hospital-based, service-level, all-payer database that contains information on inpatient discharges, primarily from geographically diverse nonprofit, non-governmental, community and teaching hospitals, and health systems from rural and urban areas. It includes over 121 million inpatient admissions, representing approximately 25 percent of annual U.S. inpatient admissions. Outpatient encounters include 800+ visits, including visits to emergency departments, ambulatory surgery centers, and alternate sites of care. Patients can be tracked within the same hospital across inpatient and hospital-based outpatient settings, which makes it possible to assess hospital length of stay and readmissions to the same hospital. Premier contains information on hospital and visit characteristics; admitting and attending physician specialties; healthcare payers; and patient data from standard hospital discharge billing files.