Data Science and Biostatistics Unit




DBHi’s Data Science and Biostatistics Unit (DSBU) works with researchers to define the research question, select appropriate data, and develop methodologies for data collection and analysis. The DSBU team has significant expertise in managing and using a variety of data sources, ranging from electronic health records, clinical trial data, and registry data to administrative, claims, and survey data. The team also works with researchers to present data in graphical and tabular form for academic journals, research reports, and other publications.

The DSBU includes biostatisticians and data scientists who consult with and support CHOP investigators who use complex data to address research questions.

DSBU staff have extensive experience handling datasets of various sizes and structures and use appropriate software to clean and analyze data. This work includes standardizing data sets from different sources to a common format, customizing and transforming data into research-ready databases, developing data dictionaries and data standards documentation, and applying statistical and qualitative methods to assess whether data are suitable for modeling. During this process, staff can help researchers design and document appropriate data management plans. Staff manage and analyze data with a range of software tools, including SAS, R, Stata, Python, ArcGIS, and Mplus.

DSBU staff work with investigators to develop the data and methods sections of grant proposals, including the appropriate study design. They also write statistical analysis plans and contribute to IRB protocols by providing power and sample size calculations. Staff are included in grants as Co-Investigators or named personnel, depending on each staff member's role in the grant.
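
As an illustration of what a sample size calculation involves, the sketch below estimates the number of participants needed per group to compare two group means. It is a minimal example under stated assumptions, not the DSBU's method: it assumes two equal-sized groups, a two-sided test, a standardized effect size (difference in means divided by the common standard deviation), and the normal approximation, which gives slightly smaller numbers than an exact t-test calculation. The function name `n_per_group` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two means.

    Assumptions (illustrative only): equal group sizes, a two-sided
    test, and the normal approximation to the test statistic.
    effect_size is the standardized difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be between 0 and 1")

    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power (1 - beta).
    z_power = std_normal.inv_cdf(power)

    # n = 2 * ((z_alpha + z_power) / d)^2, rounded up to a whole participant.
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # A medium standardized effect (d = 0.5) at 5% significance and 80% power.
    print(n_per_group(0.5))
```

Under this approximation, smaller expected effects require much larger studies, which is why the assumed effect size is usually the most consequential input to agree on before a protocol is submitted.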

Because DSBU staff have diverse analytic backgrounds, a wide range of analytic methods can be applied to research data, including but not limited to propensity score matching, multivariate modeling, latent variable mixture models, machine learning, supervised and unsupervised data analysis, and simulations. The DSBU has additional experience in geographic information systems and geospatial statistics. When appropriate, new analytic methods can be applied, especially for emerging data types such as microbiome and genetic/genomic data, and for longitudinal analyses. DSBU staff also use a variety of techniques to translate complex data structures and analyses into easily understandable visualizations.