Arcus Resources
The resources listed below are provided for CHOP staff interested in, or currently working with, Arcus.
Please note: Some items require registration with Arcus or use of CHOP email and password for access.
- Arcus Scientific Project Request (for Arcus Users)
- Arcus Service Desk (for Arcus Users)
- Arcus Forum
- Data Contribution Guide
- Data Dictionary Best Practices
- Data Collection/Process Template
- Best Practices for Creating Files
- Research Data Management 101 Presentation
- ETL Documentation
- File Naming Activity Worksheet
- File Naming Tip Sheet
- Arcus RDM Project Template
- Ontologies
- REDCap recommended practices
- NIH Research Data Management and Sharing Plans Policy guidance
CAPNET Codebook
View / Download Resource: Codebook_for_2021_CAPNET_Data_July_12__2022.xlsx
- Authors: Okunowo, Oluwatimilehin; Lindberg, Dan; Campbell, Kristine; Wood, Joanne
- Description: The CAPNET codebook describes the contents, structure, and layout of the CAPNET data collection. The codebook includes information on the following: variable names, variable labels, values, value labels, missing data, and skip patterns. In addition, the codebook notes contain instructions and comments contextualizing the information conveyed in the variables or values. The codebook also contains a detailed record of any changes made to the CAPNET data collection instruments and the timing of those changes. Investigators planning to use CAPNET data must use the codebook to assist in identifying and fully understanding the variables they want to utilize for a study. The CAPNET codebook will be updated twice a year.
- Publication Date: 2022-07-12
CAPNET Data Dictionary
View / Download Resource: CAPNETDatabase_DataDictionary_2021_12_14.csv
- Authors: Wood, Joanne; Lindberg, Dan; Campbell, Kristine; Silverman, Ligia; Kratchman, Devon; Vaughn, Porcia; Egbe, Teniola
- Description: The CAPNET Data Dictionary is a specifically formatted spreadsheet in CSV (comma-delimited) format containing the metadata used to construct the CAPNET data collection instruments and fields. Investigators interested in contributing data to CAPNET or in collecting data that is compatible with CAPNET data can use this dictionary to assist them in creating a data collection form. Investigators planning to use CAPNET data can use the codebook to assist in identifying and understanding the variables they want to utilize for a study. This form will be updated twice a year.
- Publication Date: 2022-12-14
CAPNET Definitions Document
View / Download Resource: CAPNET Definitions Document.pdf
- Authors: Wood, Joanne; Lindberg, Dan; Campbell, Kristine
- Description: Document that defines specific terms used in the CAPNET data collection instruments and database. Example: CAPNET Episode: A CAPNET episode is the period inclusive of all signs, symptoms, and medical encounters associated with the specific injury or illness for which the CAP consultation was initiated. This includes the initial hospitalization, all follow-up medical testing (including FUSS, OI testing, or other imaging or radiology), and the initial period of active consultation with child protective services and law enforcement. This form will be updated twice a year.
- Publication Date: 2021-03-16
CAPNET Participating Sites
View / Download Resource: CAPNET_Participating_Sites.pdf
- Authors: Wood, Joanne
- Description: List of sites across the country that are participating in the CAPNET database.
- Publication Date: 2021-08-30
CAPNET Governance Documents
View / Download Resource: Governance_Documents.pdf
- Authors: Lindberg, Dan; Wood, Joanne; Campbell, Kristine; Pierce, Mary; Scribano, Phil; Leventhal, John; Laskey, Antoinette; Runyan, Des
- Description: The CAPNET Governance documents describe the mission and structure of CAPNET as well as the policies and procedures for reviewing requests to utilize CAPNET data for research. This form will be updated once a year.
- Publication Date: 2021-12-20
CAPNET Frequently Asked Questions
View / Download Resource: Frequently_Asked_Questions.pdf
- Authors: Wood, Joanne; Lindberg, Dan; Campbell, Kristine
- Description: The CAPNET Frequently Asked Questions (FAQ) document contains a list of questions and answers pertaining to entering data into the CAPNET data collection instruments. The FAQ document is primarily intended for use by data enterers but will also be of interest to investigators utilizing CAPNET data who are seeking additional understanding regarding a specific variable or value. This form will be updated twice a year.
- Publication Date: 2022-05-18
Resources for NIH Mandated Data Sharing
The new NIH data management and sharing policy becomes effective January 25, 2023.
ALL grant applications or renewals that generate scientific data must now include a detailed Data Management and Sharing Plan (DMSP). Arcus provides templates and guidance documents to help you prepare your plan.
View the Presentations
- December 7, 2022 - View the presentation
- January 12, 2023 - View the presentation
- January 12, 2023 - View the presentation slides
Download this information as a PDF.
Overview
The lifecycle of your data is documented in a data management plan. The plan describes how data will be collected, stored, accessed, and shared, and how your results can be reproduced. After your project is finished and the results have been published, a solid data management plan helps ensure that your research findings remain accessible and available, improving the value of your work and enabling potential re-use by other researchers.
What's new about the 2023 NIH Data Management and Sharing Policy?
Previously, the NIH only required grants with $500,000 per year or more in direct costs to provide a brief explanation of how and when data resulting from the grant would be shared.
The 2023 policy is entirely new. Beginning in 2023, ALL grant applications or renewals that generate Scientific Data must include a robust and detailed plan for how you will manage and share data during the entire funded period. This includes information on data storage, access policies/procedures, preservation, metadata standards, distribution approaches, and more. You must provide this information in a data management and sharing plan (DMSP). The DMSP is comparable to what other funders call a data management plan (DMP).
The DMSP will be assessed by NIH Program Staff (though peer reviewers will be able to comment on the proposed data management budget). The Institute, Center, or Office (ICO)-approved plan becomes a Term and Condition of the Notice of Award.
What do I need to submit as a part of my funding proposal?
Data Management and Sharing Plan (DMSP)
If you plan to generate scientific data, you must submit a Data Management and Sharing Plan to the funding NIH ICO as part of the Budget Justification section of your application for extramural awards.
Your plan should be two pages or fewer and must include:
- Data Type
- Related Tools, Software and/or Code
- Standards
- Data Preservation, Access, and Associated Timelines
- Access, Distribution, or Reuse Considerations
- Oversight of Data Management and Sharing.
See Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan for a detailed description of these Elements. For additional resources, refer to How to Get Started Writing a DMP.
Why should I share my data?
NIH promotes data sharing to accelerate biomedical research discovery, enable validation of research results, provide access to high-quality data, and promote data re-use for future studies.
Where can I get help creating my NIH data management and sharing plan?
Request a consultation for NIH data management and sharing policy-related questions, or email Arcus Library Science supervisor Ene Belleh for assistance.
A template with guidance and sample language is also available to help researchers write NIH-compliant plans.
I work with sensitive topics/populations - how do I protect my participants' privacy?
NIH strongly encourages researchers who work with sensitive topics and/or populations to address data sharing in the Informed Consent process. Researchers should also pay special attention to their de-identification process to ensure that all identifying information has been fully removed. Finally, researchers should consider depositing their data in restricted access repositories that require data use agreements and research plans to access the data.
See Q&A from Presentation 12/7/22 for more details.
NIH 2023 Data Sharing Policy
What is considered "Scientific data" for the purposes of this plan?
The final NIH Policy defines Scientific Data as: "The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens." Even those scientific data not used to support a publication are considered scientific data and within the final DMS Policy's scope.
Can I make the data available upon request?
No. NIH prefers that scientific data be shared and preserved through repositories (such as Arcus) rather than kept by a researcher and provided upon request.
How will plans be assessed?
NIH program staff will assess the DMS plans, but peer reviewers may comment on the proposed budget for data management and sharing.
What repository should I use?
Arcus is positioned as CHOP's central research data repository to help researchers/investigators fulfill these recommended elements. Here is a list of NIH-approved external repositories.
What is a standard? What standards are relevant to my research?
A standard specifies how exactly data and related materials should be stored, organized, and described. In the context of research data, the term typically refers to the use of specific and well-defined formats, schemas, vocabularies, and ontologies in the description and organization of data. However, for researchers within a community where more formal standards have not been well established, it can also be interpreted more broadly to refer to the adoption of the same (or similar) data management-related activities, conventions, or strategies by different researchers and across different projects.
When do I need to make my data available?
NIH encourages that scientific data be shared as soon as possible, and no later than the time of an associated publication or the end of the performance period, whichever comes first.
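The "whichever comes first" rule amounts to taking the earlier of two dates. The following is a minimal sketch in Python, not an NIH or Arcus tool; the function name and the example dates are hypothetical and chosen only for illustration.

```python
from datetime import date
from typing import Optional


def latest_sharing_date(publication_date: Optional[date],
                        performance_period_end: Optional[date]) -> date:
    """Return the latest acceptable date for sharing data.

    This is the earlier of the associated publication date and the end
    of the performance period. Either value may be None if it is not
    yet known, but at least one must be provided.
    """
    known = [d for d in (publication_date, performance_period_end)
             if d is not None]
    if not known:
        raise ValueError("At least one date is required.")
    return min(known)


# Hypothetical example: a paper published before the award ends.
deadline = latest_sharing_date(date(2024, 6, 1), date(2025, 3, 31))
print(deadline.isoformat())
```

If no publication date is known yet, the end of the performance period is the only limit the sketch can apply.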
What data management and sharing costs can I include in my grant?
Allowable costs can include:
- data curation and developing documentation (formatting data, de-identifying data, preparing metadata, curating data for a data repository)
- data management considerations (unique and specialized information infrastructure necessary to provide local management and preservation before depositing in a repository)
- preserving data in data repositories (data deposit fees)
For additional information, see NIH supplemental information on allowable costs.
What happens if I do not comply with the NIH policy or make my data available as described in the DMSP?
The NIH has said that NIH Program Staff will be monitoring compliance with the policy during the funding period. "Noncompliance with Plans may result in the NIH ICO adding special Terms and Conditions of Award or terminating the award. If award recipients are not compliant with Plans at the end of the award, noncompliance may be factored into future funding decisions."
See Q&A from Presentation 12/7/22 for more details.
The DMPTool is an online system that helps you create data management plans in accordance with NIH guidelines.
Optionally use the DMPTool (create an account and log in).
Getting Access
To log in to the tool, go to DMPTool.org, (1) click Sign In, and (2) select the institutional login option. You can then log in with your NetID.
Generic Sample DMSP Template provided by the NIH
This DMSP template is provided by Arcus for the benefit of the research community. This sample of a vetted DMP from a successful proposal is provided by NIH. Please do not copy text from this DMP verbatim into your own DMP.
Sample DMSP Template for Arcus Data Use
View an example of a DMSP template for use with Arcus data.
Sample LCE Grant Plan
View a sample LCE Grant Plan.
Grant Language and Citations
For language about adding Arcus information to grant applications, please view / download Arcus Grant Language.
Arcus does not prescribe a single style or format for citations. Rather, any individual publisher guidelines should be followed so long as the required Arcus citation elements are present. Required elements include title, date accessed or dates meaningful to describing the resource, acknowledgement of Arcus, and the name(s) of the people or teams who created or prepared the data. See below for example use cases we’ve encountered so far. As Arcus grows, this list will expand and we will continue to provide guidance on citing various Arcus products.
Find the Correct Citation Format
Take this simple quiz to determine which citation format is right for your requirements. Or refer to the Use Cases listed below.
Citation Use Cases
- Ome, Gene. New Methods for Genomic Data Analysis. Version 1.2. Arcus at Children's Hospital of Philadelphia. Accessed on 2021/10/31.
- Acknowledgment Statement: The New Methods for Genomic Data Analysis, Version 1.2, data were developed by Dr. Gene Ome and made available for reuse by Arcus at Children’s Hospital of Philadelphia. Accessed on 2021/10/31.
- Note: Contact person and title can be pulled from metadata for the corresponding hover-over in Arcus Cohort Discovery. Arcus data contribution metadata can be found in either the Research Dataset or Reference Cohort filters in the left-hand sidebar.
- Camacho, P. 2021. CHOP Center for Rehabilitation Research registry. Version 1.0. Arcus at Children's Hospital of Philadelphia.
- Acknowledgment Statement: The CHOP Center for Rehabilitation Research registry, Version 1.0, data were developed by Peter Camacho and made available for reuse by Arcus at Children's Hospital of Philadelphia. Accessed on 2021/09/30.
Citation for entirety of deidentified Arcus Data Repository accessed through an Arcus lab
- Arcus Data Repository Team. Deidentified Arcus Data Repository. Extracted: 2021/07/09. Version 1.4.4. Arcus at Children's Hospital of Philadelphia.
Citation for cohort scoped subset of deidentified Arcus Data Repository data accessed through an Arcus lab
- Arcus Data Repository Team. Cohort Scoped Deidentified Arcus Data Repository Subset. Extracted: 2021/07/09. Version 1.4.4. Arcus at Children's Hospital of Philadelphia.
Citation for cohort scoped subset of identified Arcus Data Repository data accessed through an Arcus lab
- Arcus Data Repository Team. Cohort Scoped Identified Arcus Data Repository Subset. Extracted: 2021/07/09. Version 1.4.4. Arcus at Children's Hospital of Philadelphia.
- For citations of both identified and deidentified ADR datasets, the ADR team noted the importance of including both the version and extraction date for ADR data. The version refers to the schema used to extract the data. The extracted date indicates the state of the underlying data in the lab.
- Right now, finding the version and extraction dates is a manual process. Eventually, this information may be included in the lab summary file. Someone from the ADR Team will assist users in getting the proper version and extraction date information from the lab.
- Users can find the extraction date for data in their lab as follows: each table in the lab dataset has an extracted_date that indicates when it was extracted from the ADR (Arcus Data Repository). Note that not every table may have the same extracted date. It is also possible that multiple minor versions of the ADR were used to extract data for a lab.
- The current precedent when there are multiple extraction dates is to use the latest date.
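The "use the latest date" precedent and the ADR citation pattern shown above can be sketched in a few lines of Python. This is an illustrative sketch only, not an Arcus-provided tool: it assumes the extracted_date values have already been read from each lab table into Python date objects, and the table names, function names, and version string are hypothetical placeholders.

```python
from datetime import date
from typing import Dict, Iterable


def latest_extracted_date(table_dates: Dict[str, Iterable[date]]) -> date:
    """Return the most recent extracted_date found across all tables.

    table_dates maps a table name to the extracted_date values observed
    in that table (tables may differ in their extraction dates).
    """
    all_dates = [d for dates in table_dates.values() for d in dates]
    if not all_dates:
        raise ValueError("No extracted_date values were provided.")
    return max(all_dates)


def format_adr_citation(title: str, extracted: date, version: str) -> str:
    """Assemble an ADR citation in the pattern used by the examples above."""
    return (
        f"Arcus Data Repository Team. {title}. "
        f"Extracted: {extracted:%Y/%m/%d}. Version {version}. "
        "Arcus at Children's Hospital of Philadelphia."
    )


# Hypothetical table names and dates, for illustration only.
dates_by_table = {
    "encounter": [date(2021, 7, 8)],
    "lab_result": [date(2021, 7, 9), date(2021, 7, 8)],
}
citation = format_adr_citation(
    "Deidentified Arcus Data Repository",
    latest_extracted_date(dates_by_table),
    "1.4.4",
)
print(citation)
```

The version still has to be confirmed with the ADR Team, since the sketch only resolves the extraction date.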
Arcus Cohort Discovery. 2021. Arcus at Children's Hospital of Philadelphia. https://arcus.chop.edu/cohort-discovery. Accessed on YYYY/MM/DD.
- Version is not included for the ACD because updates are released on an ongoing basis rather than batched together so versions are not tracked.
- Where theme and format allow, authors should incorporate the Arcus Acknowledgment Statement applicable to their work. Sample Acknowledgment Statements are found elsewhere in this document.
- If theme or format does not allow, the author should try to minimally mention "Arcus at the Children's Hospital of Philadelphia."
- If this phrase is included, the author should expound on the relevant aspects of Arcus to the extent possible.
- In the resultant publication, poster, session, etc., the author should formally cite Arcus in the References section.
"Study data were collected and managed using Arcus resources hosted at the Children's Hospital of Philadelphia. Arcus is a suite of tools and services developed to enhance research efforts at The Children's Hospital of Philadelphia by helping researchers to explore available data, see overlaps among datasets, build new cohorts, and determine if there are data or samples available for additional research projects. Incubated within the Department of Biomedical and Health informatics at CHOP, Arcus connects CHOP's clinical and research data to enable biomedical researchers to conduct highly innovative, data-driven, reproducible research within a managed scalable framework. This framework includes 1) user access controls; 2) patient privacy and confidentiality protections through regulatory review; 3) electronic honest-brokered data de-identification and re-identification; and 4) data retention, management, sharing, and destruction services in an auditable computational environment."