Technical Information

Sample QC

All submitted samples undergo Quality Control (QC) testing. QC metrics are used to determine whether a sample is qualified or unqualified to go through library construction and/or sequencing. These metrics can vary depending on the workflow the samples are destined for. Each project going through the standard workflows will receive a sample QC report once QC testing is finished (usually within 1 week of sample submission). All qualified samples will automatically continue to library construction. Unqualified samples will not continue through the process without customer/researcher approval. The Core does not guarantee success of unqualified samples, and the samples may be subject to some charges if they fail subsequent steps.

WGS and most WES (DNA samples)

Instruments/methods used:

QC metrics:

WES low input
Qualified: total mass ≥ 250ng
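As a rough illustration of the low-input threshold above, the check can be sketched in Python. Total mass is computed as concentration times volume; the function name and its parameters are hypothetical, and only the 250ng cutoff comes from this page.

```python
def is_qualified_wes_low_input(concentration_ng_per_ul: float,
                               volume_ul: float,
                               min_mass_ng: float = 250.0) -> bool:
    """Return True when the sample's total mass meets the low-input cutoff.

    The 250 ng default is the threshold stated on this page; the function
    itself is an illustrative sketch, not the Core's QC software.
    """
    if concentration_ng_per_ul < 0 or volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    total_mass_ng = concentration_ng_per_ul * volume_ul
    return total_mass_ng >= min_mass_ng
```

For example, a sample at 10 ng/uL in 30 uL carries 300 ng and would pass this check, while 10 ng/uL in 20 uL (200 ng) would not.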

RNAseq (RNA samples)

Instruments/methods used:

QC metrics:

TruSeq workflows

NuGen low-input workflows

TruSeq small RNA workflows

Premade library seq (premade libraries)

Instruments/methods used:
Caliper LabChip GX (concentration and fragment size/range)

QC metrics:

Please see these example reports for further details. Please contact us at HTS@email.chop.edu if there are any questions.

Examples:
DNA QC Report

Premade Library QC Report

RNA Sample QC Report


Workflow information

HiSeq Sequencing

All sequencing is performed on the Illumina HiSeq 4000 or 2500 platform. The HiSeq 4000 has one sequencing mode with a running time of 1 to 3.5 days. The HiSeq 2500s (used primarily for premade library sequencing) can sequence in High Output mode (10-12 days running time) or Rapid mode (1-2 days running time). All libraries sequenced in this core must be Illumina HiSeq compatible. Paired-end sequencing with reads of 100nt or 150nt length (PE100/PE150, or 100 x 100, 150 x 150) is the standard offering. Single-end 50nt (SE50) reads are an option, mostly with premade libraries, and should be inquired about in advance if needed.

WGS

The Core can perform Whole Genome reSequencing for human, mouse and rat samples on the HiSeq 4000 (or we can send high volumes of samples to a BGI facility in Hong Kong/Shenzhen for sequencing on a HiSeq X Ten system). Please inquire at HTS@email.chop.edu for other species. The workflow in our Core on the HiSeq 4000 is a BGI method based on modifications of the Illumina TruSeq DNA Sample Preparation Kit protocol. Libraries are constructed through fragmentation, end repair, ligation of index-specific adaptors, and amplification. The insert length is typically 300-600bp. Analysis of library size, amount, length, quality and concentration is performed on a LabChip GX and/or by qPCR. Libraries are then loaded for sequencing over 1.5 PE150 lanes (30X coverage).
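The relationship between read count and coverage quoted above can be sketched with the usual mean-coverage estimate (total sequenced bases divided by genome size). The genome size of 3.0e9 bases used in the example is a round-number assumption for human, and the function names are hypothetical; actual lane yields are not stated on this page.

```python
def mean_coverage(read_pairs: float, read_length_nt: int,
                  genome_size_bp: float) -> float:
    """Mean coverage for paired-end data: two reads per pair."""
    total_bases = read_pairs * read_length_nt * 2
    return total_bases / genome_size_bp


def read_pairs_for_coverage(target_coverage: float, read_length_nt: int,
                            genome_size_bp: float) -> float:
    """Read pairs needed to reach a target mean coverage (inverse of above)."""
    return target_coverage * genome_size_bp / (2 * read_length_nt)


# Example: PE150 reads, 30X target, assumed 3.0e9 bp genome.
pairs_needed = read_pairs_for_coverage(30, 150, 3.0e9)
```

Under these assumptions, 30X at PE150 corresponds to about 300 million read pairs. This is only an estimate of raw depth; duplicates and unaligned reads reduce usable coverage.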

WES

This core offers a CAP-certified workflow for Whole Exome Sequencing (WES) using an Agilent SureSelect Exome kit and protocol. Libraries are constructed through several steps including fragmentation, end repair and A-tailing, ligation of index-specific adaptors, amplifications, washes and purifications, and hybridization of capture probes. The resultant library is typically 300-400bp. Analysis of library size, amount, length, quality and concentration is performed on a LabChip GX and/or by qPCR. Indexed libraries are then pooled and loaded onto lanes for sequencing to achieve appropriate coverage (e.g., typically four to eight per lane for 100X coverage, depending on whether the run is PE100 or PE150).

RNAseq

There are several RNAseq workflows to choose from depending on the research objectives. We currently offer three workflows using the Illumina TruSeq options and one using the NuGen option. TruSeq RNA sample prep: From total RNA, mRNA is polyA selected and fragmented to prepare the sample for library construction. Library construction then progresses with cDNA synthesis, A-tailing and index adapter ligation, denaturation and amplification on cBot for sequencing. Necessary washes/purifications are also included. The resultant library length is 300-500bp. Library analysis and quality checks are performed on a LabChip GX and/or by qPCR. Libraries are pooled as needed before cBot loading. Samples are loaded onto HiSeq lanes for sequencing to achieve appropriate reads (e.g., about four per high output lane for about 40 million reads each).

TruSeq stranded RNA sample prep: Total RNA samples are treated with RiboZero to remove rRNA from the sample before moving into the main construction steps. The rest of the steps include cDNA synthesis, A-tailing and index adapter ligation, denaturation and amplification on cBot for sequencing. Necessary washes/purifications are also included. The resultant library length is 300-500bp. Library analysis and quality checks are performed on a LabChip GX and/or by qPCR. Libraries are pooled as needed before cBot loading. Samples are loaded onto HiSeq lanes for sequencing to achieve appropriate reads (e.g., about four per high output lane for about 40 million reads each). This workflow is able to return information on mRNA as well as other RNA species such as long non-coding RNA. It also allows researchers to identify which DNA strand a transcript originates from.

TruSeq small RNA sample prep: RNA samples are processed using an extraction kit capable of isolating total RNA that includes small RNA. The extract from this kit then enters library construction, which includes ligation of adapters to small RNA, RT-PCR, and Pippin Prep size selection. Library analysis and quality checks are performed on a LabChip GX and/or by qPCR. Libraries are pooled as needed before cBot loading. Samples are loaded onto HiSeq lanes for sequencing to achieve appropriate reads (e.g., about four per high output lane for about 40 million reads each).

NuGen Ovation RNA-Seq System V2: This workflow begins with a starting input of 500pg-100ng (low input) of total RNA for cDNA synthesis. NuGen's SPIA amplification is next, which uses 3' and random primers to generate a library that includes polyA and non-adenylated transcripts. Library construction then continues much like the other workflows, with index adapter ligation, washes/purifications, and cBot steps. The resultant library length is 300-500bp. Library analysis and quality checks are performed on a LabChip GX and/or by qPCR. Libraries are pooled as needed before cBot loading. Samples are loaded onto HiSeq lanes for sequencing to achieve appropriate reads (e.g., about four per high output lane for about 40 million reads each).
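The pooling example repeated in the workflows above (about four samples per high output lane at about 40 million reads each) implies a lane yield of roughly 160 million reads. A minimal sketch of that arithmetic follows; the 160 million figure is inferred from the example rather than stated as a specification, and the function names are hypothetical.

```python
import math


def samples_per_lane(lane_reads: float, reads_per_sample: float) -> int:
    """Number of samples that fit in one lane at the requested depth."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return int(lane_reads // reads_per_sample)


def lanes_needed(n_samples: int, lane_reads: float,
                 reads_per_sample: float) -> int:
    """Whole lanes required to sequence n_samples at the requested depth."""
    per_lane = samples_per_lane(lane_reads, reads_per_sample)
    if per_lane == 0:
        raise ValueError("one sample needs more reads than a lane provides")
    return math.ceil(n_samples / per_lane)


# Inferred lane yield: 4 samples x 40 million reads (an assumption).
ASSUMED_LANE_READS = 160e6
```

With these numbers, ten samples at 40 million reads each would need three lanes. Real yields vary by run, so the figures agreed for a specific project take precedence over this estimate.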

Premade library sequencing

Premade libraries are also accepted in our Core. Library QC will be performed to detect obvious indications of low-quality libraries, such as multiple peaks or a large difference (≥10%) between the measured library size and the size reported by the customer. This should not be considered an all-inclusive quality check. Samples will be pooled if necessary (with another round of QC) and will then enter our sequencing queue. Sequencing type and run are completed as agreed upon in advance, using the same standards as the other workflows. The Core does not guarantee data output or data coverage for any premade libraries, because the Core was not involved in constructing the libraries and cannot vouch for the process or quality.
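The two premade-library checks named above can be sketched as follows. The 10% figure is read here as the point at which a size discrepancy counts as large; that reading, the use of the reported size as the reference, and the function names are all assumptions rather than the Core's documented procedure.

```python
def size_discrepancy(measured_bp: float, reported_bp: float) -> float:
    """Relative difference between measured and customer-reported size."""
    if reported_bp <= 0:
        raise ValueError("reported_bp must be positive")
    return abs(measured_bp - reported_bp) / reported_bp


def flag_library(measured_bp: float, reported_bp: float, n_peaks: int,
                 threshold: float = 0.10) -> list:
    """Return the reasons a premade library looks questionable, if any."""
    reasons = []
    if n_peaks > 1:
        reasons.append("multiple peaks")
    if size_discrepancy(measured_bp, reported_bp) >= threshold:
        reasons.append("size differs from reported value")
    return reasons
```

An empty list means neither check fired; as noted above, that is not an all-inclusive statement of library quality.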

Pilot study and Non-standard projects

If you are interested in a workflow or reagents that do not appear to be part of our standard offerings, please contact us at HTS@email.chop.edu. We may be willing to complete a pilot study for your particular interest. We may also be willing to accept your project as a non-standard project. Please be advised that different policies, guarantees, pricing, etc. apply to pilot and non-standard projects; these would be discussed if such a project is initiated.


Data Management

Data management is an important part of our sequencing services. Below are several categories to consider with regard to sequencing data.

Data Processing

Once data has come off the sequencers there are data processing steps that must take place before it is ready for delivery.

Data QC: The bioinformatics team checks all data to be sure it meets our core quality standards (base quality scores, run parameters, etc.) and the requested project data requirements (coverage/sequencing depth or amount of output). If a sample does not meet these standards, it will go back into the sequencing queue for additional sequencing.

Data Conversion: Data files from the sequencer must be converted from BCL files to FASTQ files, which is our standard delivery format.
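For readers unfamiliar with the delivered format, a FASTQ record is four lines: a header starting with "@", the sequence, a separator starting with "+", and one quality character per base. The reader below is a minimal sketch that assumes unwrapped four-line records and Phred+33 quality encoding; the names are hypothetical and this is not the Core's conversion pipeline.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ text stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: subtract 33 from each character code (an assumption).
        scores = [ord(char) - 33 for char in quality]
        yield FastqRecord(header[1:].strip(), sequence, scores)


example = "@read1\nACGT\n+\nIIII\n"
records = list(read_fastq(io.StringIO(example)))
```

Delivered files are often gzip-compressed; in that case the same function can be given a handle from `gzip.open(path, "rt")`.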

Data Output

Data can be delivered in a clean data format or a raw data format. Please see the Bioinformatic Chart for a general idea of the output available.

Clean data: Indicates the data has been filtered to remove low-quality reads and adapter-contaminated reads. Most people want clean data from WES and RNAseq projects, and that is our standard delivery format for these projects.

Raw data: Indicates no filtering has taken place and data is delivered with no further processing after BCL to FASTQ conversion. We will deliver WGS data and premade library data as raw data only. For other workflows you have the option of requesting raw data, if desired.
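To illustrate what the clean-data filtering described above means in practice, the sketch below rejects a read if it contains an adapter sequence or if its mean base quality is low. The filtering criteria actually used are not stated on this page, so the mean-quality cutoff of 20, the adapter string (a commonly cited Illumina adapter prefix, used here only as an example), and the function names are all assumptions.

```python
from typing import Sequence


def mean_quality(qualities: Sequence[int]) -> float:
    """Average Phred score of a read."""
    return sum(qualities) / len(qualities)


def is_clean(sequence: str, qualities: Sequence[int],
             adapter: str = "AGATCGGAAGAGC",
             min_mean_quality: float = 20.0) -> bool:
    """Keep a read only if it is adapter-free and of adequate mean quality.

    Both defaults are illustrative placeholders, not the Core's settings.
    """
    if not qualities:
        return False  # empty reads carry no usable data
    if adapter in sequence:
        return False  # adapter-contaminated read
    return mean_quality(qualities) >= min_mean_quality
```

Production filters usually also handle partial adapter matches and paired reads (dropping both mates together), which this sketch leaves out.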

File format
FASTQ files: This file format is the standard delivery option for sample sequence data. Every project will result in data in FASTQ file format.

BAM files: This file format results from an alignment of the sample's sequence to its respective genome. This is not a standard delivery option, because this file is not always generated automatically. This format must be requested in advance, if desired, and an additional fee will be charged.

Data size
The amount of data you can expect from each sample or project depends on the workflow type and the sequencing coverage/depth each sample receives. When setting up a project, you may inquire about these details for your specific project.

Standard Analysis Reports
Our bioinformatics team can, upon request, generate standard analysis reports for our WES and most RNAseq workflows using BGI's proprietary and well-proven pipelines. This will be a web-based report containing a variety of statistics related to your samples. Please see the Bioinformatic Chart for a general idea of available statistics; you may also contact us for report examples.

Data transfer

Our core will transfer data to you either when all data is ready or in batches, especially for large projects. We can transfer data in several ways.

Hard drive: You may provide your own hard drive, or we can provide one for you for an additional fee. If you provide the hard drive, we will contact you when your data is ready so that you can drop the drive off. The transfer can take 1-3 days depending on the transfer queue and/or the amount of data you have. We will contact you for pickup when the transfer is complete. If you purchase a hard drive from us, we will load your data onto it and contact you when it is ready for pickup.

Internal (BGI) FTP site: If your total data amount is ≤40GB, then our FTP site is available to you. When your data is ready we will upload it to our FTP site. We will then contact you to provide the URL and login credentials for you to access your data. You will have two weeks to download your data from the site before it is deleted. It may take 1-2 days for us to upload the data, and 1-3 days for you to download it, depending on the amount of data and your computer's processing speed.

CHOP Aspera transfer: This method of transfer is available to users that have CHOP AD accounts. We will go through CHOP Research IS to have space allocated to you after sample QC. We will notify you when your data is ready and has been uploaded. You will use your CHOP AD login credentials to access your data. The transfer should take no more than 1 day to complete.

CHOP transfer to DBHi: If you would like DBHi to assist you in moving or analyzing your data, we can use their dropbox connection for data transfer. We will notify you and DBHi when your data is ready for transfer, and again after the transfer is complete. Uploading data this way could take 1-4 days depending on the size of your data and the connection speed. You will need to have a separate discussion with DBHi if you would like them to perform any analysis.

External FTP sites, etc: If your institution has an alternative method (e.g., FTP sites, clouds, dropbox, etc.), we can try to transfer your data this way if you'd like. However, because we operate in a secure environment, we may experience connection problems. We will want to test the connection ahead of time, most likely after sample QC. We also suggest having another transfer method in mind as a backup.
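The eligibility conditions in the transfer options above can be summarized in a short sketch. Only the conditions stated on this page are encoded (the 40GB FTP limit and the CHOP AD account requirement for Aspera); the function name, parameter names, and labels are hypothetical.

```python
from typing import List


def eligible_transfer_methods(total_gb: float,
                              has_chop_ad_account: bool = False,
                              wants_dbhi_help: bool = False,
                              has_external_site: bool = False) -> List[str]:
    """List the transfer options that apply to a project."""
    methods = ["hard drive"]  # always available (own drive or purchased)
    if total_gb <= 40:
        methods.append("internal (BGI) FTP site")
    if has_chop_ad_account:
        methods.append("CHOP Aspera transfer")
    if wants_dbhi_help:
        methods.append("CHOP transfer to DBHi")
    if has_external_site:
        # Connection should be tested in advance; keep a backup method.
        methods.append("external site")
    return methods
```

For instance, a 30GB project from an external user would see the hard drive and FTP options, while a 500GB project from a user with a CHOP AD account would see hard drive and Aspera.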

Data Storage

Upon completion of your project and data transfer, we will store your data for 6 months. After 6 months your data will be deleted.
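If you want to note the expected deletion date for your own records, the calculation can be sketched as below. It assumes the 6 months are counted in calendar months from the date the transfer completes, which this page does not specify; the helper name is hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Example: transfer completed on 15 January 2016 (illustrative date).
expected_deletion = add_months(date(2016, 1, 15), 6)
```

Treat the result as a reminder only and confirm the actual date with the Core before relying on it.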


Equipment

Covaris E220
Agilent 2100 BioAnalyzer system
Caliper LabChip
BioTek Synergy HT
Agilent Bravo Automated Liquid Handling Platform
Life Technologies Step-One Plus
Illumina cBot
Illumina HiSeq 4000 Sequencing System
Illumina HiSeq 2500 Sequencing System