NGS Data Analysis Bottlenecks

One of the biggest bottlenecks in next-generation sequencing (NGS) today is data analysis. That is surprising: we have massive compute at our fingertips, the infrastructure to process thousands of samples in parallel, and plenty of programming libraries that reduce complex statistical operations to something resembling plain English.

So why do so many of us get stuck analyzing NGS data for days, even weeks? When Basepair founder Amit Sinha, an instructor at Harvard Medical School, began work on the web platform, he looked back on his decade of computational biology experience, and on his colleagues' most persistent holdups, to pinpoint unnecessary congestion in the NGS data analysis process.

The problem, Amit discovered, wasn't a lack of technical resources but a disjointed approach: multiple pieces of command-line software, file conversions between programs, and time wasted waiting for various algorithms to finish. On top of that, the results were spread across dozens of files, and the information of interest had to be painstakingly dug out.

The faster alternatives weren't ideal either. They required a bioinformatician to spend precious time at the command line running automated scripts, help from IT to parallelize workflows when dozens or hundreds of samples were involved, and, in general, leaning on team members whose schedules were already overflowing.

Having experienced every bottleneck imaginable in the NGS data analysis process, Amit created Basepair as a one-stop-shop that takes raw .fastq files and automatically performs complex analytics operations based on your chosen pipeline. The result is a ready report and a series of interactive graphs that allow researchers and bioinformaticians to move on to downstream analysis without getting stuck juggling algorithms and waiting for schedules to clear up.

Because multiple analyses in Basepair run in parallel, and all compute is hosted on powerful cloud virtual machines, the days or weeks traditionally spent analyzing multiple samples are reduced to under an hour.

The bread and butter of the Basepair web platform is automated workflows, which have saved teams at Harvard Medical School, Stanford, Memorial Sloan Kettering Cancer Center, and other world-class institutions thousands of hours.

Automated Workflows

At Basepair, we take several steps to ensure you get fast, accurate results with as few clicks – and as little hands-on time – as possible. We have automated the most useful workflows as ready-to-use pipelines for your NGS research. Several workflows are included per sequencing method.

For your RNA-Seq, DNA-Seq, ChIP-Seq, or ATAC-Seq data we feature custom pipelines modeled after the latest research in bioinformatics; our own bioinformaticians have even tweaked algorithms in these pipelines to ensure sequence data is interpreted with the highest accuracy.

Let’s explore a few options available for RNA-Seq to get a sense of how automated workflows make life easier for both individuals and teams of any size.

RNA-Seq Workflows

RNA-Seq is used for detecting fusion events, genetic variants, novel transcripts, and a host of other valuable genomic information. But the most popular use of RNA-Seq is differential gene expression analysis.

Differential gene expression analysis involves counting the reads assigned to each gene in each sample, normalizing those counts across all samples, and then running statistical tests to identify differentially expressed genes and to visualize and interpret the results.
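To make the normalization step concrete, here is a minimal Python sketch of the median-of-ratios method that DESeq uses to compute one size factor per sample. It assumes the counts arrive as a hypothetical `gene -> [count per sample]` dictionary; it illustrates the technique and is not Basepair's pipeline code.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> list of raw counts (one per sample). Genes with a
    zero in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        # Log of this gene's geometric mean across samples (the pseudo-reference).
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of each sample to the pseudo-reference.
    return [math.exp(median(r)) for r in log_ratios]

def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {gene: [c / f for c, f in zip(row, factors)] for gene, row in counts.items()}

counts = {"geneA": [10, 20], "geneB": [30, 60], "geneC": [5, 10], "geneD": [0, 4]}
factors = size_factors(counts)
print([round(f, 3) for f in factors])  # [0.707, 1.414]
```

Here the second sample was simply sequenced twice as deeply, so after dividing by the size factors each gene's normalized counts match across samples. The statistical testing step then runs on counts scaled this way.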

Basepair provides two expression count pipelines: reads are aligned with either STAR or TopHat and then counted with featureCounts. Here's a visual of both automated pipelines:

Figure: expression count (TopHat) workflow

Figure: expression count (STAR) workflow
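Both pipelines follow the same align-then-count pattern. The Python sketch below only assembles and prints the command lines; pass `dry_run=False` to execute them. It assumes single-end, compressed reads, a prebuilt STAR genome directory or Bowtie 2 index, a GTF annotation, and four threads. The helper names and file names are hypothetical, and Basepair's actual parameters may differ.

```python
import shlex
import subprocess

def align_cmd(aligner, sample, fastq_gz, index, gtf, threads=4):
    """Return (command, expected BAM path) for one single-end sample."""
    if aligner == "star":
        prefix = f"{sample}_"
        cmd = ["STAR", "--runThreadN", str(threads), "--genomeDir", index,
               "--readFilesIn", fastq_gz, "--readFilesCommand", "zcat",
               "--outSAMtype", "BAM", "SortedByCoordinate",
               "--outFileNamePrefix", prefix]
        return cmd, f"{prefix}Aligned.sortedByCoord.out.bam"
    if aligner == "tophat":
        outdir = f"{sample}_tophat"
        # TopHat takes the Bowtie index basename followed by the reads.
        cmd = ["tophat", "-p", str(threads), "-G", gtf, "-o", outdir, index, fastq_gz]
        return cmd, f"{outdir}/accepted_hits.bam"
    raise ValueError(f"unknown aligner: {aligner}")

def count_cmd(bams, gtf, out="counts.txt", threads=4):
    """featureCounts over all BAMs at once gives one gene-by-sample count table."""
    return ["featureCounts", "-T", str(threads), "-a", gtf, "-o", out, *bams]

def run_pipeline(samples, aligner, index, gtf, dry_run=True):
    """Align every sample, then count; samples maps sample name -> FASTQ path."""
    commands, bams = [], []
    for name, fastq in samples.items():
        cmd, bam = align_cmd(aligner, name, fastq, index, gtf)
        commands.append(cmd)
        bams.append(bam)
    commands.append(count_cmd(bams, gtf))
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

For example, `run_pipeline({"s1": "s1.fastq.gz"}, "star", "star_index", "genes.gtf")` prints the STAR command followed by a single featureCounts command over `s1_Aligned.sortedByCoord.out.bam`.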

In the Basepair web platform, choose the Upload Data tab in the header menu, upload your zipped .fastq file, set the Data Type to RNA-Seq, and select the appropriate Workflow. After filling out minimal metadata, press Save and Analyze to rapidly generate accurate expression counts. Adding a sample takes less than a minute.
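Before uploading, it can help to sanity-check the compressed FASTQ locally. This minimal Python sketch assumes a gzip-compressed file (`.fastq.gz`) with standard four-line records and inspects only the first records; `check_fastq_gz` is a hypothetical helper, not part of Basepair.

```python
import gzip
from itertools import islice

def check_fastq_gz(path, max_records=1000):
    """Validate the first max_records records of a gzip-compressed FASTQ file.

    Returns the number of records checked. Raises ValueError for a malformed
    record; gzip raises an OSError subclass if the file is not gzip data.
    """
    checked = 0
    with gzip.open(path, "rt") as fh:
        while checked < max_records:
            record = [line.rstrip("\r\n") for line in islice(fh, 4)]
            if not record:
                break  # clean end of file
            n = checked + 1
            if len(record) < 4:
                raise ValueError(f"record {n}: truncated (expected 4 lines)")
            header, seq, plus, qual = record
            if not header.startswith("@"):
                raise ValueError(f"record {n}: header must start with '@'")
            if not plus.startswith("+"):
                raise ValueError(f"record {n}: separator must start with '+'")
            if len(seq) != len(qual):
                raise ValueError(f"record {n}: sequence and quality lengths differ")
            checked += 1
    return checked
```

Checking only a prefix of the file keeps the test fast on large runs while still catching the most common problems, such as an uncompressed file or mismatched sequence and quality lines.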

Our simple but powerful DESeq differential expression pipeline generates up/down gene lists, .gct/.cls files, a volcano plot, and a heatmap, all publication-ready, in a single click. The figures are also customizable, so you can present the data exactly as you need. To run it, follow the same upload steps but select DESeq as your Workflow.
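The outputs of this pipeline can be sketched in plain Python. The example below assumes DESeq-style results given as a hypothetical `gene -> (log2FoldChange, padj)` mapping and commonly used cutoffs (adjusted p < 0.05, |log2 fold change| ≥ 1). It splits genes into up/down lists, computes volcano plot coordinates, and renders GCT 1.2 and categorical CLS text of the kind read by tools such as GSEA. It is not Basepair's implementation, and the platform's defaults may differ.

```python
import math

def split_up_down(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into sorted up- and down-regulated lists.

    results maps gene -> (log2FoldChange, padj); padj may be None (NA).
    """
    up, down = [], []
    for gene, (lfc, padj) in results.items():
        if padj is None or padj >= padj_cutoff:
            continue  # not significant, or not tested
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return sorted(up), sorted(down)

def volcano_points(results, floor=1e-300):
    """Return gene -> (log2 fold change, -log10 padj), skipping padj None.

    padj is floored so that a reported 0 does not break the logarithm.
    """
    return {
        gene: (lfc, -math.log10(max(padj, floor)))
        for gene, (lfc, padj) in results.items()
        if padj is not None
    }

def to_gct(norm_counts, samples):
    """Render normalized counts (gene -> value per sample) as GCT 1.2 text."""
    lines = [
        "#1.2",
        f"{len(norm_counts)}\t{len(samples)}",
        "\t".join(["Name", "Description", *samples]),
    ]
    for gene, values in norm_counts.items():
        lines.append("\t".join([gene, "na", *(f"{v:.3f}" for v in values)]))
    return "\n".join(lines) + "\n"

def to_cls(groups):
    """Render one class label per sample (in GCT column order) as categorical CLS text."""
    classes = list(dict.fromkeys(groups))  # unique labels, first-seen order
    labels = " ".join(str(classes.index(g)) for g in groups)
    return f"{len(groups)} {len(classes)} 1\n# {' '.join(classes)}\n{labels}\n"

results = {"geneA": (2.5, 0.001), "geneB": (-1.8, 0.01),
           "geneC": (0.2, 0.5), "geneD": (3.0, None)}
up, down = split_up_down(results)
print(up, down)  # ['geneA'] ['geneB']
```

A heatmap would typically be drawn from the GCT matrix restricted to the up and down genes; plotting is left out to keep the sketch dependency-free.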

Basepair has over 30 workflows and counting. Our team is diving deep into the latest research, and continually refining current workflows, to bring you new ones every month. Sign up to try the full platform free, and explore our automated workflows with your own NGS data.