When Dr. Omar Abdel-Wahab at Memorial Sloan Kettering Cancer Center decided to undertake a project exploring the molecular mechanisms of ASXL2, a gene frequently mutated in acute myeloid leukemia patients carrying the AML1-ETO fusion oncoprotein, he had three choices: work with the bioinformatics core, hire a temporary bioinformatician, or have his team find suitable software.

Dr. Omar Abdel-Wahab's lab at Memorial Sloan Kettering Cancer Center

Finding the right NGS data analysis solution

Generating NGS data was the easy part. An in-house sequencing core or an external vendor could sequence the samples and usually ship the data back within a couple of weeks. Making sense of the data was the bigger challenge.

Once the NGS data was in the lab, Dr. Abdel-Wahab could (a) work with the already overbooked bioinformatics core at MSKCC, (b) find an external bioinformatician, or (c) have his lab's postdocs use NGS data analysis software, something they had never done before.

Dr. Abdel-Wahab had already tried the first two approaches, and neither had gone smoothly. Cores and bioinformaticians are almost always heavily backlogged, and with over 200 samples to analyze, Dr. Abdel-Wahab needed a much faster alternative. His team began searching online for an affordable, scalable NGS data analysis solution.

His team needed a service that would let them study gene expression (RNA-Seq), histone modifications (ChIP-Seq), and chromatin accessibility (ATAC-Seq). It also had to scale to a large number of samples and be simple enough that they would not need to hire additional help.

Analyzing NGS data at scale

During their search for NGS data analysis software, the lab tried Basepair, which happened to fit their requirements.

With Basepair, Dr. Abdel-Wahab and his team were able to analyze the NGS data themselves and get results within hours rather than days or weeks. The simple, intuitive interface allowed the team to run over 200 samples for both mouse and human genomes. Although the team had never analyzed NGS data before, having always relied on external sources, they were up and running within minutes.

“We first analyzed a few old datasets and Basepair results were as good or better than previous manual analysis.”

Dr. Abdel-Wahab and his lab were able to work with high-quality pipelines for all three data types: RNA-Seq, ChIP-Seq, and ATAC-Seq.

“What we liked about Basepair was that they used the best available tools in the analysis pipeline, so the results were high quality and readily acceptable for publication.”

The raw data the team fed into Basepair totaled half a terabyte, and the one-click analysis generated another half a terabyte of results, 2,000 files in all. Managing that volume of data manually would have been a nightmare.

“Basepair even let us deposit the data to the Gene Expression Omnibus with a few clicks.”

Data from the project was published in Nature Communications in the paper "ASXL2 is essential for haematopoiesis and acts as a haploinsufficient tumour suppressor in leukemia."

Try Basepair for your team

Basepair can analyze hundreds or even thousands of samples in parallel with minimal setup. The researchers at MSKCC used Basepair's platform to analyze three NGS data types in hours, when alternative approaches would have taken days or weeks.

Their results were high-quality, included interactive visuals, and were publication-ready. Dr. Omar Abdel-Wahab and his lab were able to publish their paper in Nature Communications much faster than if they had analyzed their data manually with open-source tools or waited for the center's bioinformatics core.

Want to try Basepair yourself? We offer a free 14-day trial for all new users, and no credit card details are required to sign up.