In acute myeloid leukemia (AML) patients carrying the AML1-ETO fusion oncoprotein, ASXL2 is one of the most frequently mutated genes. However, little was known about the role of ASXL2 in normal or malignant haematopoiesis. This gap inspired Dr. Omar Abdel-Wahab at Memorial Sloan-Kettering Cancer Center to undertake a project to understand the molecular mechanisms of ASXL2. Using next-generation sequencing (NGS), Dr. Abdel-Wahab's lab set out to study gene expression (RNA-Seq), histone modifications (ChIP-Seq) and chromatin accessibility (ATAC-Seq).
Generating NGS data is now the easy part. An in-house sequencing core or an external vendor can sequence the samples and usually return the data within a couple of weeks. Making sense of the data is the hard part. In the past, the lab relied on the bioinformatics core or collaborated with a bioinformatician. For this project, they used the Basepair software so that the researchers could analyze the data themselves.
The team first analyzed a few older datasets, and Basepair's results were as good as or better than those of the previous manual analysis. "What we liked about Basepair is that they used the best available tools in the analysis pipeline, so the results are high quality and readily acceptable for publication," shared Dr. Abdel-Wahab. Using Basepair, the researchers analyzed over 100 RNA-Seq, ChIP-Seq and ATAC-Seq samples from both mouse and human. The raw data amounted to half a terabyte, and the analyses generated another half a terabyte of results spread across more than 2,000 files. Managing that volume of data without Basepair would have been a nightmare.
Basepair even let the team deposit the data to GEO with a few clicks. The results of the project were published in the journal Nature Communications as "ASXL2 is essential for haematopoiesis and acts as a haploinsufficient tumour suppressor in leukemia."