We talk to bioinformaticians at research cores and pharma companies every day. One of the biggest changes we’ve seen in the last couple of years is the growing popularity of single cell sequencing. While it’s too early to call single cell RNA-seq ubiquitous, there’s no doubt it’s headed that way. Demand for this cutting-edge approach will only keep growing, so bioinformatics teams need a strategy for handling more single cell projects.
The winds of RNA-seq analysis are shifting
Five years ago, standard bulk RNA-seq made up a large chunk of many a bioinformatician’s workload. Today, we’re starting to see entire teams within bioinformatics cores devoted almost exclusively to single cell RNA-seq analysis. More researchers are working with this game-changing data type, sure, but that’s not the only reason for such a shift in resources toward single cell.
The reality is that single cell RNA-seq is just plain more time-intensive and complicated than bulk RNA-seq ever was. Armed with a few automated pipelines and enough compute power, a competent bioinformatician can crunch out dozens and dozens of standard RNA-seq projects a year. Not so for single cell RNA-seq data. Working with this data requires much more nuance and a deeper understanding of the biology behind the dataset. Moreover, the scientific community has not reached a definitive consensus on the best tools and practices for single cell analysis. They’re still emerging and changing, which makes data analysis all the more challenging.
The art and the challenges of single cell analysis
When it comes down to it, single cell analysis is an art, and a labor- and time-intensive one. After talking to dozens of bench scientists and bioinformaticians, we’ve found two challenges that come up again and again when they work on single cell projects.
- Each single cell project is unique, so a one-size-fits-all pipeline doesn’t work for this type of data. Analyses require delicate fine-tuning.
- Once an analysis is complete, getting the figures and clusters right requires a lot of back-and-forth between the bioinformatician and the researcher.
Let’s pause here for a minute and look at the big picture. These two common challenges come down to one important point: more than any other data type, single cell analysis requires bioinformaticians and bench scientists to work in tandem.
Bridging the gap between the bioinformatician and the bench scientist
Working in tandem is time-consuming. The problem with existing single cell analysis tools, even popular ones like Seurat, is that they don’t make collaboration between computational and non-computational researchers any easier. Can Seurat give you excellent cluster visualizations? Definitely, but only if you know how to input the right commands. For the bioinformatician, this means a thread of emails with the bench scientist who needs not 5 but 7 clusters in their UMAP figure. Oh wait, can you actually make that 8 clusters?
A thread of emails just to get a figure right? It shouldn’t be that inefficient, not to mention annoying and entirely unscalable.
That’s why tools like Basepair, which enable easier collaboration between bioinformaticians and bench scientists, are so valuable to researchers working with emerging techniques like single cell RNA-seq. It’s also why, over the last couple of months, the bioinformatics and dev teams at Basepair have been hard at work improving our single cell RNA-seq pipelines and user interface to make collaborating on single cell projects easier than ever.
A single cell pipeline designed to be fine-tuned. Not a black box.
“I’m worried that researchers who don’t understand the computational nuances of single cell data could end up treating any automated pipelines as a black box. This can lead to conclusions that are not robust.” This is a concern that has surfaced repeatedly among bioinformaticians we’ve talked to.
And we hear you. The great thing about Basepair is that you can run a single cell RNA-seq analysis with robust parameters controlled entirely by the bioinformatician. Once the analysis runs, the generated report can be reviewed by the bioinformatician, and only then shared with other collaborators or customers. This ensures collaboration as well as control over the data every step of the way.
A GUI for bioinformaticians, too
The ease with which pipeline parameters can be customized on Basepair should not be underestimated. But you’re a bioinformatician; you don’t need a fancy frontend. That’s for the non-computational folks, right? Actually, parameter customization is one of the most popular features among bioinformaticians who use the platform. It’s straightforward. It’s fast. It makes your job that much easier.
You have a single cell project and need to include multi-mapped reads in the analysis? You need to set the number of cells in your sample? Or maybe you want to set the minimum number of cells per gene to 5 and the maximum mitochondrial percentage per cell to 0.5, and compute 30 principal components instead of 20? All of this takes five clicks.
Default options provided within any tool, be it featureCounts, STAR, or Seurat, can be easily adjusted via Basepair’s user-friendly GUI.
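For readers who want to see what two of those settings actually do to the data, here is a minimal sketch in plain Python. It is not Basepair’s implementation, and the names `QCParams`, `mito_fraction`, and `qc_filter` are hypothetical. It assumes a small cells x genes matrix of UMI counts, assumes mitochondrial genes carry the human “MT-” prefix, and treats the 0.5 mitochondrial threshold as a proportion of each cell’s counts (50%). That last point is an assumption worth checking in practice: Seurat, for example, reports its `percent.mt` column on a 0 to 100 scale, so the same number can mean very different cutoffs depending on the tool.

```python
from dataclasses import dataclass


@dataclass
class QCParams:
    # Hypothetical names for the settings discussed above.
    min_cells_per_gene: int = 5      # keep genes detected in at least this many cells
    max_mito_fraction: float = 0.5   # assumed to be a proportion of counts (0.5 = 50%)
    n_pcs: int = 30                  # would go to a downstream PCA step; unused in this sketch


def mito_fraction(cell_counts, is_mito):
    """Fraction of one cell's UMI counts that come from mitochondrial genes."""
    total = sum(cell_counts)
    if total == 0:
        return 0.0  # an empty cell has no mitochondrial signal to flag
    mito = sum(c for c, m in zip(cell_counts, is_mito) if m)
    return mito / total


def qc_filter(counts, genes, params):
    """Return (kept cell indices, kept gene indices) for a cells x genes count matrix."""
    # Assumption: mitochondrial genes use the human "MT-" naming convention.
    is_mito = [g.upper().startswith("MT-") for g in genes]

    # Step 1: drop cells whose mitochondrial fraction exceeds the threshold.
    kept_cells = [
        i for i, row in enumerate(counts)
        if mito_fraction(row, is_mito) <= params.max_mito_fraction
    ]

    # Step 2: among the surviving cells, keep genes detected (count > 0) in enough cells.
    kept_genes = [
        j for j in range(len(genes))
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= params.min_cells_per_gene
    ]
    return kept_cells, kept_genes


# Toy data: 6 cells x 3 genes (made-up counts for illustration only).
genes = ["CD3E", "S100A9", "MT-CO1"]
counts = [
    [5, 1, 1],
    [3, 2, 1],
    [0, 4, 1],
    [1, 1, 8],  # 80% mitochondrial counts, removed in step 1
    [2, 3, 0],
    [4, 1, 2],
]

if __name__ == "__main__":
    print(qc_filter(counts, genes, QCParams()))
```

With the defaults, the fourth cell is dropped for its high mitochondrial fraction, and only S100A9 is detected in at least 5 of the remaining cells, so the call returns `([0, 1, 2, 4, 5], [1])`. Lowering `min_cells_per_gene` to 4 keeps all three genes. Filtering cells before genes is a design choice here: it means the gene threshold counts only cells that survived QC.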
Hello beautiful cluster visualizations, goodbye endless email threads.
“The clusters don’t seem to fit my expected results. Can you increase the cluster resolution of this figure?” *cue 5 more clarifying emails before a figure is created* “Hmm, actually, can you increase the resolution again?” *repeat process*
If that sounds painfully familiar and you’d like to reduce the length of your email threads, keep reading. First, though, if you haven’t seen what a single cell report looks like on Basepair, take a look at the video below.
QC parameters, filtering and alignment metrics. Cell barcode and UMI counts and mitochondrial proportions, both before and after filtering. UMAP, t-SNE, and PCA plots. Projection of cells by UMI counts, violin plots, expression metrics. A heatmap. The single cell report on Basepair is comprehensive. And if you want even more data, you can download all the files generated during the analysis under the “Info” tab.
As you can see in the video, it takes all of 3 seconds to generate a UMAP plot with 10 clusters instead of 9. And all of 4 seconds — really, we timed it — to visualize in which clusters the gene S100A9 is expressed. No more back-and-forth emails. Your collaborator can generate these plots directly on Basepair! This is a terrific time saver for everyone involved.
Wow, this is cool. How do I get access?
You can access our single cell pipelines by creating a free trial account here: https://app.basepairtech.com. New users can upload and analyze 6 samples at no cost.
It’s really exciting to see single cell analysis taking genomics projects to the next level, and Basepair is working hard to make sure the tools for collaboration and visualization are easily available and accessible to researchers. We’re reading up on the latest research and regularly iterating on our single cell pipelines so that they run the latest versions of the most popular open source tools.
I work for a sequencing core/pharma company. I need more than 6 samples to trial Basepair.
We’d love to hear more about your project and how Basepair could help make single cell RNA-seq analysis easier for your team. Get in touch with us by emailing firstname.lastname@example.org.