“I started way too fast.” - the words pulverized my spirit as I hobbled up the final climb on a forest road during the 2011 Terrapin 50K near Lynchburg, Virginia. I was being passed by everyone, I was overwhelmed, and it took everything I had to put one foot in front of the other.
That day, I learned the importance of dialing in the start, and our approach to FlowJo Enterprise, the “big data” project at Tree Star, Inc., has hopefully benefitted from the hard lessons learned on a nameless forest road behind Terrapin Mountain. A year ago, we were developing some incredible tools for data analysis and comparison, but the missing piece was a strong start: we had no way of moving data from where it was acquired (at a cytometer, usually in a core lab) to these high-performance tools. A year later, we have designed an application to do just that, one that allows researchers to assign data to a pipeline for analysis at the point of acquisition.
Facing terabytes of data that need to be managed, analyzed, visualized, and communicated is incredibly daunting. If you are taking on a project of this magnitude and thinking about the finish line of providing actionable insight, you might feel like you have just signed up to run 100 miles. It’s... impossible, overwhelming, crazy.
I love both big data and running ultra-distance races (particularly 100 milers), and when faced with daunting projects or miles, there are some hard-earned lessons that I apply:
Have a plan...
We are familiar with the adage that "He who fails to plan is planning to fail," but we need to take this several steps further. Intensive planning and iteration are absolutely critical to allow you to relax into execution and performance. Before racing the Tahoe Rim Trail 100 this summer, I spent more than 10 hours drafting an 8.5-page race plan detailing my projected times, nutrition needs, (estimated) mental state, and paradigm for each section of the course. The steps for success in taking on overwhelming projects are (1) break the whole into its component, understandable parts; (2) examine the needs and challenges for these components; (3) determine ahead of time what to do if your plan fails; and (4) pilot as much as you can ahead of time (see below). My seminar talk and demo for our suite of data tools actually follow this advice and soberly acknowledge where the plan and tools will be challenged.
...and then beat it up.
A plan is not a static object, and it should be refined through the process of piloting and testing. I pilot every gel, sock, shirt, and drink ahead of my long ultramarathons, using shorter runs of 15-20 miles as pilot experiments to test how a new component works with the whole. For big data projects, the same rules apply. On one end, we are developing a data management tool that we have tested, tested, and tested again, both on its own and as part of an analysis pipeline, to ensure that this new component integrates into the whole. On the other end of the pipeline, in data visualization, we have a handful of pilot projects iterating through options to see what works before we let anything take the starting line. (Practice how you play.)
Acknowledge that the wall is waiting for you.
Oh, is the wall ever waiting for you - but if you expect it, then you can plan for it. During the Bishop High Sierra 50-miler, I led the race for the first 13 miles of a 15-mile climb from 4000’ to over 9000’, and hit the wall so hard it stopped me. The distance, the altitude, and taking the lead were overwhelming, but I pushed this aside, and after 2 miles of hiking I was back to running. With relentless forward progress, I recaptured the lead at 30 miles and got my first outright win in an ultramarathon. Data projects are much more overwhelming and challenging, but getting through the wall, the point where you perceive extreme challenges and think very seriously about quitting, requires the same approach. Be relentless, clear your mind, and just keep moving - on the other side you will be “doing it right” and will begin challenging hypotheses rather than following them, removing selection bias, and posing new questions.
The Long Run
There are other options out there for managing and analyzing cytometry data, and it is humbling and satisfying to be involved with a great team building software for the long run. To learn more, please check out our website or documentation about FlowJo Enterprise.