Interview with Jimmy Lee by Kelvin Kwok
Prior to his career as a computational postdoctoral fellow at the Wellcome Sanger Institute in Cambridge, UK, Dr Jimmy Lee spent his entire PhD at the University of Hong Kong working on mouse models and cell culture systems. Techniques like cloning, Western blotting and PCR had almost become his permanent daily routine. But then he made a sharp turn.
Currently a data scientist involved in the Human Cell Atlas project, Jimmy’s main job is to develop user-friendly tools that help not only bioinformaticians but also pure wet lab scientists analyse their single-cell sequencing datasets.
“I am always excited to see how large datasets look before and after data polishing.” This is one of the many joyful moments Jimmy loves but never had the chance to experience in his previous lab. Now working on data with such high dimensionality, he enjoys spending time exploring various coding strategies to illustrate data in creative ways. What fascinates him the most, though, is designing, executing and optimising machine learning software to achieve his research goals.
Ever since embarking on his PhD almost seven years ago, Jimmy has believed that high-throughput technology would become a trend in biomedical research. Over the past few years, more wet lab researchers have also been paying attention to the importance of bioinformatics in their own work.
After graduating, Jimmy moved to Cambridge to join his wife, who works at the same institute. “There was still a big gap in knowledge and communication between pure biologists and bioinformaticians here at Cambridge, despite it being one of the most resourceful pioneers in research transformation.” This sparked his determination to contribute his wet lab experience to computer science, specifically by developing machine learning tools for biomedical research and teaching bioinformatics courses at the University of Cambridge.
Transitions are never smooth. Jimmy struggled with the computational jargon that was tossed around in daily conversations with colleagues. Apart from working hard, Jimmy’s tip is to “feel shameless asking for explanations for terms and concepts that you don’t understand”.
“There is no magic, otherwise it should be renamed as BIOINFORMAGIC.”
But this whole new world can seem incredibly challenging for biologists without any programming experience. “Drop your pipette and code your first ‘Hello World’ script now!” Jimmy thinks it can never be “too late” to start, even if you are just curious about bioinformatics and want to learn the basics.
However, one should not expect to master programming miraculously just by sitting in front of an online video course for ten hours straight. Getting his hands dirty by diving into code and real exercises was essentially how Jimmy taught himself the fundamentals of several programming languages.
“You are really just waiting for the ‘zombie apocalypse’ to give you a push.”
Organising high-dimensional data to answer scientific questions may sound like rocket science or dark magic, but Jimmy suggests thinking of it as an elegant Excel task.
Using the ‘zombie apocalypse’ analogy, he explains that working on big datasets is like running into a library while fleeing from zombies in town. In the library, all the books have been thrown into a messy pile, but fortunately you can still use the robotic librarians to help you find the books you need to make an antidote. You first teach the librarians to go through the books using certain sorting criteria, or an ‘algorithm’. This algorithm needs to be rationally designed and adjusted according to your specific needs for the antidote, and has to be robust so that other victims coming into the library later can also swiftly find what they need. Once all the books needed for the antidote are ready, you arrange them so that only the useful pages are visible, logically and simultaneously, to your Antidote Squad, who waste no time in saving the world.
“That’s what everyday bioinformatics is actually doing – data cleaning, data restructuring, algorithm selecting and graph plotting.” When working on simple data, one can do things by hand at a jolly pace, but if the data size is overwhelmingly large, the zombie pressure becomes real.
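For readers who want to see what those four steps might look like in practice, here is a minimal sketch in plain Python. It is not Jimmy’s own code or any Human Cell Atlas tool: the cell names, gene names and expression values are made up for illustration, and the ‘algorithm’ is simply an average per gene.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (cell, gene, expression value).
# None marks a failed reading. All names and numbers are invented.
raw = [
    ("cell1", "GeneA", 5.0),
    ("cell1", "GeneB", None),
    ("cell2", "GeneA", 3.0),
    ("cell2", "GeneB", 8.0),
    ("cell3", "GeneA", 4.0),
    ("cell3", "GeneB", 6.0),
]

# 1. Data cleaning: drop rows with a missing value.
clean = [row for row in raw if row[2] is not None]

# 2. Data restructuring: group the values by gene.
by_gene = defaultdict(list)
for cell, gene, value in clean:
    by_gene[gene].append(value)

# 3. Algorithm selecting: here, just the mean per gene, ranked high to low.
ranked = sorted(
    ((gene, mean(values)) for gene, values in by_gene.items()),
    key=lambda item: item[1],
    reverse=True,
)

# 4. Graph plotting: a plain-text bar chart, one '#' per unit of expression.
for gene, average in ranked:
    print(f"{gene:<6} {'#' * round(average)} {average:.1f}")
```

With six rows this is easy to do by hand in a spreadsheet; the same four steps are what you would hand over to code once the table grows to thousands of cells and genes.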
“Of course,” reminds Jimmy, “you still need mathematical and statistical knowledge to understand the concepts of various models when it comes to decision making.” On the other hand, biological terms and concepts remain the common language, whether in a dry or wet lab. So instead of worrying about losing your identity as a biologist, you should really step out of your comfort zone before the ‘zombie apocalypse’ descends on you unexpectedly.
How should one start? “Learning programming languages and statistical concepts has never been as easy as now. High quality data, crash courses and tutorials are everywhere and freely accessible.” Jimmy also recommends Stack Overflow and Reddit for beginners. The former allows you to freely ask even “stupid” questions about coding, and the latter is the best community for sharing the latest trends in the data science field.
“So, whenever you are thinking of the timing for the transition, it is NOW.”
Feature photo from pixabay.com