Whether or not Harry S. Truman was the originator of this quote is in dispute. Regardless, it comes to mind today because I’m facing a professional conundrum. Specifically, my desire for transparency and collaboration in science has come into conflict with my perception of how best to build my career.
A bit about my geekiness
I like working with data. I’m the kind of guy who likes learning keyboard shortcuts. I can sit in front of Excel sheets for hours, and have an absolute blast. I’ve always enjoyed the stats courses I’ve taken. In my “free time”, I’ve participated in a handful of Coursera courses on data analysis and management. At work, I’ve slowly weaned myself off GraphPad and SPSS for my inferential statistics, and have adopted R for the purpose instead. It has been a fun, though slow, process.
In the last year, my research group has begun to focus on studying how the circadian rhythm interacts with metabolic health. I am not directly involved in any of these research projects, but in my “free time”, I’ve gone about constructing some tools to use within R in order to analyze circadian data. Specifically, I wanted to create a way to take a set of circadian data (response ~ timepoint in hours) and generate a model which could predict responses at any other timepoint. Furthermore, I wanted to create a way to describe exactly how the circadian rhythms of two or more groups differed from one another in terms of their amplitude, baseline response, period length, and time at acrophase.
Ultimately, I developed a few tools to achieve these specific goals.
I must emphasize that there are major limitations to the tools I developed. Perhaps the biggest is that these tools are only appropriate if the circadian response data is not only periodic, but also sinusoidal. Furthermore, I am not a circadian physiologist, nor a bioinformatician, nor a computer scientist, nor a statistician, so the tools I have developed may be “reinventions of the wheel” or (hopefully not) woefully misguided.
Nonetheless, I was recently able to use some of these tools on data generated by a colleague, and it seems there may be real value in what I have developed. Perhaps the tools I have worked on can actually help my colleagues (or others) analyze their own circadian data. Perhaps what I have done is more useful than a mental exercise for my own self-indulgence.
When asked by my colleague if he could distribute my R script to others, my first response was, “of course, and I hope they find it useful”.
The internal conflict emerges
Later, an anxious feeling emerged. You see, in our career, you’re only worth the lines on your CV. Getting “scooped” by other researchers is a real threat to one’s career trajectory. Having a unique set of data, skills, or tools makes you highly valuable. In economic terms, scarce supply commands a higher price. Suddenly, I felt like I might be on the supply side of this relationship.
So a conundrum has emerged: do I give away these tools I have worked on, because my philosophical leanings tilt in that direction, or do I keep them guarded in order to somehow use them as bargaining chips?
Beyond the moral dilemma, how to get out of my own way?
In addition to the moral dilemma, there is another reason I have hesitated to distribute my work. I tend towards perfectionism. I am hesitant to share my work with others before it is so well-developed that it is beyond criticism. When it comes to the tools I have developed within R, a final product would be plug-and-play, meaning biologists could use my script for their own data without committing much proverbial activation energy.
In this regard, my code is actually quite nascent, and it will likely never reach a “plug-and-play” level. It would be immodest of me to dream that my contributions could ever compare to those made by individuals who have literally built their careers in these fields (like the core R contributors, Hadley Wickham, and the R bloggers).
However, another Harry S. Truman (maybe!) quote has helped me make my final decision:
“Imperfect action is better than perfect inaction.”
Ultimately, I have decided to go public with the tools I have created. I hope my colleague shares them with others, and that those people improve upon what I have started. I fully recognize that giving away my work in this manner can negatively impact me by weakening my position on the ‘supply side’ of the equation, or by exposing the ‘chinks in my armor’ since these tools are not completely developed. However, the potential benefit to others far outweighs my reservations.
Thus, today I am inviting anyone and everyone to use, modify, and improve a family of functions I have developed for use within R for the analysis of biological data. You can find the script of functions on GitHub at this site.
Most, if not all, of the functions I have created have dependencies on other libraries. A few which I know are central to the functions I have created include tidyr, ggplot2, stringr, ez, lawstat, effsize, devEMF, psych, and openxlsx. There are many other packages which my script automatically loads (if they are pre-installed) but may not be necessary for the particular functions which may be useful for you.
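For readers curious how "load it only if it is already installed" can work in R, here is a minimal sketch. It is not taken from the script itself; the helper name `load_if_installed` is hypothetical, and the package list is simply the one named above.

```r
# Hypothetical helper (not from the script): attach each package only if it
# is already installed, and report the ones that were skipped.
load_if_installed <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE instead of raising an error
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  for (p in pkgs[available]) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }
  if (any(!available)) {
    message("Not installed, skipped: ",
            paste(pkgs[!available], collapse = ", "))
  }
  invisible(available)
}

status <- load_if_installed(c("tidyr", "ggplot2", "stringr", "ez", "lawstat",
                              "effsize", "devEMF", "psych", "openxlsx"))
```

The returned logical vector is named by package, so `status[!status]` shows at a glance which dependencies still need `install.packages()`.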
Major caveats to using these tools
I have been trained as an exercise physiologist. My professional and personal interests have taken me into R, but I have not been formally trained to develop analytical tools for R. Thus, you should use any and all of my functions with caution.
What can these tools do?
If you have circadian response data, using the “fx.find.circadian.parameters” and “fx.predicted.response.from.parameters” functions can give you predictive models in the form of:
Response = Amplitude * cos ( (2 * pi / Period) * (Hour – Acrophase ) ) + Baseline
or (equivalently, see here)
Response = A * cos (B x + C) + D
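The two forms line up term by term: A = Amplitude, B = 2 * pi / Period, C = -(2 * pi / Period) * Acrophase, and D = Baseline. To make the idea concrete, here is a minimal sketch of one common way to estimate these parameters. It is not the code behind the fx.* functions: the names `fit_cosinor` and `predict_cosinor` are hypothetical, the data are simulated, and the sketch assumes the period is known and fixed at 24 hours so that ordinary least squares is enough.

```r
# Minimal cosinor sketch; NOT the fx.* functions from the script.
# Assumption: the period is known (24 h), which makes the model linear:
#   A*cos(w*(t - phi)) + D  ==  b1*cos(w*t) + b2*sin(w*t) + D
# with b1 = A*cos(w*phi) and b2 = A*sin(w*phi).
fit_cosinor <- function(hour, response, period = 24) {
  omega <- 2 * pi / period
  x_cos <- cos(omega * hour)
  x_sin <- sin(omega * hour)
  b <- unname(coef(lm(response ~ x_cos + x_sin)))
  list(
    baseline  = b[1],
    amplitude = sqrt(b[2]^2 + b[3]^2),
    acrophase = (atan2(b[3], b[2]) / omega) %% period,  # hour of the peak
    period    = period
  )
}

# Predicted response at any timepoint, using the model form given above.
predict_cosinor <- function(p, hour) {
  p$amplitude * cos((2 * pi / p$period) * (hour - p$acrophase)) + p$baseline
}

# Simulated data (made up for illustration): baseline 10, amplitude 3,
# peak at hour 8, three replicates every 4 hours.
set.seed(1)
hour <- rep(seq(0, 20, by = 4), each = 3)
response <- 3 * cos((2 * pi / 24) * (hour - 8)) + 10 +
  rnorm(length(hour), sd = 0.2)

params <- fit_cosinor(hour, response)
predict_cosinor(params, c(2, 14))  # responses at unsampled hours
```

If the period itself has to be estimated, the model is no longer linear and a nonlinear fit (for example with `nls()`) would be needed, along with sensible starting values.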
Appropriate use of the “fx…ezANOVA” or “fx.Pairwise….Tests” functions will yield .csv or .xlsx files giving you the results from inferential statistics tests applied to your data. Importantly, these tests are adjusted for violations of test assumptions when possible, but they are completely aborted if the assumptions of the inferential tests are not sufficiently met. The ANOVA functions heavily rely upon the ezANOVA function from the ez package.
There are also functions for the Friedman and Kruskal-Wallis tests which behave in a similar manner. I would also point you to this text I generated which lists URLs to check the assumptions of various inferential statistics tests, most of which point to sections of the Laerd Statistics site.
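The "adjust when possible, abort otherwise" behavior described above can be sketched with base R alone. This is only an illustration of the pattern, not the logic in the script: the helper name `gated_one_way_test`, the choice of Shapiro-Wilk and Bartlett checks, and the 0.05 threshold are all assumptions.

```r
# Hypothetical helper, not one of the fx.* functions from the script.
# `data` needs a numeric column `response` and a factor column `group`.
gated_one_way_test <- function(data, alpha = 0.05) {
  res <- residuals(lm(response ~ group, data = data))
  # Abort: no adjustment is attempted when residuals look non-normal
  if (shapiro.test(res)$p.value < alpha) {
    message("Residuals look non-normal; aborting the parametric test.")
    return(NULL)
  }
  # Adjust: fall back to Welch's correction when variances look unequal
  equal_var <- bartlett.test(response ~ group, data = data)$p.value >= alpha
  oneway.test(response ~ group, data = data, var.equal = equal_var)
}

# Made-up data: evenly spaced normal quantiles dealt out to three groups.
q <- qnorm(ppoints(30))
grp <- factor(rep(c("a", "b", "c"), times = 10))
well_behaved <- data.frame(group = grp,
                           response = q + 3 * (as.integer(grp) - 1))
skewed <- data.frame(group = grp, response = exp(2 * q))

ok_result <- gated_one_way_test(well_behaved)  # an "htest" object
aborted   <- gated_one_way_test(skewed)        # NULL, with a message
```

Returning `NULL` keeps a batch of analyses running while making the skipped test obvious. A rank-based test such as `kruskal.test()` would be one alternative to run when the parametric test is aborted.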
I also created a few functions to help manipulate “character” objects and data frames within R, but other packages may be more suited to those ends.
I invite anyone and everyone to explore the use of the functions I have created. I hope that others can benefit from their usage and improve upon them. Furthermore, I strongly encourage others to contact me directly in order to improve my functions so that we can work together to implement best practices when it comes to data analysis.
While it has been a small personal struggle to decide to make my work publicly available, I am confident that this is the right decision. I was initially scared to give away my work, but yet another quote by Harry S. Truman has helped to moderate my own reservations:
“America was not built on fear. America was built on courage, on imagination and an unbeatable determination to do the job at hand.”
In today’s environment, I can think of nothing more important than a willingness to set presumptions and fears aside in order to be transparent and collaborative.
My contributions are minimal, and they are likely to be replete with flaws, but keeping them to myself in the hope that I can somehow leverage them for my own advancement runs counter to my larger aim, which is to improve the world around me.