#IAmAResearchParasite … for my kind of data

So, let me explain the history of this Twitter hashtag. I’ve included the online publishing dates to make the timeline easier to follow.

It started with an editorial in the New England Journal of Medicine (20 Jan 2016). It argues in favour of sharing data from interventional clinical trials, with the chief reason cited being an ethical obligation to patients. A second editorial by different authors (21 Jan 2016) then raises two main concerns about data sharing: (1) “Someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters.” (2) “a new class of research person will emerge … research parasites”. Evidently ‘parasites’ is not a very nice word to call a group of people, and it is especially controversial because the authors are the editor-in-chief, Jeffrey M. Drazen, and the deputy editor, Dan L. Longo, so their view might well be conflated with that of the journal. Drazen wrote another editorial (25 Jan 2016) to clarify that the journal itself is “committed to data sharing”, but he also highlighted that clinical trialists and data scientists are not on the best of terms, so much so that it impedes data sharing.

But the Twitter-sphere and the online scientific communities were already enraged by the name-calling and took to calling themselves ‘parasites’ as a badge of honour; thus #IAmAResearchParasite came to be. Those who embraced the label even included the editor-in-chief of Science, Marcia McNutt (#IAmAResearchParasite, 4 Mar 2016). When I first heard of this discussion, my knee-jerk reaction was to take this side too.

Having mulled it over for two weeks and heard more opinions, though, I have arrived at the conclusion that it’s complicated (true for Facebook relationship statuses and lots of other things). I will attempt to present both sides in as neutral a manner as possible. I did try writing this in ordinary argumentative paragraphs, but I found that not a very fluid medium, so I shall instead attempt a (kind of) Socratic method:

A: I am a clinical trialist. My clinical trial data is a large and complex dataset, acquired over many years. I am reluctant to share my data with other scientists.

B: I am a scientist. For science to progress, it is imperative to open your data to scrutiny, isn’t it? You might miss other analyses, or you might make a mistake in the analysis.

A: Yes, ideally that is the most science-y thing to do. But it just seems too easy for them to harvest the fruit of the labour we have put so much effort into.

B: I understand your concern. Our current citation and acknowledgement system is not ideal yet, but with proper credit attributed to you and the others who collect the data, wouldn’t that be fine?

A: I suppose that will do. I have another concern. I designed the clinical trials, and I had reasons for collecting one set of data and not others. How would other scientists analyse our raw data without understanding that context?

B: This seems to be a communication problem. I think it is only right to include you in their re-analysis process, to make you a collaborator or a co-author or suchlike.

A: I would willingly share my data without reservation if what you have described were already in place. But they only consider their own interests.

B: As do you.

A: As does everybody, I have to admit, yes.

Anyway, that’s the dialogue I imagined in my head. It is oversimplified, but I think it captures the nuances and keeps its neutrality (not to mention civility). Forbes’ David Shaywitz wrote a very balanced series of articles about this issue, and it greatly informed and influenced my current understanding, so do read it (Part 1, Part 2, Part 3). I especially recommend Part 3, where Shaywitz has an interesting discussion about separating data sharing from data analysis and reminds scientists that the patient is at the centre of the data, so any research, and whatever is done with the data (including sharing it), should be in her best interest.

If I may add my own opinion, one I haven’t seen discussed at length in any of the articles linked above, it is that we also have to consider the nature of the data. As my Socratic A mentioned, clinical trial data is a unique dataset, both large and complex. Take the response of the Science editor-in-chief, Marcia McNutt, for example: she wrote about her own experience with oceanographic data. My question is, are we comparing apples to oranges when we compare clinical trial data with oceanographic data? In my own field, for example, I use 3D coordinates deposited in public repositories by structural biologists, and this type of data does not leave much room for ambiguity. Maybe there is some ambiguity in the conformations of a few aromatic rings here and there, but if the structure is at good resolution, there is not much room for drastically different interpretations. The same cannot be said for clinical trial data. A large number of variables are involved, different statistical techniques may yield different interpretations, and so on.

Maybe this is why the scientific communities are so enraged: they feel the issue is about data sharing in science as a whole, while a sober look shows that it is about sharing one specific kind of data. Data sharing is already the status quo in many fields, such as oceanography and structural biology, to name a few. It should be so for clinical trials too, but that is not yet the case, and we have to address the root causes. In any case, at least the discussion has started.

So again, it’s complicated (not my relationship status), but I can safely say that #IAmAResearchParasite, though only for my kind of data.

