I have been generating and then staring at these graphs, and another two dozen like them, for over three days now. They are two-dimensional plots representing principal component analyses (PCA) of multiple calculated variables, made in an attempt to find patterns that differentiate phages according to their classes or their hosts’ phylogeny, or patterns that differentiate natural habitats according to their phage-like sequences.
Yes, they all look beautiful to me (and, yes, I know that beauty is in the eye of the beholder, so no need to remind me); but it is really hard to extract information from these PCA plots and turn it into usable data that can tell a good story.
In the wet lab, you may take hundreds of gel pictures, record thousands of time points, plot a huge number of graphs, or count millions of cells under the microscope, and then never use these data. But somehow you still have some “concrete” material (pictures scanned or pasted into your notebook, recorded numbers, plots stored in folders, etc.). In front of these graphs, however, I feel so vulnerable. There are infinite possibilities. These days, there is a lot of talk among scholars and publishers about sharing data, storing data, and attaching raw data to publications; but I doubt that what they talk about includes all the intermediary steps of different plots, all the “gated” views of flow cytograms, or all the calculations performed “on the fly” until a reasonable, stable, and presentable final product is reached. These “data intermediates” are simply too numerous to be recorded (TNTR?); yet, they remain beautiful but uninterpretable!