Written on 11/11/11 at 1:11 PM San Diego time; 11:11 PM Cairo time

Continuing the superstitious trend of taking advantage of this year’s peculiar two digits (11) [both the Egybio website and this blog started on 1/1/11], I set the release date for PharmacoMicrobiomics, which will hopefully grow into a drug-microbiome knowledge base in the near future, at 11/11/11. Fortunately, we met the deadline (before 11:11 AM Cairo time).

We launched an early version based on Google Docs last year, and on Nov 1 we replaced it with a database fully developed by Mariam Rizkallah, who did a phenomenal job learning information technology and earning a diploma in less than a year. She didn’t just earn the diploma; she actually put it to use, combining MySQL, Django, and Python to build the final product.

Besides Mariam, other students (most notably Rama Saad) have curated literature and collected enough data to get us started.

PharmacoMicrobiomics qualifies as the first complete product of my “virtual lab.” I would simply define my virtual laboratory as one with no physical location, whose members I have to supervise remotely. Many investigators supervise their labs remotely some of the time, but this project was carried out mostly by students in Egypt while I was out of Egypt more than 80% of the time and followed up with them via the Internet almost 95% of the time. It is certainly a good working model for computational/in silico biology and bioinformatics projects. I can’t promise it would work for wet-lab projects.


Write as if you were blogging! Or perish?

Warning: This is a ranting post, but it is not negative, I promise! It’s more like the daydreams of a frustrated author, with some futuristic aspirations (not so futuristic or ambitious for some). The post is partly inspired by the “Beyond the PDF” workshop.

I’m currently working on a manuscript (no surprise; isn’t this what I do most of the year? “Scientists are forced to write more than they read,” Phil Bourne, Beyond the PDF workshop, 1/20/2011) that I absolutely need to finish by February 1, and for the past 5–6 months I have been struggling with its flow.

As of today, the Introduction is almost done, the Methods are written, the Results section seems complete, and the Discussion is outlined. However, every time I go through the Results (and this has been true for the past 3 months), I find something I don’t like, start a new analysis, and rearrange the data, which means rearranging the text, which leaves scars, bumpy transitions, and so on.

One problem I’m having is the massive amount of data and the fact that many of them are uninterpretable. If only I could present some of the data as they are, with a dialogue box below them saying: I’m helpless here; this is open to public debate; suggested interpretations are welcome #crowdsourcing. Why not? That would be ideal. Why should authors hide “unnecessary data,” or data they have no clue about, just because they cannot write anything about them, and fall back on the cliché “data not shown”? Perhaps 2–3 weeks after publication, another paper will come out that lets everybody discuss this piece of data, still in the context of the published manuscript. Or maybe a reader from a distant discipline (e.g., astrophysics) will look at the uninterpretable figure or table and smile, saying: Oh no! This is so obvious. It means so and so #Itoldyouso

The other problem, which has frustrated me even more over the past couple of weeks, is the organization and, to some extent, the writing style. If only I could write it as if it were a blog post! It would be much easier. What does this mean?

  • Write as if I’m telling a story. Use plain active voice. Use some informal expressions and not regret it. Don’t let the format constrain me to the point of being crippled. I usually finish an average-length blog post in a couple of hours or less; if I wrote an article this way, it could take a week rather than 3–6 months.
  • Link to the things I’m talking about. Quote existing text from wherever it was written, without bothering to rephrase it and without claiming any credit for finding the quotes.
  • “Tag” people while writing. They may be people who can help interpret a paragraph better, or cited authors who are alive and have online accounts.
  • Leave some questions unanswered. Some sentences unfinish… (hmmm?)

During one of the Beyond the PDF sessions, in response to Anita de Waard’s analysis, Phil Bourne expressed some “regret” about revising one of his students’ articles and enforcing changes to make the manuscript look more “professional.” He wondered whether he should have let the student express excitement about the findings informally. A tricky question, isn’t it?

I am not sure if writing liberally is a good solution. It’s easier, more direct,

On the other hand, an article is not just a story. Those who created the IMRaD/IRDaM structures did so for a reason. Some readers approach an article just to find the reagents used in the experiments; they want the methodology separate and clear. Also, think of readers from non-English-speaking countries. They may not get the subtle expressions, laugh at the jokes, or understand the slang without a handy copy of The Urban Dictionary.

Bottom line: I need to close my Word doc right now, open a new one, and write the full story on one page, non-stop, as if I were blogging. Then I’ll take that back, project it onto my already-written Results section, translate spoken jargon into scientific jargon, and whatever does not fit in my blogged story will have to adapt or perish!

BBU: Beautiful but Uninterpretable

PCA of phage in metagenomes

PCA of different metrics describing known phage-like sequences in metagenomes. Colors represent different phage classes.

PCA of dsDNA Phages in metagenomes

PCA of different metrics describing known dsDNA phage-like sequences in metagenomes. Colors represent different phage classes.

PCA of phage abundance/distribution in metagenomes

PCA of different metrics describing known dsDNA phage-like sequences in metagenomes. Each data point represents a metagenomic sample. Colors represent different metagenome habitats; x = viral metagenomes; o = microbial metagenomes.

I have been generating, then staring at, these and another two dozen graphs for over three days now. They are two-dimensional plots of principal component analyses (PCA) of multiple calculated variables, made in an attempt to find patterns that differentiate phages according to their classes or their hosts’ phylogeny, or patterns that differentiate natural habitats according to their phage-like sequences.
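For readers who have never made one of these plots, the basic recipe fits in a few lines of Python. This is a minimal sketch, not the pipeline behind the figures above: it assumes the calculated metrics sit in a table with one row per phage-like sequence (or per metagenomic sample) and one column per metric, it standardizes each metric before the decomposition (an assumption; unscaled PCA is also common), and it runs on random numbers rather than real data. The function name pca_2d is hypothetical.

```python
import numpy as np

def pca_2d(X):
    """Project samples (rows) onto the first two principal components.

    X: array of shape (n_samples, n_metrics). Hypothetical layout: one row per
    phage-like sequence or metagenomic sample, one column per calculated metric.
    Returns (scores, explained_variance_ratio_of_PC1_and_PC2).
    """
    X = np.asarray(X, dtype=float)
    # Assumption: standardize each metric so variables on different scales
    # contribute equally to the principal axes.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = (X - mean) / std
    # SVD of the centered/standardized matrix; rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T  # coordinates of each sample in the PC1/PC2 plane
    variances = S**2 / (Z.shape[0] - 1)  # variance captured by each component
    ratio = variances / variances.sum()
    return scores, ratio[:2]

# Made-up input: 50 samples described by 6 random metrics (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
scores, ratio = pca_2d(X)
print(scores.shape, ratio)
```

Each row of scores gives one sample's position on the plot, ready to be colored by phage class or habitat; ratio shows how much of the total variance the two plotted axes capture, which is one quick check on how much signal a pretty picture actually carries.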

Yes, they all look beautiful to me (and, yes, I know that beauty is in the eye of the beholder; no need to remind me), but it is really hard to extract information from these PCA plots and turn it into usable data that can tell a good story.

In the wet lab, you may take hundreds of gel pictures, record thousands of time points, plot a huge number of graphs, or count millions of cells under the microscope, then never use these data. But somehow you still have some “concrete” material (pictures scanned or pasted into your notebook, recorded numbers, plots stored in folders, etc.). In front of these graphs, however, I feel so vulnerable. There are infinite possibilities. These days, there is a lot of talk among scholars and publishers about sharing data, storing data, and attaching raw data to publications, but I doubt that what they are talking about includes all the intermediate plots, all the “gated” views of flow cytograms, or all the calculations performed “on the fly” until a reasonable, stable, presentable final product is reached. These “data intermediates” are simply too numerous to be recorded (TNTR?); yet they remain beautiful but uninterpretable!

BioGnosis… A blog born on 1/1/11

The last thing I want at this busy time and at this stage of my career is another blog! I have had many blogs since I first learned the word in July 2003.

However, most of the blogs I created are/were not science-oriented and were certainly not related to my work or career plans. They were personal, literary, about sociopolitical and contemporary issues in Egypt, or (my favorite) critiquing the press and media in Egypt and elsewhere.

Well, I did start a couple of science blogs too (Lost in Annotations, and A Novice in the World of Bioinformatics), and then my Microbiology Blog when I resumed teaching at Cairo University in 2006–2007. However, these were fragmented, discontinuous attempts, and none of them allowed me to freely discuss my research or comment on some of my favorite topics: communicating science, scholarly publishing, open access, open science, etc.

OK. I’m obviously trying too hard to justify (mostly to myself) why on Earth I need to start another blog, and I’m sure readers don’t care much about this self-dialogue. So, without further ado, here is Bio Gnosis… A blog about decoding life (at all levels, but mostly at its most common “microscopic” scale) and sharing and communicating the exciting decoded knowledge in all possible forms and formats.

I have to confess that I’m not planning to blog heavily here, at least for now. I just feel there is no better day than 1/1/11 to start my new (and hopefully long-term) blog… Yes, I have my own superstitions, of course!

