Seventy-five years ago today, on May 8, 1945, World War Two ended in the European theatre with the unconditional surrender of Nazi Germany. “The instrument of surrender signed 7 May 1945 stipulated that all hostilities must cease at 23:01 (CET), 8 May 1945, just an hour before midnight.” However, since that time was already past midnight in EET and Moscow time, the USSR and its satellite states marked VE Day on May 9, and Russia does so to this day. In Israel the day is unofficially marked on May 9, owing to the large number of elderly Russian immigrants who had actually fought in what Russians call “The Great Patriotic War”.
Meanwhile, some happenings on the COVID front.
(1) (Hat tip: masgramondou.) An experienced software engineer, formerly at Google, reviews Neil Ferguson’s simulation code in detail. (Yes, the one that predicted two million dead in the US, a figure later revised downward by a factor of twenty.) Read the whole thing, and weep. A few teasers:
My background. I wrote software for 30 years. I worked at Google between 2006 and 2014, where I was a senior software engineer working on Maps, Gmail and account security. I spent the last five years at a US/UK firm where I designed the company’s database product, amongst other jobs and projects. I was also an independent consultant for a couple of years. Obviously I’m giving only my own professional opinion and not speaking for my current employer.
The documentation says: “The model is stochastic. Multiple runs with different seeds should be undertaken to see average behaviour.” “Stochastic” is just a scientific-sounding word for “random”. That’s not a problem if the randomness is intentional pseudo-randomness, i.e. the randomness is derived from a starting “seed” which is iterated to produce the random numbers. Such randomness is often used in Monte Carlo techniques. It’s safe because the seed can be recorded and the same (pseudo-)random numbers produced from it in future. Any kid who’s played Minecraft is familiar with pseudo-randomness because Minecraft gives you the seeds it uses to generate the random worlds, so by sharing seeds you can share worlds.
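The seed-reproducibility property described above can be sketched in a few lines of Python. (The review concerns C/C++ simulation code; the `simulate` function below and its parameters are purely illustrative, not part of Ferguson’s model.)

```python
import random

def simulate(seed, n=5):
    """Toy stochastic 'model': draw n pseudo-random numbers from a seed."""
    rng = random.Random(seed)  # an independent generator, seeded explicitly
    return [rng.random() for _ in range(n)]

# The same seed reproduces the same stream exactly...
assert simulate(42) == simulate(42)
# ...while a different seed gives a different, but equally reproducible, stream.
assert simulate(42) != simulate(43)
```

This is what the documentation promises: record the seed, and any run can be replayed exactly.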
Clearly, the documentation wants us to think that, given a starting seed, the model will always produce the same results.
Investigation reveals the truth: the code produces critically different results, even for identical starting seeds and parameters.
I’ll illustrate with a few bugs. In issue 116 a UK “red team” at Edinburgh University reports that they tried to use a mode that stores data tables in a more efficient format for faster loading, and discovered – to their surprise – that the resulting predictions varied by around 80,000 deaths after 80 days[…]
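One common mechanism behind such seed-defying nondeterminism (the review discusses several; whether it is the cause of this particular bug is a separate question) is that floating-point addition is not associative. A minimal sketch, with deliberately extreme values chosen for illustration:

```python
# Floating-point addition is not associative: regrouping the same three
# terms changes the result, because the intermediate rounding differs.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # 0.0 + 1.0
right = a + (b + c)   # the 1.0 is absorbed into -1e16 before cancelling

assert left == 1.0
assert right == 0.0   # same inputs, different grouping, different answer
```

In a multithreaded simulation, the grouping is effectively chosen by thread scheduling, so two runs with identical seeds and parameters can accumulate their partial sums in different orders and diverge, and tiny divergences compound over weeks of simulated time.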
(2) “It’s not peer-reviewed!” You hear this a lot in debates about COVID-19 nowadays. But what does it really mean?
For a scientific paper to be published in a reputable scientific journal, it must undergo peer review: the editor (or an associate/section editor) sends the submitted paper out to experts in the field (usually between two and four) for their frank evaluation of the science. They write reports, which are passed back anonymously to the author, and may also answer a questionnaire grading the paper on various criteria (novelty, technical correctness, quality of presentation, appropriate length,…). They also make a summary recommendation, which is one of the following:
- Publish as is (rarely do all reviewers recommend this on 1st pass)
- Publish subject to minor revisions detailed in the report. (Further review is typically not expected.)
- May be publishable subject to major revision (and usually re-reviewing of the revised manuscript).
- Not suitable for the journal, but may be publishable in _____
- Not suitable for publication in any form
Where does one draw the line between “minor” and “major” revision? In practice, if (nontrivial amounts of) additional experiments/computer simulations/… are required, or if the interpretation needs to be radically overhauled, the revision is considered “major”; otherwise “minor”. One round of the process easily takes a month or more, doubled if one or more reviewers insist on major revision, or if the paper is initially rejected and resubmitted to another journal. In fast-moving research areas (not just the present global pandemic), this causes frustrating delays.

So in the early 1990s, when the web was still in its infancy, a group of particle physicists developed an online preprint server that, after a period under the rather confusing URL xxx.lanl.gov (which suggested a sideline of Los Alamos National Laboratory into adult entertainment), became known as arXiv.org. Here scientists could share their freshly submitted manuscripts with colleagues ahead of publication, or even circulate drafts. Anybody wishing to comment on such a “preprint” could simply email the author. Over time, similar sites came online for the life sciences (biorxiv.org), medicine (medrxiv.org) and finally chemistry (chemrxiv.org). Sure, there are spam and crank submissions to these sites (site managers try to keep out the obvious ones), but for the most part, submissions are legitimate papers in their original, pre-peer-review form.

Many authors, if the journal (publisher) allows it, later update their submission with a “postprint”, i.e., the revised manuscript after peer review. (arXiv.org and similar sites are set up such that the original and revised uploads are always preserved and accessible, to forestall an “Oceania is not at war with Eurasia” scenario.) Many journals nowadays, once a paper is accepted for publication, immediately put the accepted-manuscript “postprint” online, and in priority disputes this date counts as the date of first publication.
Copy editing by the production staff, typesetting in journal format, and proofreading by the author (often with some last-minute changes) may take several weeks more, after which the final “version of record” comes online, often at first with placeholder page numbers ahead of inclusion in a journal issue. (No further changes are made after this, other than updating the placeholder page numbers to the final ones upon inclusion in an issue. If the authors find a mistake in their own paper at this point, their only option is to publish an erratum.)
Peer review is definitely valuable, and there may be substantial changes between an online preprint and the version of record — but that does not necessarily mean the preprint is worthless, especially if it comes from an established research group, in which case it’s best regarded as a “beta release” — some changes may be expected, but the paper may already be quite useful. The anonymous peer review system has its own issues with bias and (both benign and malignant) “gatekeeping”, but for the most part has served the scientific community well. Its primary weakness at this point is that qualified reviewers become over-burdened with manuscripts to review — keep in mind this is unpaid service to the scientific community, and reviewers quickly learn not to respond too fast, or they get “rewarded” with more refereeing requests. And after all, you need to perform, manage, and publish your own research, aside from teaching and any administrative duties you might have.
Alternatives have been sought. Public open peer review is one of them, where the reviewers’ reports and critiques are visible online. This could potentially become a hybrid alternative to both the preprint system and anonymous peer review, with radical transparency to the reader. In the discussion on the community testing effort in Santa Clara County, we saw an interesting example.