Talk:Wuhan coronavirus outbreak

Origin of 2019-nCoV?
The genome of 2019-nCoV has been claimed to come from four different sources:
 * 1) Bat coronavirus similar to Bat-SL-CoVZXC21 or Bat_SL-CoVZC45
 * 2) Spike glycoprotein gene from human SARS
 * 3) pShuttle-SN vector used in labs for splicing the genome
 * 4) HIV inserts at the tips of the spikes in the spike glycoprotein

Points 2 and 3 are wrong!

I analyzed the two claims made about inserts. James Lyons-Weiler claims that the 2019-nCoV virus has a unique sequence about 1,378 bp (nucleotide base pairs) long that is not found in related coronaviruses. He published the sequence online. He also claims that the sequence contains the pShuttle-SN expression vector.

I ran the sequence through an online DNA-to-protein translator. Reading it in reading frame 2 (i.e. leaving out the first "c") gives this amino acid sequence:

SVLHSTQDLFLPFFSN VTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNV VIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLRE FVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSS GWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQ PTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTK LNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNY LYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVG

The sequence is identical to the one published in the now-withdrawn Indian paper. It corresponds to positions 46 to 504 in the full protein (positions 50 to 508 in the alignment table). The sequence is a close match to the SARS spike protein. There is no place to fit in the pShuttle-SN sequence, unless the pShuttle-SN sequence itself mimics or contains the SARS spike protein. James Lyons-Weiler now admits that this is in fact the case.
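The translation step can be sketched in a few lines of Python. This is only an illustration, not the online tool that was actually used: the function names `translate` and `codon_count` are made up here, the table is the standard genetic code, and the input in the usage line is a toy string, not the published sequence. The last lines check that a stretch of about 1,378 nt read in frame 2 gives as many codons as there are residues in positions 46 to 504.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate DNA in reading frame 1, 2 or 3 (frame 2 skips the first base).

    Unknown codons become "X"; a trailing incomplete codon is dropped.
    """
    dna = dna.upper()
    start = frame - 1
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(start, len(dna) - 2, 3)
    )


def codon_count(length_nt, frame=1):
    """Number of complete codons in a sequence of the given length and frame."""
    return (length_nt - (frame - 1)) // 3


# Toy input (hypothetical, not the real sequence): dropping the leading "C"
# leaves the codons TCT GTT TTA.
print(translate("CTCTGTTTTA", frame=2))

# Length check using the figures quoted above.
print(codon_count(1378, frame=2), 504 - 46 + 1)
```

If the sequence is exactly 1,378 nt long, frame 2 leaves 1,377 nt, or 459 complete codons, which equals the number of residues from position 46 to position 504 inclusive.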

James Lyons-Weiler also claimed that the spike glycoprotein of 2019-nCoV is most similar to the SARS spike protein and not to that of the SARS-like coronavirus in bats. This is not true. The article in The Lancet records an 80.2% match between the spike protein of 2019-nCoV and Bat_SL-CoVZC45 but only 76.2% between 2019-nCoV and SARS. This level of similarity is also shown in the sequence alignment presented in the Indian paper.
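For readers unfamiliar with how such a "match" percentage is obtained, here is a minimal sketch of percent identity over two already-aligned sequences. It is an assumption about the general method, not the procedure used in The Lancet article; conventions differ on whether gap columns count in the denominator (here they do), and the function name and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences carry the same residue.

    Both inputs must already be aligned (same length, gaps marked with `gap`).
    Gap columns count toward the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy example: five matching columns out of six.
print(round(percent_identity("ACDE-G", "ACDEFG"), 1))
```

Running the same function on the real spike alignments would not necessarily reproduce 80.2% and 76.2% exactly, since the result depends on the alignment and on the gap convention.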

The Indian paper makes the mistake of comparing 2019-nCoV to SARS, when in fact it is most closely related to the SARS-like bat viruses Bat_SL-CoVZC45 and Bat_SL-CoVZXC21, or to some common ancestor. This however does not invalidate their results. There is a near-infinite number of current and past viruses in the wild that are ever closer relatives of 2019-nCoV. Comparing 2019-nCoV to SARS is not fundamentally different from comparing it to a bat coronavirus or to some yet-to-be-discovered closer relative.

The paper should be rewritten to compare 2019-nCoV to Bat_SL-CoVZC45. It would also be interesting to know whether the inserts match the DNA sequences in the database of known HIV genomes, and not only the amino acid sequences. I would do the comparison myself, but I do not yet have access to the full 2019-nCoV or Bat_SL-CoVZC45 genomes.

The spike protein of SARS is 1255 amino acids long, nine more than in Bat_SL-CoVZC45. A transformation from SARS to 2019-nCoV removes four amino acids and adds 22 for a total of 1273. A comparison of Bat_SL-CoVZC45 to 2019-nCoV should see inserts of 31 to 35 amino acids or 93 to 105 nucleotides.
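The arithmetic in the paragraph above can be checked with a short script. All numbers are taken from the text; the variable names are made up, and the last line is only an inference: if the net length difference is fixed, an insert total of 31 to 35 residues implies that a matching number of residues is deleted.

```python
SARS_SPIKE_AA = 1255                      # spike length of SARS, from the text

bat_spike_aa = SARS_SPIKE_AA - 9          # SARS is nine residues longer
ncov_spike_aa = SARS_SPIKE_AA - 4 + 22    # remove four, add 22

# Net length gain going from Bat_SL-CoVZC45 to 2019-nCoV.
net_gain_aa = ncov_spike_aa - bat_spike_aa

# Expected insert range quoted in the text, converted to nucleotides.
insert_range_aa = (31, 35)
insert_range_nt = tuple(3 * n for n in insert_range_aa)

# Inference: inserts minus deletions must equal the net gain.
implied_deletions_aa = tuple(n - net_gain_aa for n in insert_range_aa)

print(bat_spike_aa, ncov_spike_aa, net_gain_aa)
print(insert_range_nt, implied_deletions_aa)
```

The totals agree with the text: 1246 residues for the bat virus, 1273 for 2019-nCoV, and 93 to 105 nucleotides for the inserts. The net gain of 27 residues means that inserts of 31 to 35 residues go together with 4 to 8 deleted residues.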

I still think the link to HIV sequences is statistically meaningful. (continued.)


 * http://www.tiem.utk.edu/~gross/bioed/webmodules/aminoacid.htm
 * https://en.wikipedia.org/wiki/HIV
 * https://en.wikipedia.org/wiki/Structure_and_genome_of_HIV
 * https://en.wikipedia.org/wiki/Envelope_glycoprotein_GP120

-- Petri Krohn (talk) 03:57, 5 February 2020 (UTC)