Hepatitis C virus

Project is already public.

The Hepatitis C virus belongs to the Flaviviridae family and is one of the five featured viruses in the VIPR database (www.viprbrc.org). To detect positively selected amino acid sites (for instance, sites visible to the immune system) in the 10 mature proteins, all available nucleotide sequences for each gene were downloaded from the VIPR database, namely 2182, 2182, 2182, 2181, 2207, 2220, 2220, 2220, 2219 and 2220 sequences for genes C, E1, E2, p7, NS2, NS3, NS4A, NS4B, NS5A and NS5B, respectively.

Two sequence-filtering protocols were implemented separately. In both, sequences with ambiguous nucleotides and those presenting in-frame stop codons were removed. In the first protocol (N prefix), identical nucleotide sequences were also removed, leaving 1760, 1778, 1738, 1840, 1807, 1714, 1814, 1785, 1748 and 1683 sequences for genes C, E1, E2, p7, NS2, NS3, NS4A, NS4B, NS5A and NS5B, respectively. In the second protocol (A prefix), sequences encoding identical amino acid sequences were removed instead, leaving 974, 1732, 1713, 1246, 1754, 1676, 559, 1589, 1703 and 1643 sequences for genes C, E1, E2, p7, NS2, NS3, NS4A, NS4B, NS5A and NS5B, respectively.

The phi test for recombination, as implemented in SplitsTree, was used to look for evidence of recombination in these datasets, since recombination can produce false positively selected amino acid sites when using codeML. For the first protocol's final datasets, the phi test found no statistically significant evidence for recombination (P>0.05) for genes C, E1, E2, p7, NS3, NS4B, NS5A and NS5B, and found evidence for recombination (P<0.05) for gene NS2.
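The two filtering protocols can be sketched in Python as follows. This is a minimal illustration, not the code actually used for this project: it assumes each record is a (name, sequence) pair holding an in-frame coding sequence, uses the standard genetic code, treats any character other than A, C, G or T as ambiguous, and tolerates a single terminal stop codon. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative code is needed for HCV coding sequences).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate an in-frame A/C/G/T sequence; a trailing partial codon is ignored."""
    usable_len = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable_len, 3))


def is_usable(nt):
    """Reject sequences with ambiguous nucleotides or in-frame stop codons.
    A single terminal stop codon is tolerated (assumption)."""
    nt = nt.upper()
    if set(nt) - set("ACGT"):
        return False
    protein = translate(nt)
    if protein.endswith("*"):
        protein = protein[:-1]
    return "*" not in protein


def filter_protocol_n(records):
    """Protocol N: drop unusable sequences, then identical nucleotide sequences."""
    seen, kept = set(), []
    for name, nt in records:
        nt = nt.upper()
        if is_usable(nt) and nt not in seen:
            seen.add(nt)
            kept.append((name, nt))
    return kept


def filter_protocol_a(records):
    """Protocol A: drop unusable sequences, then any sequence whose translation
    duplicates an already-kept amino acid sequence."""
    seen, kept = set(), []
    for name, nt in records:
        nt = nt.upper()
        if not is_usable(nt):
            continue
        protein = translate(nt)
        if protein not in seen:
            seen.add(protein)
            kept.append((name, nt))
    return kept


records = [("s1", "ATGGCC"), ("s2", "ATGGCC"), ("s3", "ATGGCT"),
           ("s4", "ATGNCC"), ("s5", "ATGTAAGCC")]
print([n for n, _ in filter_protocol_n(records)])  # ['s1', 's3']
print([n for n, _ in filter_protocol_a(records)])  # ['s1']
```

The only difference between the two protocols is the key used to detect duplicates: the nucleotide sequence itself (N) or its translation (A). Because synonymous variants such as s1 and s3 collapse under protocol A, it can never keep more sequences than protocol N, which is consistent with the much smaller A counts for genes such as C, p7 and NS4A.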
For the second protocol's final datasets, the phi test found no statistically significant evidence for recombination (P>0.05) for genes C, E1, E2, NS3, NS4B, NS5A and NS5B, and found evidence for recombination (P<0.05) for gene NS2. In both protocols, the NS4A dataset had too few informative characters for the phi test to be used, and in the second protocol only, the same was true of the p7 dataset. These genes were nonetheless analyzed with the approach used for genes showing evidence for recombination (P<0.05).

When more than 100 sequences were available for a given gene and no evidence for recombination was found, five datasets of 50 randomly selected sequences were analyzed. In this case, sequences were aligned using Muscle, phylogenies were inferred using MrBayes, and positively selected amino acid sites were inferred using codeML, as implemented in ADOPS. When more than 100 sequences were available for a given gene and either evidence for recombination was found or there were too few informative characters for the phi test, five datasets of 50 randomly selected sequences were analyzed using OmegaMap, after aligning the sequences with Muscle and inferring a phylogeny with MrBayes, as implemented in ADOPS. In this case, the details of the OmegaMap run are shown in the Notes tab of the corresponding ADOPS project, but positively selected amino acid sites can still be viewed in the PSS tab. As usual, the details of every project can be checked by opening the other tabs.
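The choice between codeML and OmegaMap, the notion of an informative character, and the random subsampling can be sketched as follows. This is an illustrative sketch with hypothetical function names, not ADOPS code. It assumes informative characters are counted in the usual parsimony sense (at least two character states, each present in at least two sequences, with gaps ignored), which may differ from SplitsTree's exact definition, and it represents a phi test that could not be run by a p-value of None.

```python
import random
from collections import Counter


def informative_sites(alignment):
    """Count parsimony-informative columns in a list of equal-length aligned
    sequences: at least two states must each occur in at least two sequences.
    Gap characters ('-') are ignored (assumption)."""
    count = 0
    for column in zip(*alignment):
        states = Counter(c for c in column if c != "-")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def choose_method(phi_p_value, alpha=0.05):
    """Return 'codeML' when the phi test finds no evidence for recombination,
    and 'OmegaMap' when it does or when it could not be run (None)."""
    if phi_p_value is None or phi_p_value < alpha:
        return "OmegaMap"
    return "codeML"


def subsample(names, n_datasets=5, size=50, seed=None):
    """Draw five datasets of 50 distinct, randomly selected sequence names.
    The text describes this step only for genes with more than 100 sequences."""
    if len(names) <= 100:
        raise ValueError("subsampling is described only for more than 100 sequences")
    rng = random.Random(seed)
    return [rng.sample(names, size) for _ in range(n_datasets)]


print(informative_sites(["AAGT", "AAGC", "ATGC", "ATGT"]))  # 2
print(choose_method(0.01), choose_method(None), choose_method(0.3))  # OmegaMap OmegaMap codeML
```

Sending genes without a usable phi test result to OmegaMap is the conservative choice: since recombination cannot be ruled out for them, they go to the method that allows for recombination along the sequence rather than to codeML, whose site models assume a single underlying tree.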