Hi Robert,

This sounds like a good addition. I'll put it on the TODO. If you need this immediately I'd be happy to accept a patch (with unit tests).

Valerie



On 02/10/2015 06:29 AM, Robert Castelo wrote:
hi,

in the VariantAnnotation package, the help of the functions for
identifying variant types such as SNVs, insertions,
deletions, transitions, and structural rearrangements gives the
following definitions:


         • isSNV: Reference and alternate alleles are both a single
           nucleotide long.

         • isInsertion: Reference allele is a single nucleotide and the
           alternate allele is greater (longer) than a single nucleotide
           and the first nucleotide of the alternate allele matches the
           reference.

         • isDeletion: Alternate allele is a single nucleotide and the
           reference allele is greater (longer) than a single nucleotide
           and the first nucleotide of the reference allele matches the
           alternate.

         • isIndel: The variant is either a deletion or insertion as
           determined by ‘isDeletion’ and ‘isInsertion’.

         • isSubstitution: Reference and alternate alleles are the same
           length (1 or more nucleotides long).

         • isTransition: Reference and alternate alleles are both a
           single nucleotide long.  The reference-alternate pair
           interchange is of either two-ring purines (A <-> G) or
           one-ring pyrimidines (C <-> T).


however, unless I'm missing something here, these definitions do not
cover insertions whose reference allele, or deletions whose alternate
allele, is longer than a single nucleotide. the following example
illustrates what i mean:

library(VariantAnnotation)

vr <- VRanges(seqnames = rep("chr1", times=5),
               ranges = IRanges(seq(1, 10, by=20),
                                seq(1, 10, by=20)+c(1, 1, 2, 2, 3)),
               ref = c("T", "A",  "A", "AC",  "AC"),
               alt = c("C", "T", "AC", "AT", "ACC"),
               refDepth = c(5, 10, 5, 10, 5),
               altDepth = c(7, 6, 7, 6, 7),
               totalDepth = c(12, 17, 12, 17, 12),
               sampleNames = letters[1:5])

isSNV(vr)
## [1]  TRUE  TRUE FALSE FALSE FALSE
isIndel(vr)
## [1] FALSE FALSE  TRUE FALSE FALSE
isSubstitution(vr)
## [1]  TRUE  TRUE FALSE  TRUE FALSE

note that the last variant does not evaluate as true for any of the
three functions. after looking for variant definitions, i have found
that the Human Genome Variation Society (HGVS) describes this as a
deletion followed by an insertion and calls it "indel" or "delins" (it's
unclear to me whether they use the terms interchangeably); see the link
here:

http://www.hgvs.org/mutnomen/recs-DNA.html#indel

the only other site I could quickly find with Google that gives a
specific definition is that of the SnpEff software, which calls it
"MIXED", a "Multiple-nucleotide and an InDel":

http://snpeff.sourceforge.net/SnpEff_manual.html

I would suggest that VariantAnnotation should try to identify this type
of variant. following the HGVS recommendations, could we maybe have a
function for it called isDelins()?
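
as a rough sketch only, such a check might look like the following in
base R, working on plain character vectors of reference and alternate
alleles. the name isDelinsSketch() is hypothetical and not part of
VariantAnnotation, and the rule it applies (a change in allele length
that meets neither the isInsertion nor the isDeletion definition quoted
above) is an assumption about how "delins" could be detected, not the
HGVS definition itself:

```r
## Hypothetical helper, not part of VariantAnnotation.
## Assumption: a "delins" is any variant whose alleles differ in length
## but that fails the simple insertion/deletion definitions quoted above.
isDelinsSketch <- function(ref, alt) {
    stopifnot(is.character(ref), is.character(alt),
              length(ref) == length(alt))
    nref <- nchar(ref)
    nalt <- nchar(alt)
    ## first nucleotide shared by both alleles
    firstMatch <- substr(ref, 1, 1) == substr(alt, 1, 1)
    ## simple insertion/deletion as described in the help page
    simpleIns <- nref == 1 & nalt > 1 & firstMatch
    simpleDel <- nalt == 1 & nref > 1 & firstMatch
    nref != nalt & !simpleIns & !simpleDel
}

ref <- c("T", "A",  "A", "AC",  "AC")
alt <- c("C", "T", "AC", "AT", "ACC")
isDelinsSketch(ref, alt)
## [1] FALSE FALSE FALSE FALSE  TRUE
```

with this rule only the last variant (AC > ACC) is flagged; whether such
a variant should first be normalized to a simple insertion is a separate
question that the sketch does not address.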



cheers,

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
