hi,

in the VariantAnnotation package, the help of the functions for identifying variant types such as SNVs, insertions, deletions, transitions, and structural rearrangements gives the following definitions:


        • isSNV: Reference and alternate alleles are both a single
          nucleotide long.

        • isInsertion: Reference allele is a single nucleotide and the
          alternate allele is greater (longer) than a single nucleotide
          and the first nucleotide of the alternate allele matches the
          reference.

        • isDeletion: Alternate allele is a single nucleotide and the
          reference allele is greater (longer) than a single nucleotide
          and the first nucleotide of the reference allele matches the
          alternate.

        • isIndel: The variant is either a deletion or insertion as
          determined by ‘isDeletion’ and ‘isInsertion’.

        • isSubstitution: Reference and alternate alleles are the same
          length (1 or more nucleotides long).

        • isTransition: Reference and alternate alleles are both a
          single nucleotide long.  The reference-alternate pair
          interchange is of either two-ring purines (A <-> G) or
          one-ring pyrimidines (C <-> T).


however, unless I'm missing something here, these definitions do not cover indels in which the insertion or deletion involves more than one reference or alternate nucleotide, respectively. this could be an example of what I'm trying to say:

library(VariantAnnotation)

vr <- VRanges(seqnames = rep("chr1", times=5),
              ranges = IRanges(seq(1, 10, by=20),
                               seq(1, 10, by=20)+c(1, 1, 2, 2, 3)),
              ref = c("T", "A",  "A", "AC",  "AC"),
              alt = c("C", "T", "AC", "AT", "ACC"),
              refDepth = c(5, 10, 5, 10, 5),
              altDepth = c(7, 6, 7, 6, 7),
              totalDepth = c(12, 17, 12, 17, 12),
              sampleNames = letters[1:5])

isSNV(vr)
## [1]  TRUE  TRUE FALSE FALSE FALSE
isIndel(vr)
## [1] FALSE FALSE  TRUE FALSE FALSE
isSubstitution(vr)
## [1]  TRUE  TRUE FALSE  TRUE FALSE

note that the last variant does not evaluate as true for any of the three possibilities. after looking for variant definitions, I have found that the Human Genome Variation Society (HGVS) describes this as a deletion followed by an insertion and calls it "indel" or "delins" (it's unclear to me whether they use the two terms interchangeably), see this link:

http://www.hgvs.org/mutnomen/recs-DNA.html#indel

the only other site I could quickly find with Google that gives a specific definition is that of the software SnpEff, which calls it "MIXED", a "Multiple-nucleotide and an InDel":

http://snpeff.sourceforge.net/SnpEff_manual.html

I would suggest that VariantAnnotation should try to identify this type of variant. following the HGVS recommendations, could we maybe have a function for it called isDelins() ??
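
to make the suggestion a bit more concrete, here is a minimal sketch of what such a predicate could look like. it is only an assumption on my side, not existing VariantAnnotation code: the name isDelinsSketch() is hypothetical, it works on plain character vectors of reference and alternate alleles instead of a VRanges object, and it simply flags variants whose alleles differ in length but that meet neither the isInsertion nor the isDeletion definition quoted above:

```r
# Hypothetical helper, NOT part of VariantAnnotation: takes plain character
# vectors of reference and alternate alleles of equal length.
isDelinsSketch <- function(ref, alt) {
  stopifnot(is.character(ref), is.character(alt),
            length(ref) == length(alt))
  refLen <- nchar(ref)
  altLen <- nchar(alt)
  # insertion and deletion as defined in the help text quoted above
  isIns <- refLen == 1L & altLen > 1L & substr(alt, 1L, 1L) == ref
  isDel <- altLen == 1L & refLen > 1L & substr(ref, 1L, 1L) == alt
  # alleles of different length that are neither a simple insertion
  # nor a simple deletion
  refLen != altLen & !isIns & !isDel
}

ref <- c("T", "A",  "A", "AC",  "AC")
alt <- c("C", "T", "AC", "AT", "ACC")
isDelinsSketch(ref, alt)
## [1] FALSE FALSE FALSE FALSE  TRUE
```

with the alleles from the example above, only the last variant is flagged. whether equal-length multi-nucleotide changes such as "AC" to "AT" should also count as delins is a design choice this sketch leaves to isSubstitution.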



cheers,

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
