sure, i'm attaching the patch created from a fresh checkout of the trunk this morning. in principle, all required bits are there and it builds and checks without errors and warnings.

cheers,

robert.

On 02/10/2015 07:37 PM, Valerie Obenchain wrote:
Hi Robert,

This sounds like a good addition. I'll put it on the TODO. If you need
this immediately I'd be happy to accept a patch (with unit tests).

Valerie



On 02/10/2015 06:29 AM, Robert Castelo wrote:
hi,

in the VariantAnnotation package, the help of the functions for
identifying variant types such as SNVs, insertions,
deletions, transitions, and structural rearrangements gives the
following definitions:


• isSNV: Reference and alternate alleles are both a single
nucleotide long.

• isInsertion: Reference allele is a single nucleotide and the
alternate allele is greater (longer) than a single nucleotide
and the first nucleotide of the alternate allele matches the
reference.

• isDeletion: Alternate allele is a single nucleotide and the
reference allele is greater (longer) than a single nucleotide
and the first nucleotide of the reference allele matches the
alternate.

• isIndel: The variant is either a deletion or insertion as
determined by ‘isDeletion’ and ‘isInsertion’.

• isSubstitution: Reference and alternate alleles are the same
length (1 or more nucleotides long).

• isTransition: Reference and alternate alleles are both a
single nucleotide long. The reference-alternate pair
interchange is of either two-ring purines (A <-> G) or
one-ring pyrimidines (C <-> T).


however, unless I'm missing something here, these definitions do not
cover indels in which the insertion or deletion involves more than one
reference or alternate nucleotide, respectively. this could be an
example of what i'm trying to say:

library(VariantAnnotation)

vr <- VRanges(seqnames = rep("chr1", times=5),
              ranges = IRanges(seq(1, 10, by=20),
                               seq(1, 10, by=20)+c(1, 1, 2, 2, 3)),
              ref = c("T", "A", "A", "AC", "AC"),
              alt = c("C", "T", "AC", "AT", "ACC"),
              refDepth = c(5, 10, 5, 10, 5),
              altDepth = c(7, 6, 7, 6, 7),
              totalDepth = c(12, 17, 12, 17, 12),
              sampleNames = letters[1:5])

isSNV(vr)
## [1] TRUE TRUE FALSE FALSE FALSE
isIndel(vr)
## [1] FALSE FALSE TRUE FALSE FALSE
isSubstitution(vr)
## [1] TRUE TRUE FALSE TRUE FALSE

note that the last variant does not evaluate as true for any of the
three possibilities. after looking for variant definitions, i have found
that the Human Genome Variation Society (HGVS) describes this as a
deletion followed by an insertion and calls it "indel" or "delins" (it's
unclear to me whether they use the two terms interchangeably), see the link here:

http://www.hgvs.org/mutnomen/recs-DNA.html#indel

the only other site I could quickly find with Google that gives a
specific definition is that of the software SnpEff, which
calls it "MIXED", a "Multiple-nucleotide and an InDel":

http://snpeff.sourceforge.net/SnpEff_manual.html

I would suggest that VariantAnnotation should try to identify this type
of variant. following the HGVS recommendations, could we maybe have a
function for it called isDelins() ??
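
to make the idea concrete, here is a minimal base-R sketch of such a check, working on plain character vectors of reference and alternate alleles. the helper names (is_insertion, is_deletion, is_delins) are hypothetical and used only for illustration; this is neither the attached patch nor the VariantAnnotation API. it assumes that a "delins" is any variant whose alleles differ in length but which fails the insertion and deletion definitions quoted above:

```r
# Insertion and deletion as defined in the help page quoted above,
# written for plain character vectors (hypothetical helper names).
is_insertion <- function(ref, alt)
    nchar(ref) == 1L & nchar(alt) > 1L & substr(alt, 1L, 1L) == ref

is_deletion <- function(ref, alt)
    nchar(alt) == 1L & nchar(ref) > 1L & substr(ref, 1L, 1L) == alt

# Assumption: a delins is a length-changing variant that is neither a
# simple insertion nor a simple deletion under the definitions above.
is_delins <- function(ref, alt)
    nchar(ref) != nchar(alt) &
        !is_insertion(ref, alt) & !is_deletion(ref, alt)

# Same alleles as in the VRanges example.
ref <- c("T", "A", "A", "AC", "AC")
alt <- c("C", "T", "AC", "AT", "ACC")
is_delins(ref, alt)
## [1] FALSE FALSE FALSE FALSE  TRUE
```

under this assumption only the last variant (AC to ACC) is flagged, which is the one that isSNV(), isIndel() and isSubstitution() all miss.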



cheers,

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550