Thanks for the feedback. I'll look into nchar for XStringSetList.
I'm in favor of supporting isDeletion(), isInsertion(), isIndel() and isSNV() for the VCF classes and removing restrictToSNV(). I could add an argument 'all_alt' or 'all_alt_agreement' to be used with CollapsedVCF in the case where not all alternate alleles meet the criteria.
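As a rough sketch of what such an argument might do, here is a base R version in which a plain list of character vectors stands in for the DNAStringSetList that alt() returns on a CollapsedVCF. The helper name isSNVCollapsed() and the exact behaviour of 'all_alt' are hypothetical and not part of VariantAnnotation; the data are the four REF/ALT rows from the original question.

```r
# Hypothetical helper, for illustration only (not the VariantAnnotation API).
# ref: character vector, one reference allele per variant.
# alt: list of character vectors, one or more alternate alleles per variant.
isSNVCollapsed <- function(ref, alt, all_alt = TRUE) {
    # Per-allele test: REF and the given ALT are both single nucleotides.
    per_allele <- Map(function(r, a) nchar(r) == 1L & nchar(a) == 1L, ref, alt)
    # Collapse to one logical per variant: all alleles must pass, or any one.
    collapse <- if (all_alt) all else any
    vapply(per_allele, collapse, logical(1), USE.NAMES = FALSE)
}

ref <- c("G", "AA", "T", "G")
alt <- list("A", "TT", c("G", "A"), c("TT", "C"))

isSNVCollapsed(ref, alt, all_alt = TRUE)   # TRUE FALSE TRUE FALSE
isSNVCollapsed(ref, alt, all_alt = FALSE)  # TRUE FALSE TRUE TRUE
```

Under this assumption, row 4 (REF "G", ALT "TT,C") counts as a SNV only when all_alt = FALSE.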
Here are the current definitions:
isDeletion <- function(x) {
    nchar(alt(x)) == 1L &
        nchar(ref(x)) > 1L &
        substring(ref(x), 1, 1) == alt(x)
}

isInsertion <- function(x) {
    nchar(ref(x)) == 1L &
        nchar(alt(x)) > 1L &
        substring(alt(x), 1, 1) == ref(x)
}

isIndel <- function(x) {
    isDeletion(x) | isInsertion(x)
}

isSNV <- function(x) {
    nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
}
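To make the definitions concrete, here is a minimal stand-alone sketch that applies the same logic to plain character vectors instead of a VRanges or VCF object. The *Chr function names and the example alleles are made up for illustration; only the comparisons mirror the definitions above.

```r
# Illustration only: same tests as above, but taking ref/alt character
# vectors directly rather than calling ref(x) and alt(x) on an object.
isDeletionChr <- function(ref, alt) {
    nchar(alt) == 1L & nchar(ref) > 1L & substring(ref, 1, 1) == alt
}
isInsertionChr <- function(ref, alt) {
    nchar(ref) == 1L & nchar(alt) > 1L & substring(alt, 1, 1) == ref
}
isIndelChr <- function(ref, alt) {
    isDeletionChr(ref, alt) | isInsertionChr(ref, alt)
}
isSNVChr <- function(ref, alt) {
    nchar(alt) == 1L & nchar(ref) == 1L
}

# Made-up alleles: a SNV, a deletion, an insertion, a 2-base substitution.
ref <- c("G", "AT", "A",  "AA")
alt <- c("A", "A",  "AT", "TT")

isSNVChr(ref, alt)        # TRUE FALSE FALSE FALSE
isDeletionChr(ref, alt)   # FALSE TRUE FALSE FALSE
isInsertionChr(ref, alt)  # FALSE FALSE TRUE FALSE
isIndelChr(ref, alt)      # FALSE TRUE TRUE FALSE
```

Note that the last pair ("AA" to "TT") is neither a SNV nor an indel under these definitions.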
Valerie

On 03/19/2014 01:07 PM, Vincent Carey wrote:
> On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence
> <lawrence.mich...@gene.com> wrote:
>
> > It would be nice to have functions like isSNV, isIndel, isDeletion,
> > etc. that at least provide precise definitions of the terminology.
> > I've added these, but they're designed only for VRanges. They should
> > work for ExpandedVCF.
> >
> > Also, it would be nice if restrictToSNV just assumed that alt(x) must
> > be something with nchar() support (with special handling for any
> > List), so that the 'character' vector of alt,VRanges would work
> > immediately. Basically restrictToSNV should just be x[isSNV(x)]. Is
> > there even a use-case for the restrictToSNV abstraction if we did
> > that?
>
> For a VCF instance it would be x[isSNV(x),], and indeed I think that
> would be sufficient. I like the idea of having this family of
> predicates for variant classes to allow such selections.
>
> > Michael
> >
> > On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
> > <voben...@fhcrc.org> wrote:
> >
> > > Hi,
> > >
> > > I've added a restrictToSNV() function to VariantAnnotation
> > > (1.9.46). The return value is a subset VCF object containing SNVs
> > > only. The function operates on CollapsedVCF or ExpandedVCF, and
> > > the alt(vcf) value must be nucleotides (i.e., no structural
> > > variants).
> > >
> > > A variant is considered a SNV if the nucleotide sequences in both
> > > ref(vcf) and alt(vcf) are of length 1. I have a question about how
> > > variants with multiple 'ALT' values should be handled. Should we
> > > consider row 4 a SNV? One 'ALT' is length 1, the other is not.
> > >
> > >     ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
> > >     REF <- DNAStringSet(c("G", c("AA"), "T", "G"))
> > >     DataFrame(REF, ALT)
> > >
> > >     DataFrame with 4 rows and 2 columns
> > >                  REF                ALT
> > >       <DNAStringSet> <DNAStringSetList>
> > >     1              G                  A
> > >     2             AA                 TT
> > >     3              T                G,A
> > >     4              G               TT,C
> > >
> > > Thanks.
> > > Valerie
> > >
> > > _______________________________________________
> > > Bioc-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: voben...@fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319