Update on these tasks.
1) XStringSetList now has an nchar() method (as of Biostrings 2.31.17)
2) restrictToSNV() was removed from VariantAnnotation
3) The following generics and methods for VCF and VRanges have been
added to VariantAnnotation 1.9.50:
isSNV()
isInsertion()
isDeletion()
isIndel()
isSubstitution()
isTransition()
I've held off on adding
isSV()
isSVPrecise()
until we have a way to distinguish structural from non-structural ALT
values. Currently, if any of the ALT values are structural, all are
coerced to character. It would be good to have a way to handle a
mixture of ALT values so we can compute on the nucleotides and treat
the structural variants separately. This may be a project for the
next dev cycle.
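Here is a minimal sketch of one way to make that distinction. It assumes the ALT values are available as a plain character vector and treats two VCF notations as structural: symbolic alleles in angle brackets (such as <DEL>) and breakends (strings containing [ or ]). Everything else is treated as sequence. The helper name isStructuralAlt is hypothetical, not a VariantAnnotation function, and the patterns cover only those two notations.

```r
# Hypothetical helper (not part of VariantAnnotation): flag ALT strings that
# use VCF symbolic-allele or breakend notation instead of literal bases.
isStructuralAlt <- function(alt) {
  symbolic <- grepl("^<.+>$", alt)             # e.g. "<DEL>", "<DUP:TANDEM>"
  breakend <- grepl("[", alt, fixed = TRUE) |  # e.g. "G]17:198982]"
    grepl("]", alt, fixed = TRUE)
  symbolic | breakend
}

alt <- c("A", "TT", "<DEL>", "G]17:198982]", "C")
structural <- isStructuralAlt(alt)

# Nucleotide ALTs can stay as sequence; structural ones are set aside.
nucleotideAlt <- alt[!structural]
structuralAlt <- alt[structural]
```

With this split, the nucleotide subset could be coerced to DNA while the structural subset stays as character, instead of coercing everything to character.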
Valerie
On 03/19/2014 03:29 PM, Michael Lawrence wrote:
Thanks Sean. Probably also need an "isSubstitution" for any
substitution, either SNV or complex.
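One reading of "any substitution, either SNV or complex" is that REF and ALT have the same non-zero length. That covers single-base SNVs and multi-base substitutions but excludes indels. The sketch below assumes that definition and works on plain character vectors. isSubstitutionChr is a hypothetical name, not the package generic.

```r
# Sketch over plain character vectors (assumption: the real generic takes a
# VCF or VRanges object and reads ref()/alt() itself).
isSubstitutionChr <- function(ref, alt) {
  # Same non-zero length on both sides: a single-base SNV or a multi-base
  # substitution, never an insertion or deletion.
  nchar(ref) == nchar(alt) & nchar(ref) >= 1L
}

ref <- c("G", "AA", "G",  "GA")
alt <- c("A", "TT", "GT", "G")
isSubstitutionChr(ref, alt)  # TRUE TRUE FALSE FALSE
```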
On Wed, Mar 19, 2014 at 3:20 PM, Sean Davis <sdav...@mail.nih.gov> wrote:
On Wed, Mar 19, 2014 at 4:26 PM, Valerie Obenchain <voben...@fhcrc.org> wrote:
Thanks for the feedback.
I'll look into nchar for XStringSetList.
I'm in favor of supporting isDeletion(), isInsertion(),
isIndel() and isSNV() for the VCF classes and removing
restrictToSNV(). I could add an argument 'all_alt' or
'all_alt_agreement' to be used with CollapsedVCF in the case
where not all alternate alleles meet the criteria.
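Here is a minimal sketch of how an 'all_alt'-style switch could behave for collapsed data, where each row has one REF and possibly several ALTs. It applies an allele-level predicate to every ALT in a row, then reduces the results with any() or all(). The helper collapsedPredicate and its list-of-character input are assumptions for illustration. 'all_alt' is only the name floated above, not an existing argument. The four rows are the REF/ALT example from the original message in this thread.

```r
# Hypothetical helper: evaluate an allele-level predicate for every ALT of a
# collapsed row (one REF, several ALTs) and reduce to one value per row.
collapsedPredicate <- function(ref, altList, pred, all_alt = FALSE) {
  reduce <- if (all_alt) all else any
  vapply(seq_along(ref), function(i) {
    # Recycle the row's single REF against each of its ALT alleles.
    reduce(pred(rep(ref[[i]], length(altList[[i]])), altList[[i]]))
  }, logical(1))
}

# Allele-level SNV test on character vectors.
snv <- function(ref, alt) nchar(ref) == 1L & nchar(alt) == 1L

ref     <- c("G", "AA", "T", "G")
altList <- list("A", "TT", c("G", "A"), c("TT", "C"))

collapsedPredicate(ref, altList, snv, all_alt = FALSE)  # row 4 counts
collapsedPredicate(ref, altList, snv, all_alt = TRUE)   # row 4 does not
```

Row 4 (G against TT and C) is where the two modes disagree, because only one of its ALT alleles is a single base.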
Here are the current definitions:
isDeletion <- function(x) {
    nchar(alt(x)) == 1L & nchar(ref(x)) > 1L &
        substring(ref(x), 1, 1) == alt(x)
}
isInsertion <- function(x) {
    nchar(ref(x)) == 1L & nchar(alt(x)) > 1L &
        substring(alt(x), 1, 1) == ref(x)
}
isIndel <- function(x) {
    isDeletion(x) | isInsertion(x)
}
isSNV <- function(x) {
    nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
}
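The sketch below shows how these definitions classify a few cases. It restates them with stand-in ref() and alt() accessors on a data.frame. This is an assumption made only so the code runs without VariantAnnotation. The real accessors operate on VRanges/VCF objects.

```r
# Stand-in accessors (assumption): each returns a character vector column.
ref <- function(x) x$ref
alt <- function(x) x$alt

# The definitions above, unchanged apart from layout.
isDeletion <- function(x)
  nchar(alt(x)) == 1L & nchar(ref(x)) > 1L & substring(ref(x), 1, 1) == alt(x)
isInsertion <- function(x)
  nchar(ref(x)) == 1L & nchar(alt(x)) > 1L & substring(alt(x), 1, 1) == ref(x)
isIndel <- function(x) isDeletion(x) | isInsertion(x)
isSNV <- function(x) nchar(alt(x)) == 1L & nchar(ref(x)) == 1L

v <- data.frame(ref = c("G", "GA", "G",  "AA", "GA"),
                alt = c("A", "G",  "GT", "TT", "T"),
                stringsAsFactors = FALSE)

data.frame(v, snv = isSNV(v), ins = isInsertion(v),
           del = isDeletion(v), indel = isIndel(v))
```

The last row, GA -> T, falls outside every category above. The first base of REF does not match the ALT, so it is not a deletion, and it is not a single-base change either.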
To be thorough, we could also add:
isTransition()
isSV()
isSVPrecise()
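For isTransition(), the usual definition is a single-base change within the purines (A <-> G) or within the pyrimidines (C <-> T). Here is a sketch over plain character vectors. isTransitionChr is a hypothetical name, not the package generic.

```r
# Hypothetical helper over character REF/ALT vectors: a transition is a
# single-base change within the purines or within the pyrimidines.
isTransitionChr <- function(ref, alt) {
  ref <- toupper(ref)
  alt <- toupper(alt)
  purines     <- c("A", "G")
  pyrimidines <- c("C", "T")
  # Only true single-base changes qualify (identical bases are not a change).
  snv <- nchar(ref) == 1L & nchar(alt) == 1L & ref != alt
  snv & ((ref %in% purines & alt %in% purines) |
         (ref %in% pyrimidines & alt %in% pyrimidines))
}

isTransitionChr(c("A", "C", "A", "G", "AG"),
                c("G", "T", "C", "G", "GA"))  # TRUE TRUE FALSE FALSE FALSE
```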
We haven't been using VCF for SVs much yet, but there are probably
some fun things to be done on that front.
Sean
Valerie
On 03/19/2014 01:07 PM, Vincent Carey wrote:
On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
It would be nice to have functions like isSNV, isIndel, isDeletion,
etc. that at least provide precise definitions of the terminology.
I've added these, but they're designed only for VRanges. They should
also work for ExpandedVCF.
Also, it would be nice if restrictToSNV just assumed that alt(x) is
something with nchar() support (with special handling for any List),
so that the 'character' vector returned by alt() on a VRanges would
work immediately. Basically, restrictToSNV should just be
x[isSNV(x)]. Is there even a use case for the restrictToSNV
abstraction if we did that?
For a VCF instance it would be x[isSNV(x), ], and indeed I think that
would be sufficient. I like the idea of having this family of
predicates for variant classes to allow such selections.
Michael
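The suggestion above can be sketched as follows: once an isSNV() predicate exists, restricting to SNVs is ordinary subsetting. A data.frame stands in for the VCF rows here (an assumption), so the matrix-style x[i, ] form applies. A vector-like VRanges would use x[i]. This illustrates the suggestion and is not VariantAnnotation's implementation.

```r
# Stand-in predicate on a data.frame with character 'ref' and 'alt' columns.
isSNV <- function(x) nchar(x$ref) == 1L & nchar(x$alt) == 1L

# restrictToSNV reduces to row subsetting with the predicate.
restrictToSNV <- function(x) x[isSNV(x), , drop = FALSE]

vcfRows <- data.frame(ref = c("G", "AA", "T", "GA"),
                      alt = c("A", "TT", "C", "G"),
                      stringsAsFactors = FALSE)
restrictToSNV(vcfRows)  # keeps rows 1 (G/A) and 3 (T/C)
```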
On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain <voben...@fhcrc.org> wrote:
Hi,
I've added a restrictToSNV() function to VariantAnnotation (1.9.46).
The return value is a subset VCF object containing SNVs only. The
function operates on CollapsedVCF or ExpandedVCF, and the alt(VCF)
value must be nucleotides (i.e., no structural variants).
A variant is considered an SNV if the nucleotide sequences in both
ref(vcf) and alt(vcf) are of length 1. I have a question about how
variants with multiple ALT values should be handled. Should we
consider row 4 an SNV? One ALT is length 1, the other is not.
library(Biostrings)  # also attaches the packages that provide DataFrame()
ALT <- DNAStringSetList("A", "TT", c("G", "A"), c("TT", "C"))
REF <- DNAStringSet(c("G", "AA", "T", "G"))
DataFrame(REF, ALT)
DataFrame with 4 rows and 2 columns
             REF                ALT
  <DNAStringSet> <DNAStringSetList>
1              G                  A
2             AA                 TT
3              T                G,A
4              G               TT,C
Thanks.
Valerie
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: voben...@fhcrc.org
Phone: (206) 667-3158
Fax: (206) 667-1319