Thanks for the feedback.

I'll look into nchar for XStringSetList.

I'm in favor of supporting isDeletion(), isInsertion(), isIndel() and isSNV() for the VCF classes and removing restrictToSNV(). I could add an argument 'all_alt' or 'all_alt_agreement' to be used with CollapsedVCF in the case where not all alternate alleles meet the criteria.

Here are the current definitions:

isDeletion <- function(x) {
  nchar(alt(x)) == 1L & nchar(ref(x)) > 1L & substring(ref(x), 1, 1) == alt(x)
}

isInsertion <- function(x) {
  nchar(ref(x)) == 1L & nchar(alt(x)) > 1L & substring(alt(x), 1, 1) == ref(x)
}

isIndel <- function(x) {
  isDeletion(x) | isInsertion(x)
}

isSNV <- function(x) {
  nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
}


Valerie


On 03/19/2014 01:07 PM, Vincent Carey wrote:



On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence
<lawrence.mich...@gene.com <mailto:lawrence.mich...@gene.com>> wrote:

    It would be nice to have functions like isSNV, isIndel, isDeletion,
    etc that at least provide precise definitions of the terminology.
    I've added these, but they're designed only for VRanges. Should work
    for ExpandedVCF.

    Also, it would be nice if restrictToSNV just assumed that alt(x)
    must be something with nchar() support (with special handling for
    any List), so that the 'character' vector of alt,VRanges would work
    immediately. Basically restrictToSNV should just be x[isSNV(x)]. Is
    there even a use-case for the restrictToSNV abstraction if we did that?


for VCF instance it would be x[isSNV(x),] and indeed I think that would
be sufficient.  i like the idea of having this family of predicates for
variant classes to allow such selections

    Michael



    On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
    <voben...@fhcrc.org <mailto:voben...@fhcrc.org>> wrote:

        Hi,

        I've added a restrictToSNV() function to VariantAnnotation
        (1.9.46). The return value is a subset VCF object containing
        SNVs only. The function operates on CollapsedVCF or ExapandedVCF
        and the alt(VCF) value must be nucleotides (i.e., no structural
        variants).

        A variant is considered a SNV if the nucleotide sequences in
        both ref(vcf) and alt(x) are of length 1. I have a question
        about how variants with multiple 'ALT' values should be handled.

        Should we consider row 4 a SNV? One 'ALT' is length 1, the other
        is not.

        ALT <- DNAStringSetList("A", c("TT"), c("G", "A"), c("TT", "C"))
        REF <- DNAStringSet(c("G", c("AA"), "T", "G"))

                DataFrame(REF, ALT)

            DataFrame with 4 rows and 2 columns
                          REF                ALT
               <DNAStringSet> <DNAStringSetList>
            1              G                  A
            2             AA                 TT
            3              T                G,A
            4              G               TT,C



        Thanks.
        Valerie

        _________________________________________________
        Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
        mailing list
        https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>





--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B155
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: voben...@fhcrc.org
Phone:  (206) 667-3158
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to