Update on these tasks.

1) XStringSetList now has an nchar() method (as of Biostrings 2.31.17)

2) restrictToSNV() was removed from VariantAnnotation

3) The following generics and methods for VCF and VRanges have been added to VariantAnnotation 1.9.50:

isSNV()
isInsertion()
isDeletion()
isIndel()
isSubstitution
isTranstion()

I've held off on adding

isSV()
isSVPrecise()

until we have a way to distinguish structural vs non-structual ALT. Currently if any of the ALT values are structural, all are coerced to character. It would be good to have a way to distinguish a mixture of ALT values so we can compute on the nucleotides and do whatever else on the structural variants. This may be a project for the next dev cycle.

Valerie


On 03/19/2014 03:29 PM, Michael Lawrence wrote:
Thanks Sean. Probably also need an "isSubstitution" for any
substitution, either SNV or complex.


On Wed, Mar 19, 2014 at 3:20 PM, Sean Davis <sdav...@mail.nih.gov
<mailto:sdav...@mail.nih.gov>> wrote:



    On Wed, Mar 19, 2014 at 4:26 PM, Valerie Obenchain
    <voben...@fhcrc.org <mailto:voben...@fhcrc.org>> wrote:

        Thanks for the feedback.

        I'll look into nchar for XStringSetList.

        I'm in favor of supporting isDeletion(), isInsertion(),
        isIndel() and isSNV() for the VCF classes and removing
        restrictToSNV(). I could add an argument 'all_alt' or
        'all_alt_agreement' to be used with CollapsedVCF in the case
        where not all alternate alleles meet the criteria.

        Here are the current definitions:

            isDeletion <- function(x) {
               nchar(alt(x)) == 1L & nchar(ref(x)) > 1L &
            substring(ref(x), 1, 1) == alt(x)
            }

            isInsertion <- function(x) {
               nchar(ref(x)) == 1L & nchar(alt(x)) > 1L &
            substring(alt(x), 1, 1) == ref(x)
            }

            isIndel <- function(x) {
               isDeletion(x) | isInsertion(x)
            }

            isSNV <- function(x) {
               nchar(alt(x)) == 1L & nchar(ref(x)) == 1L
            }



    To be thorough:

    isTransition()

    isSV()

    isSVPrecise()

    We haven't been using VCF for SVs much yet, but there are probably
    some fun things to be done on that front.

    Sean



        Valerie



        On 03/19/2014 01:07 PM, Vincent Carey wrote:




            On Wed, Mar 19, 2014 at 4:00 PM, Michael Lawrence
            <lawrence.mich...@gene.com
            <mailto:lawrence.mich...@gene.com>
            <mailto:lawrence.michael@gene.__com
            <mailto:lawrence.mich...@gene.com>>> wrote:

                 It would be nice to have functions like isSNV, isIndel,
            isDeletion,
                 etc that at least provide precise definitions of the
            terminology.
                 I've added these, but they're designed only for
            VRanges. Should work
                 for ExpandedVCF.

                 Also, it would be nice if restrictToSNV just assumed
            that alt(x)
                 must be something with nchar() support (with special
            handling for
                 any List), so that the 'character' vector of
            alt,VRanges would work
                 immediately. Basically restrictToSNV should just be
            x[isSNV(x)]. Is
                 there even a use-case for the restrictToSNV abstraction
            if we did that?


            for VCF instance it would be x[isSNV(x),] and indeed I think
            that would
            be sufficient.  i like the idea of having this family of
            predicates for
            variant classes to allow such selections

                 Michael



                 On Tue, Mar 18, 2014 at 10:36 AM, Valerie Obenchain
                 <voben...@fhcrc.org <mailto:voben...@fhcrc.org>
            <mailto:voben...@fhcrc.org <mailto:voben...@fhcrc.org>>> wrote:

                     Hi,

                     I've added a restrictToSNV() function to
            VariantAnnotation
                     (1.9.46). The return value is a subset VCF object
            containing
                     SNVs only. The function operates on CollapsedVCF or
            ExapandedVCF
                     and the alt(VCF) value must be nucleotides (i.e.,
            no structural
                     variants).

                     A variant is considered a SNV if the nucleotide
            sequences in
                     both ref(vcf) and alt(x) are of length 1. I have a
            question
                     about how variants with multiple 'ALT' values
            should be handled.

                     Should we consider row 4 a SNV? One 'ALT' is length
            1, the other
                     is not.

                     ALT <- DNAStringSetList("A", c("TT"), c("G", "A"),
            c("TT", "C"))
                     REF <- DNAStringSet(c("G", c("AA"), "T", "G"))

                             DataFrame(REF, ALT)

                         DataFrame with 4 rows and 2 columns
                                       REF                ALT
                            <DNAStringSet> <DNAStringSetList>
                         1              G                  A
                         2             AA                 TT
                         3              T                G,A
                         4              G               TT,C



                     Thanks.
                     Valerie

                     ___________________________________________________
            Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
            <mailto:Bioc-devel@r-project.__org
            <mailto:Bioc-devel@r-project.org>>
                     mailing list
            https://stat.ethz.ch/mailman/____listinfo/bioc-devel
            <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
                     <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
            <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>





        --
        Valerie Obenchain
        Program in Computational Biology
        Fred Hutchinson Cancer Research Center
        1100 Fairview Ave. N, M1-B155
        P.O. Box 19024
        Seattle, WA 98109-1024

        E-mail: voben...@fhcrc.org <mailto:voben...@fhcrc.org>
        Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
        Fax: (206) 667-1319 <tel:%28206%29%20667-1319>


        _________________________________________________
        Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
        mailing list
        https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>




_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to