Hi Michael,

On 12/13/2013 06:39 PM, Michael Lawrence wrote:
> Coercion might suffice. I do remember Patrick optimizing these
> selections with e.g. memcpy(), so they are pretty fast.

The memcpy() trick was used (and is still used in extractROWS) when
seqselect'ing by a Ranges object. For subsetting *by* an integer-Rle,
there was no (and there is still no) optimization: the subscript was
just passed thru as.integer() internally. Subsetting by a numeric-Rle
or character-Rle was broken.

> No profiling data though. I do have some performance critical code
> that has relied on the Rle-based extraction. Would be nice to avoid
> re-evaluating the performance.

From a performance point of view, there should be no significant
difference between doing

  x[as.vector(i)]

and doing

  IRanges:::extractROWS(x, i)

when 'i' is an Rle, because the latter passes 'i' thru as.vector()
internally (internal helper normalizeSingleBracketSubscript actually
does that). However I would still recommend you use the latter in
your package so it will take advantage of optimizations that might
happen in the future.
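
To make the comparison concrete, here is a minimal sketch in base R only.
It assumes that coercing an Rle subscript amounts to expanding its runs.
The base "rle" class and inverse.rle() are used as stand-ins for an IRanges
Rle and as.vector(); the names x, i_rle, i and res are illustrative and not
part of IRanges:

```r
# Base-R stand-in for an integer-Rle subscript (NOT an IRanges Rle object).
x <- letters

# Two runs: the value 2 repeated 3 times, then the value 5 repeated 2 times.
i_rle <- structure(list(lengths = c(3L, 2L), values = c(2L, 5L)),
                   class = "rle")

# Expand the runs into a plain integer vector, as coercion would do.
i <- inverse.rle(i_rle)

# Ordinary subsetting by the expanded subscript.
res <- x[i]
print(res)
```

The point is only that the cost is dominated by expanding the runs and then
doing a plain integer subset, whichever entry point is used.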

H.



On Fri, Dec 13, 2013 at 6:19 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

    On 12/13/2013 01:49 PM, Michael Lawrence wrote:

        Thanks, makes sense. Didn't realize we could dispatch on the 'i'
        parameter. I sort of recall the perception that we couldn't, and
        that was one of the main motivations behind seqselect. But it does
        appear possible.


    Well I was hoping I could do this but it doesn't work :-/

    Found in the man page for `[`:

        S4 methods:

          These operators are also implicit S4 generics, but as primitives,
          S4 methods will be dispatched only on S4 objects ‘x’.

    OK, fair enough. But the following is really misleading:

       > library(IRanges)

       > `[`
       .Primitive("[")

       > getGeneric("[")
       standardGeneric for "[" defined from package "base"

       function (x, i, j, ..., drop = TRUE)
       standardGeneric("[", .Primitive("["))
       <bytecode: 0x168cba0>
       <environment: 0x1ccfd90>
       Methods may be defined for arguments: x, i, j, drop
       Use  showMethods("[")  for currently available ones.

    So the implicit generic actually does dispatch on 'i'.

    I can see my new [,vector,Ranges method:

       > selectMethod("[", c("vector", "Ranges"))
       Method Definition:

       function (x, i, j, ..., drop = TRUE)
       {
         if (!missing(j) || length(list(...)) > 0L)
             stop("invalid subsetting")
         extractROWS(x, i)
       }
       <environment: namespace:IRanges>

       Signatures:
               x        i
       target  "vector" "Ranges"
       defined "vector" "Ranges"

    And dispatch works if I explicitly call the generic:

       > getGeneric("[")(letters, IRanges(4, 8))
       [1] "d" "e" "f" "g" "h"

    but not if I call the primitive:

       > letters[IRanges(4, 8)]
       Error in letters[IRanges(4, 8)] : invalid subscript type 'S4'

    Seems like the primitive first checks 'x' and only if it's an
    S4 object does it delegate to the implicit S4 generic. Probably
    for performance reasons, as it avoids the cost of having to perform
    full multiple dispatch when 'x' is an ordinary object.
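
    The behaviour described above can be reproduced without IRanges. This
    is a minimal sketch using only the methods package; the class "Idx"
    and the variable names are hypothetical stand-ins for Ranges and are
    not part of any package:

    ```r
    library(methods)

    # Hypothetical S4 subscript class, standing in for Ranges.
    setClass("Idx", representation(pos = "integer"))

    # Method for an ordinary 'x' and an S4 'i', analogous to [,vector,Ranges.
    setMethod("[", signature(x = "character", i = "Idx"),
              function(x, i, j, ..., drop = TRUE) x[i@pos])

    idx <- new("Idx", pos = 4:8)

    # Calling the generic directly performs S4 dispatch on 'i'.
    via_generic <- getGeneric("[")(letters, idx)

    # Calling the primitive with a non-S4 'x' is expected to skip S4
    # dispatch; the error message is captured instead of stopping.
    via_primitive <- tryCatch(letters[idx],
                              error = function(e) conditionMessage(e))

    print(via_generic)
    print(via_primitive)
    ```

    If dispatch behaves as described, via_generic holds the 4th to 8th
    letters while via_primitive holds an error message.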

    The following hack works:

       > `[` <- getGeneric("[")
       > letters[IRanges(4, 8)]
       [1] "d" "e" "f" "g" "h"

    but putting this in IRanges feels wrong (I tried and it caused
    trouble with ref classes).

    So I guess I should go ahead and export/document extractROWS()
    and replaceROWS(). What are the other options?

    In the meantime, of course, you can always pass your Ranges or Rle
    subscript thru unlist() or as.vector() first (not much more typing
    than doing seqselect() and I don't expect this will impact performance
    too much in practice).

    H.


        Michael


        On Fri, Dec 13, 2013 at 1:10 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

             Hi Michael,


             On 12/13/2013 01:03 PM, Michael Lawrence wrote:

                 I used to use seqselect for subsetting ordinary R vectors by
                 Ranges and Rle. IRanges:::extractROWS does this, but it's
                 hidden behind the namespace. What is the public way of doing
                 this?

                 Maybe we just need to export extractROWS()? Or something with
                 a better name?


             I'll add [,vector,Ranges and [,vector,Rle methods (and probably
             also [,factor,Ranges and [,factor,Rle). They'll just be wrappers
             to IRanges:::extractROWS which I'd like to keep hidden.

             Was not sure people were doing this on ordinary R vectors so was
             waiting for someone to speak up.

             H.


                 Michael

                 _______________________________________________
                 Bioc-devel@r-project.org mailing list
                 https://stat.ethz.ch/mailman/listinfo/bioc-devel


             --
             Hervé Pagès

             Program in Computational Biology
             Division of Public Health Sciences
             Fred Hutchinson Cancer Research Center
             1100 Fairview Ave. N, M1-B514
             P.O. Box 19024
             Seattle, WA 98109-1024

             E-mail: hpa...@fhcrc.org
             Phone: (206) 667-5791
             Fax: (206) 667-1319



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
