Hi Michael,

On 12/13/2013 06:39 PM, Michael Lawrence wrote:
Coercion might suffice. I do remember Patrick optimizing these selections with e.g. memcpy(), so they are pretty fast.
The memcpy() trick was used (and is still used in extractROWS) when seqselect'ing by a Ranges object. For subsetting *by* an integer-Rle, there was no (and there is still no) optimization: the subscript was just passed thru as.integer() internally. Subsetting by a numeric-Rle or character-Rle was broken.
No profiling data though. I do have some performance critical code that has relied on the Rle-based extraction. Would be nice to avoid re-evaluating the performance.
From a performance point of view, there should be no significant difference between doing x[as.vector(i)] and doing IRanges:::extractROWS(x, i) when 'i' is an Rle, because the latter passes 'i' thru as.vector() internally (internal helper normalizeSingleBracketSubscript actually does that). However I would still recommend you use the latter in your package so it will take advantage of optimizations that might happen in the future.

H.
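To make the "expand the subscript, then subset" path concrete, here is a minimal sketch in base R only, so it runs without Bioconductor. It assumes nothing about IRanges internals: base R's "rle" class stands in for an integer-Rle, inverse.rle() plays the role that as.vector() plays for a real Rle, and expand_rle_subscript() is a hypothetical helper name, not an IRanges function.

```r
# Base R stand-in for an integer-Rle subscript (NOT the IRanges Rle class).
x <- letters

# Run-length encoded subscript equivalent to c(2, 2, 2, 5, 5, 7).
i <- structure(list(lengths = c(3L, 2L, 1L), values = c(2L, 5L, 7L)),
               class = "rle")

# Hypothetical helper: expand the runs into a plain integer vector.
# For a real Rle subscript, as.vector(i) would do this step.
expand_rle_subscript <- function(i) {
  stopifnot(inherits(i, "rle"))
  as.integer(inverse.rle(i))
}

# Subset by the expanded subscript, i.e. the x[as.vector(i)] pattern.
res <- x[expand_rle_subscript(i)]
print(res)
```

The cost here is dominated by materializing the full-length subscript, which is why the two spellings discussed above should perform about the same as long as no run-aware optimization exists.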
On Fri, Dec 13, 2013 at 6:19 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

On 12/13/2013 01:49 PM, Michael Lawrence wrote:

Thanks, makes sense. Didn't realize we could dispatch on the 'i' parameter. I sort of recall the perception that we couldn't, and that was one of the main motivations behind seqselect. But it does appear possible.

Well I was hoping I could do this but it doesn't work :-/

Found in the man page for `[`:

    S4 methods:

    These operators are also implicit S4 generics, but as primitives,
    S4 methods will be dispatched only on S4 objects ‘x’.

OK, fair enough. But the following is really misleading:

    > library(IRanges)
    > `[`
    .Primitive("[")
    > getGeneric("[")
    standardGeneric for "[" defined from package "base"

    function (x, i, j, ..., drop = TRUE)
    standardGeneric("[", .Primitive("["))
    <bytecode: 0x168cba0>
    <environment: 0x1ccfd90>
    Methods may be defined for arguments: x, i, j, drop
    Use showMethods("[") for currently available ones.

So the implicit generic actually does dispatch on 'i'. I can see my new [,vector,Ranges method:

    > selectMethod("[", c("vector", "Ranges"))
    Method Definition:

    function (x, i, j, ..., drop = TRUE)
    {
        if (!missing(j) || length(list(...)) > 0L)
            stop("invalid subsetting")
        extractROWS(x, i)
    }
    <environment: namespace:IRanges>

    Signatures:
            x        i
    target  "vector" "Ranges"
    defined "vector" "Ranges"

And dispatch works if I explicitly call the generic:

    > getGeneric("[")(letters, IRanges(4, 8))
    [1] "d" "e" "f" "g" "h"

but not if I call the primitive:

    > letters[IRanges(4, 8)]
    Error in letters[IRanges(4, 8)] : invalid subscript type 'S4'

Seems like the primitive first checks 'x' and only if it's an S4 object does it then delegate to the implicit S4 generic. Probably for performance reasons, as it avoids the cost of having to perform full multiple dispatch when 'x' is an ordinary object.
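The behavior described above can be sketched without IRanges, using only the methods package. SimpleRange is a hypothetical S4 class standing in for Ranges, and the method body is an illustration, not the IRanges implementation. The sketch captures the result of the primitive call with tryCatch() rather than assuming a particular error message.

```r
library(methods)

# Hypothetical S4 subscript class standing in for IRanges' Ranges.
setClass("SimpleRange", representation(start = "integer", end = "integer"))

# A "[" method for an ordinary vector 'x' and an S4 subscript 'i'.
setMethod("[", signature(x = "vector", i = "SimpleRange"),
  function(x, i, j, ..., drop = TRUE) {
    # Expand the range to integer positions, then use ordinary subsetting.
    x[seq.int(i@start, i@end)]
  })

idx <- new("SimpleRange", start = 4L, end = 8L)

# Calling the implicit generic explicitly performs full S4 dispatch,
# including on 'i'.
via_generic <- getGeneric("[")(letters, idx)

# Calling the primitive directly: 'letters' is not an S4 object, so per
# the man page excerpt no S4 dispatch is attempted. Capture the outcome.
via_primitive <- tryCatch(letters[idx], error = function(e) e)

print(via_generic)
print(inherits(via_primitive, "error"))
```

If the primitive call does fail as reported in the thread, the second print() shows TRUE, which is the asymmetry between the generic and the primitive that motivates exporting a dedicated function.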
The following hack works:

    > `[` <- getGeneric("[")
    > letters[IRanges(4, 8)]
    [1] "d" "e" "f" "g" "h"

but putting this in IRanges feels wrong (I tried and it caused troubles with ref classes).

So I guess I should go ahead and export/document extractROWS() and replaceROWS(). What are the other options?

In the meantime of course you can always pass your Ranges or Rle subscript thru unlist() or as.vector() first (not much more typing than doing seqselect() and I don't expect this will impact performance too much in practice).

H.

Michael

On Fri, Dec 13, 2013 at 1:10 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

Hi Michael,

On 12/13/2013 01:03 PM, Michael Lawrence wrote:

I used to use seqselect for subsetting ordinary R vectors by Ranges and Rle. IRanges:::extractROWS does this, but it's hidden behind the namespace. What is the public way of doing this? Maybe we just need to export extractROWS()? Or something with a better name?

I'll add [,vector,Ranges and [,vector,Rle methods (and probably also [,factor,Ranges and [,factor,Rle). They'll just be wrappers to IRanges:::extractROWS which I'd like to keep hidden. Was not sure people were doing this on ordinary R vectors so was waiting for someone to speak up.

H.

Michael
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel