There is a lot going on with the view* functions, and yes, it's not just about Rle's: they also need to work on atomic vectors.
Right now min(), max(), sum(), and mean() all work on IntegerList,
NumericList, RleList, XIntegerViews and XDoubleViews, but the
implementations are disparate and share almost nothing. Compare,
for example, the sum,XIntegerViews, sum,CompressedIntegerList, and
sum,SimpleIntegerList methods:

  library(XVector)
  set.seed(33)
  subject <- sample(50, 5400200, replace=TRUE)

  ## XIntegerViews:
  xiv <- successiveViews(as(subject, "XInteger"), width=rep(200, 5400200/200))

  ## CompressedIntegerList:
  cil <- extractList(subject, ranges(xiv))

  ## SimpleIntegerList:
  sil <- IntegerList(unname(split(subject, togroup(ranges(xiv)))), compress=FALSE)

Then:

  > system.time(res1 <- sum(xiv))
     user  system elapsed
    0.008   0.000   0.008

  > system.time(res2 <- sum(cil))
     user  system elapsed
    0.488   0.004   0.492

  > system.time(res3 <- sum(sil))
     user  system elapsed
    0.036   0.000   0.034

The 3 methods share zero code. sum,XIntegerViews is implemented in
C while sum,CompressedIntegerList and sum,SimpleIntegerList are
implemented in R. Just an example.

All this needs to be revisited. This is actually one of my goals for
BioC 3.0. viewMedians() on RleViews is just the tip of the iceberg.

H.

On 06/02/2014 01:24 PM, Michael Lawrence wrote:
While we rework things, what about adding support for atomic vectors, in
addition to Rles? Also, what about functions that are optimized for
partitionings? Those would be easy to write and would let us greatly
accelerate e.g. sum,CompressedIntegerList. Right now we rely on rowsum()
which is fast but could be much faster.
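
To make the partitioning idea concrete, here is a minimal base-R sketch of a sum over consecutive partitions of an atomic vector, checked against rowsum(). The helper name partitioned_sum() and its 'ends' argument are hypothetical and exist only for this illustration; this is not the IRanges implementation, and no timing claim is made. It assumes the input has no NAs, since an NA would propagate through the cumulative sum into every later partition.

```r
## Hypothetical helper for illustration only; not part of IRanges or S4Vectors.
## 'x' is an atomic numeric/integer vector without NAs.
## 'ends' holds the cumulative end position of each partition, so the
## last element of 'ends' is expected to equal length(x).
partitioned_sum <- function(x, ends) {
    cs <- cumsum(as.numeric(x))
    ## Cumulative sum at each partition end (0 for an empty leading partition).
    cum_at_end <- c(0, cs)[ends + 1L]
    ## Differences of consecutive cumulative sums are the per-partition sums.
    diff(c(0, cum_at_end))
}

set.seed(33)
subject <- sample(50, 1000, replace=TRUE)
widths <- rep(200L, 5)
ends <- cumsum(widths)

res_cumsum <- partitioned_sum(subject, ends)

## Same computation via rowsum(), using one group label per element.
group <- rep(seq_along(widths), widths)
res_rowsum <- as.vector(rowsum(subject, group, reorder=FALSE))
```

The sketch only needs the partition end points, so it never materializes a per-element grouping vector, and an empty partition (a repeated end position) simply yields 0.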

Michael



On Mon, Jun 2, 2014 at 10:48 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

    Hi Peter,

    Seems like you have a pretty good implementation of the view* functions
    in genoset. Nice work! And great to hear that there is so much room for
    improvements to the implementation currently in IRanges. I'll try to
    give this a shot soon but first I want to move Rle's to the S4Vectors
    package.

    Cheers,
    H.



    On 06/01/2014 07:58 PM, Peter Haverty wrote:

        I think viewMedians would be great.  While you have the hood up,
        there are some opportunities for some speedups and code
        simplification, I believe.

        I did some experimentation with view* in the genoset package. I
        made an alternate version of the C code for viewMeans and found
        about a 10X speedup. I hoisted the branching for the different
        types and did the NA handling with arithmetic rather than
        branching. The search for the Rle runs covered by each view is now
        done with findInterval. There are quite a few code sections that
        differ only in the type of the NA value and the pointers to the
        input/output vectors. I think it would be worth considering C++
        templates.

        On the R side, each view* function is pretty similar too. In
        genoset/R/RleDataFrame-views.R I tried to factor out all the
        shared pieces.

        While we're on the topic, I think the view* functions should have
        range* equivalents that skip the Views object and work on an Rle
        and an IRanges. If you already have a Views object around, view*
        are perfect. Otherwise, making the Views objects uses time that
        could be saved.
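
As a rough illustration of the range* idea, the following base-R sketch computes per-range means directly from run values, run lengths, and start/end positions, using findInterval() to locate the runs each range covers. The function name range_means() and its arguments are hypothetical; this is not the genoset or IRanges code. It assumes non-empty ranges that lie within the vector, and it does no NA handling.

```r
## Hypothetical helper for illustration only; not the genoset or IRanges code.
## 'values' and 'lengths' describe a run-length encoding (as from base rle()).
## 'start' and 'end' are 1-based, inclusive range bounds within the vector.
range_means <- function(values, lengths, start, end) {
    run_end <- cumsum(lengths)
    run_start <- run_end - lengths + 1L
    ## Index of the run containing each range's first and last position.
    first_run <- findInterval(start, run_start)
    last_run <- findInterval(end, run_start)
    mapply(function(s, e, i, j) {
        k <- i:j
        ## Number of positions each covered run shares with the range [s, e].
        w <- pmin(run_end[k], e) - pmax(run_start[k], s) + 1L
        ## Weighted mean of the run values over the range.
        sum(values[k] * w) / sum(w)
    }, start, end, first_run, last_run)
}

x <- c(1, 1, 1, 5, 5, 2, 2, 2, 2, 7)
r <- rle(x)
start <- c(1L, 3L, 6L)
end <- c(4L, 9L, 10L)

res <- range_means(r$values, r$lengths, start, end)

## Reference result computed on the expanded vector.
expected <- mapply(function(s, e) mean(x[s:e]), start, end)
```

No Views-like object is built here: the function goes straight from the run-length encoding and the range bounds to the result.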

        Overall I found about a 90X speedup over viewMeans(RleViewsList).

        I hope there is some useful food for thought in these
        experiments. I have a vignette that shows some of the timings if
        anyone is interested.

        Regards,
        Pete

        ____________________
        Peter M. Haverty, Ph.D.
        Genentech, Inc.
        phave...@gene.com

        _________________________________________________
        Bioc-devel@r-project.org mailing list
        https://stat.ethz.ch/mailman/listinfo/bioc-devel


    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fhcrc.org
    Phone:  (206) 667-5791
    Fax:    (206) 667-1319





