I think viewMedians would be great.  While you have the hood up, I believe
there are also opportunities for speedups and code simplification.

I did some experimentation with view* in the genoset package. I made an
alternate version of the C code for viewMeans and found about a 10X speedup.  I
hoisted the branching for the different types and did the NA handling with
arithmetic rather than branching. The search for the Rle runs covered by
each view is now done with findInterval.  There are quite a few code
sections that differ only in the type of the NA value and the pointers to
the input/output vectors. I think it would be worth considering C++
templates.
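
To make the template idea concrete, here is a rough sketch, not the actual
genoset or IRanges code. The names NaTraits and range_means are made up for
illustration, std::upper_bound stands in for the findInterval lookup, and I
assume every range lies inside the vector with width >= 1. NA runs get a
weight of 0 (na.rm=TRUE behavior), so the inner loop has no per-type code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical traits class: the only per-type code is the NA test.
template <typename T> struct NaTraits;
template <> struct NaTraits<int> {
    // R encodes NA_integer_ as INT_MIN.
    static bool is_na(int x) { return x == std::numeric_limits<int>::min(); }
};
template <> struct NaTraits<double> {
    // Treats both NA_real_ and NaN as missing.
    static bool is_na(double x) { return std::isnan(x); }
};

// Mean of each 1-based range [start, start + width - 1] over a run-length
// encoded vector, skipping NA runs. Assumes ranges are in bounds.
template <typename T>
std::vector<double> range_means(const std::vector<T>& values,
                                const std::vector<int>& lengths,
                                const std::vector<int>& starts,
                                const std::vector<int>& widths) {
    assert(!values.empty() && values.size() == lengths.size());
    // run_start[i] is the 1-based position of the first element of run i.
    std::vector<long> run_start(values.size());
    long pos = 1;
    for (std::size_t i = 0; i < values.size(); ++i) {
        run_start[i] = pos;
        pos += lengths[i];
    }
    std::vector<double> out(starts.size());
    for (std::size_t v = 0; v < starts.size(); ++v) {
        const long first = starts[v];
        const long last = first + widths[v] - 1;
        // Binary search for the run containing `first`.
        std::size_t r = std::upper_bound(run_start.begin(), run_start.end(),
                                         first) - run_start.begin() - 1;
        double sum = 0.0;
        long n = 0;
        for (; r < values.size() && run_start[r] <= last; ++r) {
            const long run_end = run_start[r] + lengths[r] - 1;
            const long overlap = std::min(last, run_end)
                               - std::max(first, run_start[r]) + 1;
            // keep is 0 for an NA run, 1 otherwise; NA values are swapped
            // for 0 so that the multiplication cannot propagate NaN.
            const long keep = !NaTraits<T>::is_na(values[r]);
            const double val = keep ? static_cast<double>(values[r]) : 0.0;
            sum += static_cast<double>(keep * overlap) * val;
            n += keep * overlap;
        }
        out[v] = sum / static_cast<double>(n);  // NaN when every run is NA
    }
    return out;
}
```

The same function body serves int and double input; only the NaTraits
specialization changes.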

On the R side, each view* function is pretty similar too. In
genoset/R/RleDataFrame-views.R I tried to factor out all the shared pieces.

While we're on the topic, I think the view* functions should have range*
equivalents that skip the Views object and work directly on an Rle and an
IRanges. If you already have a Views object around, view* are perfect.
Otherwise, constructing the Views object costs time that could be saved.
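
As a sketch of what a range* entry point could look like at the C++ level,
the code below takes the run values/lengths and the range starts/widths
directly, with a pluggable summary operation so sum, max, etc. share one
driver. RunLengths, Ranges, SumOp, MaxOp and range_summary are all
hypothetical names, not existing API. NA handling is left out for brevity,
and ranges are assumed to be in bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical plain-data stand-ins for an Rle and an IRanges.
struct RunLengths { std::vector<double> values; std::vector<int> lengths; };
struct Ranges     { std::vector<int> start;     std::vector<int> width;   };

// Each summary only says how to absorb `count` copies of one run value.
struct SumOp {
    double acc = 0.0;
    void add(double value, long count) { acc += value * static_cast<double>(count); }
    double result() const { return acc; }
};
struct MaxOp {
    double acc = -std::numeric_limits<double>::infinity();
    void add(double value, long /*count*/) { acc = std::max(acc, value); }
    double result() const { return acc; }
};

// Shared driver: finds the runs under each range and feeds them to Op.
template <typename Op>
std::vector<double> range_summary(const RunLengths& x, const Ranges& r) {
    assert(!x.values.empty() && x.values.size() == x.lengths.size());
    const std::size_t nrun = x.values.size();
    std::vector<long> run_start(nrun);  // 1-based start of each run
    long pos = 1;
    for (std::size_t i = 0; i < nrun; ++i) {
        run_start[i] = pos;
        pos += x.lengths[i];
    }
    std::vector<double> out;
    out.reserve(r.start.size());
    for (std::size_t i = 0; i < r.start.size(); ++i) {
        const long first = r.start[i];
        const long last = first + r.width[i] - 1;
        // Binary search for the run containing `first`.
        std::size_t k = std::upper_bound(run_start.begin(), run_start.end(),
                                         first) - run_start.begin() - 1;
        Op op;
        for (; k < nrun && run_start[k] <= last; ++k) {
            const long run_end = run_start[k] + x.lengths[k] - 1;
            op.add(x.values[k],
                   std::min(last, run_end) - std::max(first, run_start[k]) + 1);
        }
        out.push_back(op.result());
    }
    return out;
}
```

Calling range_summary<SumOp>(x, r) or range_summary<MaxOp>(x, r) then plays
the role of a rangeSums or rangeMaxs without any intermediate Views object.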

Overall I found about a 90X speedup over viewMeans(RleViewsList).

I hope there is some useful food for thought in these experiments. I have a
vignette that shows some of the timings if anyone is interested.

Regards,
Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com