Hi Thomas,

In some situations, seqselect<- relied on a few tricks to be fast.
In IRanges 1.20.6, I've ported the same tricks to [<-, so the
performance regression you report below should be gone.
Let me know if you run into other issues with the subsetting code.

Thanks,
H.


On 11/11/2013 05:06 PM, Thomas Sandmann wrote:
Hi Herve,

thanks a lot for re-enabling the subsetting functionality for
CompressedRleList with List-like objects.
While things work now, I noticed a big difference in execution time for
the following operations:

with IRanges_1.18.2

rles <- RleList(Rle(values=TRUE, lengths=10000),
                Rle(values=TRUE, lengths=10000),
                Rle(values=TRUE, lengths=10000),
                Rle(values=TRUE, lengths=10000),
                Rle(values=TRUE, lengths=10000),
                Rle(values=TRUE, lengths=10000),
                Rle(values=TRUE, lengths=10000),
                Rle(values=TRUE, lengths=10000),
                compress=TRUE)

system.time(seqselect(rles, unname(list(a=20:108, b=41:131, c=21:105,
                                        d=1:1234, e=4:5, f=1223:1243,
                                        g=432:5234, h=444:5555))) <- TRUE)

clocks ca. 0.040s on my system.

R 3.0.2 with other attached packages:
  [1] Rsamtools_1.12.2     Biostrings_2.28.0       devtools_1.3
  [4] GenomicRanges_1.12.4 IRanges_1.18.2       BiocGenerics_0.6.0
  [7] Defaults_1.1-1       BiocInstaller_1.10.3 roxygen2_2.2.2
[10] digest_0.6.3

with IRanges_1.20.5, the same operation is much slower:

system.time(rles[unname(list(a=20:108, b=41:131, c=21:105, d=1:1234,
                             e=4:5, f=1223:1243, g=432:5234,
                             h=444:5555))] <- TRUE)

takes about 0.45s, more than 10x longer.

R3.0.0 with other attached packages:
  [1] devtools_1.3    rtracklayer_1.22.0   Rsamtools_1.14.1
  [4] Biostrings_2.30.0    GenomicRanges_1.14.3 XVector_0.2.0
  [7] IRanges_1.20.5       BiocGenerics_0.8.0   Defaults_1.1-1
[10] BiocInstaller_1.12.0 roxygen2_2.2.2       digest_0.6.3

I noticed an even larger slowdown with real-life, longer datasets,
so the running time appears to grow non-linearly with input size.
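
A rough way to check how the timing scales is sketched below. It is
only an illustration, not a measurement from either setup above: the
helper time_replace, the element sizes, and the choice of index ranges
(the first half of each element) are all assumptions made for the
example, and it assumes the installed IRanges accepts a plain list as
a subscript in [<-, as 1.20.x does.

```r
library(IRanges)

# Hypothetical helper: time a list-style replacement on a compressed
# RleList whose n_elts elements each have length n.
time_replace <- function(n, n_elts = 8L) {
  elts <- replicate(n_elts, Rle(values = TRUE, lengths = n),
                    simplify = FALSE)
  rles <- do.call(RleList, c(elts, list(compress = TRUE)))
  # One index vector per list element: the first half of each element.
  idx <- replicate(n_elts, seq_len(n %/% 2L), simplify = FALSE)
  system.time(rles[idx] <- FALSE)[["elapsed"]]
}

# Sizes are arbitrary; extend them to probe longer inputs.
sizes <- c(1e3, 1e4, 1e5)
timings <- vapply(sizes, time_replace, numeric(1))
print(data.frame(size = sizes, elapsed = timings))
```

If the elapsed time grows much faster than the element size, the
slowdown is non-linear; running the same script under both IRanges
versions would make the comparison direct.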

Can you reproduce this difference in performance?
If so, would it be possible to reinstate the old seqselect method for
the sake of efficiency?

Thomas

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
