Hi Charles,

On 02/08/2018 08:03 PM, Charles Plessy wrote:
Hello,

I have just discovered the GPos class, and I would like to use it in
my "CAGEr" package, where for the moment I store single-nucleotide
positions of transcription start sites in GRanges of width 1.

But a simple microbenchmark suggests that, although GPos are more
memory-efficient, they also may be more CPU-hungry, at least
with the "range" function.

Is there a way to optimise, or is it better to stay with
GRanges of width 1 when memory is not an issue?

gpos1 <- GPos(c("chr1:44-53", "chr1:5-10", "chr2:2-5"))

granges1 <- GRanges(gpos1)

microbenchmark::microbenchmark(range(granges1), range(gpos1))
Unit: milliseconds
             expr      min       lq    mean   median       uq      max neval cld
  range(granges1) 21.42761 21.97009 24.1627 22.24532 22.92655 179.9715   100  a
     range(gpos1) 30.11515 30.84472 32.8824 31.36639 32.19281 104.3027   100   b

Timing such small objects is not really meaningful.

GPos objects are optimized to perform well when they contain long runs
of consecutive positions. For example:

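  library(GenomicRanges)   # provides GPos() and GRanges(); attaches IRanges
  library(microbenchmark)  # provides microbenchmark()
  gpos2 <- GPos(GRanges("chr1", successiveIRanges(rep(990, 2000), gapwidth=10)))
  gr2 <- as(gpos2, "GRanges")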

  microbenchmark(range(gpos2), range(gr2))
  # Unit: milliseconds
  #          expr      min       lq     mean   median       uq      max neval cld
  #  range(gpos2) 102.4948 111.9229 137.5418 116.0058 134.2129 239.0805   100  a
  #    range(gr2) 111.3651 118.2075 154.2758 133.3702 211.2164 232.4975   100   b

  microbenchmark(coverage(gpos2), coverage(gr2))
  # Unit: milliseconds
  #             expr       min       lq     mean   median       uq      max neval cld
  #  coverage(gpos2)  98.09502 106.3827 143.7039 111.9778 138.1875 304.8126   100  a
  #    coverage(gr2) 152.82492 168.9123 204.8362 175.1129 189.7343 363.9795   100   b

So not a big difference, but a small advantage for GPos.

However, a big advantage for GPos in terms of memory footprint:

  object.size(gpos2)
  # 26520 bytes
  object.size(gr2)
  # 15849120 bytes
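
To give an intuition for where a saving of that order can come from, here is a rough base R sketch. It is an illustration only and does not reproduce the actual GPos internals: it simply stores runs of consecutive positions as (start, width) pairs instead of one entry per position. The helper positions_to_runs() is hypothetical, and the toy data mimic the layout of gpos2 above (2000 runs of 990 positions separated by gaps of 10).

```r
# Illustration only: NOT the GPos implementation, just the general idea of
# storing runs of consecutive positions as (start, width) pairs.

# Hypothetical helper: collapse a sorted integer vector of positions into runs.
positions_to_runs <- function(pos) {
  stopifnot(is.integer(pos), length(pos) > 0L, !is.unsorted(pos))
  # A new run starts wherever the step from the previous position is not 1.
  run_id <- cumsum(c(TRUE, diff(pos) != 1L))
  data.frame(start = pos[!duplicated(run_id)],  # first position of each run
             width = tabulate(run_id))          # number of positions per run
}

# Toy data: 2000 runs of 990 consecutive positions, separated by gaps of 10.
starts <- as.integer(1000L * (0:1999) + 1L)
pos <- rep(starts, each = 990L) + rep(0:989, times = 2000L)

runs <- positions_to_runs(pos)
nrow(runs)          # 2000 runs describe all 1,980,000 positions

# One entry per position versus one row per run:
object.size(pos)
object.size(runs)
```

The fewer and longer the runs, the larger the gap between the two sizes; with isolated positions (every run of width 1) the run representation saves nothing.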

Anyway, if memory is not an issue, then it won't make much difference
whether you use GRanges or GPos.

Cheers,
H.



sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.31.16 GenomeInfoDb_1.15.5   IRanges_2.13.22       S4Vectors_0.17.30
[5] BiocGenerics_0.25.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14            XVector_0.19.8          MASS_7.3-47             splines_3.4.3
 [5] zlibbioc_1.24.0         munsell_0.4.3           lattice_0.20-35         colorspace_1.3-2
 [9] rlang_0.1.4             multcomp_1.4-8          plyr_1.8.4              tools_3.4.3
[13] grid_3.4.3              gtable_0.2.0            TH.data_1.0-8           survival_2.41-3
[17] yaml_2.1.15             lazyeval_0.2.1          tibble_1.3.4            Matrix_1.2-12
[21] GenomeInfoDbData_0.99.1 ggplot2_2.2.1           codetools_0.2-15        microbenchmark_1.4-2.1
[25] bitops_1.0-6            RCurl_1.95-4.10         sandwich_2.4-0          compiler_3.4.3
[29] scales_0.5.0            mvtnorm_1.0-6           zoo_1.8-0

(I have also made a benchmark on "real" data, which confirmed the test above)

Have a nice day,

Charles


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
