With such a huge difference, I would wonder if the "c" method for
GRanges objects is doing N-1 pairwise merges instead of a single N-way
merge.
On Mon 07 Jan 2013 09:08:28 AM PST, Michael Lawrence wrote:
Would be interesting to do some profiling. Could be the merging of the
sequence info, or the rbind on the meta DataFrames (even with one column,
could be some overhead here).
Michael
On Mon, Jan 7, 2013 at 12:31 AM, Hahne, Florian
<florian.ha...@novartis.com>wrote:
Hi Dario,
the most efficient way to transform between list-like structures of
GRanges objects and single GRanges is to use the GRangesList class in the
first place. Not sure how you came up with your initial list, but assuming
that blockRanges is already a GRangesList object, unlist(blockRanges) will
give you a unified GRanges object in matters of seconds, even for gigantic
GRangesLists. Constructs like do.call are notoriously slow if your list is
very long, and the c method for the GRanges class needs to do quite a lot
of work on each element (e.g., checking seqnames and elementMetadata)
which is not necessary when dealing with GRangesList where certain
assumptions are being made up front. Also my understanding of the
GRangesList class is that it essentially is a GRanges object with a bunch
of pointers attached to it to deal with the individual subsets.
Hope that helps,
Florian
--
On 1/7/13 3:00 AM, "Dario Strbenac" <d.strbe...@garvan.org.au> wrote:
Hello,
For a not so large list of GRanges:
length(blockRanges)
[1] 4029
class(blockRanges)
[1] "list"
Which don't have an unreasonable number of elements in them:
summary(sapply(blockRanges, length))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 961 20710 55210 77680 759600
Combining them takes 15 minutes:
system.time(allRanges <- do.call(c, blockRanges))
sessionInfo()
user system elapsed
935.770 23.657 961.952
head(blockRanges[[1]])
GRanges with 6 ranges and 1 metadata column:
seqnames ranges strand | conservation
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [10918, 10918] * | 0.064
[2] chr1 [10919, 10919] * | 0.056
[3] chr1 [10920, 10920] * | 0.064
[4] chr1 [10921, 10921] * | 0.056
[5] chr1 [10922, 10922] * | 0.064
[6] chr1 [10923, 10923] * | 0.064
---
seqlengths:
chr1
NA
Could this code be faster ?
sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)
other attached packages:
[1] GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0
--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel