The merging of the sequence information is essentially a Reduce(). The rest of the components are merged N-way.
Michael On Mon, Jan 7, 2013 at 9:18 AM, Ryan C. Thompson <r...@thompsonclan.org>wrote: > With such a huge difference, I would wonder if the "c" method for GRanges > objects is doing N-1 pairwise merges instead of a single N-way merge. > > > On Mon 07 Jan 2013 09:08:28 AM PST, Michael Lawrence wrote: > >> Would be interesting to do some profiling. Could be the merging of the >> sequence info, or the rbind on the meta DataFrames (even with one column, >> could be some overhead here). >> >> Michael >> >> >> On Mon, Jan 7, 2013 at 12:31 AM, Hahne, Florian >> <florian.ha...@novartis.com>**wrote: >> >> Hi Dario, >>> the most efficient way to transform between list-like structures of >>> GRanges objects and single GRanges is to use the GRangesList class in the >>> first place. Not sure how you came up with your initial list, but >>> assuming >>> that blockRanges is already a GRangesList object, unlist(blockRanges) >>> will >>> give you a unified GRanges object in matters of seconds, even for >>> gigantic >>> GRangesLists. Constructs like do.call are notoriously slow if your list >>> is >>> very long, and the c method for the GRanges class needs to do quite a lot >>> of work on each element (e.g., checking seqnames and elementMetadata) >>> which is not necessary when dealing with GRangesList where certain >>> assumptions are being made up front. Also my understanding of the >>> GRangesList class is that it essentially is a GRanges object with a bunch >>> of pointers attached to it to deal with the individual subsets. >>> >>> Hope that helps, >>> Florian >>> -- >>> >>> >>> >>> >>> >>> >>> On 1/7/13 3:00 AM, "Dario Strbenac" <d.strbe...@garvan.org.au> wrote: >>> >>> Hello, >>>> >>>> For a not so large list of GRanges: >>>> >>>> length(blockRanges) >>>>> >>>> [1] 4029 >>>> >>>>> class(blockRanges) >>>>> >>>> [1] "list" >>>> >>>> Which don't have an unreasonable number of elements in them: >>>> >>>> summary(sapply(blockRanges, length)) >>>>> >>>> Min. 1st Qu. Median Mean 3rd Qu. Max. >>>> 1 961 20710 55210 77680 759600 >>>> >>>> Combining them takes 15 minutes: >>>> >>>> system.time(allRanges <- do.call(c, blockRanges)) >>>>> >>>> sessionInfo() >>>> user system elapsed >>>> 935.770 23.657 961.952 >>>> >>>> head(blockRanges[[1]]) >>>>> >>>> GRanges with 6 ranges and 1 metadata column: >>>> seqnames ranges strand | conservation >>>> <Rle> <IRanges> <Rle> | <numeric> >>>> [1] chr1 [10918, 10918] * | 0.064 >>>> [2] chr1 [10919, 10919] * | 0.056 >>>> [3] chr1 [10920, 10920] * | 0.064 >>>> [4] chr1 [10921, 10921] * | 0.056 >>>> [5] chr1 [10922, 10922] * | 0.064 >>>> [6] chr1 [10923, 10923] * | 0.064 >>>> --- >>>> seqlengths: >>>> chr1 >>>> NA >>>> >>>> Could this code be faster ? >>>> >>>> sessionInfo() >>>>> >>>> R version 2.15.2 (2012-10-26) >>>> Platform: x86_64-pc-linux-gnu (64-bit) >>>> >>>> other attached packages: >>>> [1] GenomicRanges_1.10.5 IRanges_1.16.4 BiocGenerics_0.4.0 >>>> >>>> ------------------------------**-------- >>>> Dario Strbenac >>>> PhD Student >>>> University of Sydney >>>> Camperdown NSW 2050 >>>> Australia >>>> >>>> ______________________________**_________________ >>>> Bioc-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >>>> >>> >>> ______________________________**_________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >>> >>> >> [[alternative HTML version deleted]] >> >> >> ______________________________**_________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel