Re: [Bioc-devel] Combining Ordinary List of GRanges Optimisation

Michael Lawrence Mon, 07 Jan 2013 10:15:12 -0800

The merging of the sequence information is essentially a Reduce(). The rest
of the components are merged N-way.


Michael

On Mon, Jan 7, 2013 at 9:18 AM, Ryan C. Thompson <r...@thompsonclan.org>wrote:

> With such a huge difference, I would wonder if the "c" method for GRanges
> objects is doing N-1 pairwise merges instead of a single N-way merge.
>
>
> On Mon 07 Jan 2013 09:08:28 AM PST, Michael Lawrence wrote:
>
>> Would be interesting to do some profiling. Could be the merging of the
>> sequence info, or the rbind on the meta DataFrames (even with one column,
>> could be some overhead here).
>>
>> Michael
>>
>>
>> On Mon, Jan 7, 2013 at 12:31 AM, Hahne, Florian
>> <florian.ha...@novartis.com>**wrote:
>>
>>  Hi Dario,
>>> the most efficient way to transform between list-like structures of
>>> GRanges objects and single GRanges is to use the GRangesList class in the
>>> first place. Not sure how you came up with your initial list, but
>>> assuming
>>> that blockRanges is already a GRangesList object, unlist(blockRanges)
>>> will
>>> give you a unified GRanges object in matters of seconds, even for
>>> gigantic
>>> GRangesLists. Constructs like do.call are notoriously slow if your list
>>> is
>>> very long, and the c method for the GRanges class needs to do quite a lot
>>> of work on each element (e.g., checking seqnames and elementMetadata)
>>> which is not necessary when dealing with GRangesList where certain
>>> assumptions are being made up front. Also my understanding of the
>>> GRangesList class is that it essentially is a GRanges object with a bunch
>>> of pointers attached to it to deal with the individual subsets.
>>>
>>> Hope that helps,
>>> Florian
>>> --
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 1/7/13 3:00 AM, "Dario Strbenac" <d.strbe...@garvan.org.au> wrote:
>>>
>>>  Hello,
>>>>
>>>> For a not so large list of GRanges:
>>>>
>>>>  length(blockRanges)
>>>>>
>>>> [1] 4029
>>>>
>>>>> class(blockRanges)
>>>>>
>>>> [1] "list"
>>>>
>>>> Which don't have an unreasonable number of elements in them:
>>>>
>>>>  summary(sapply(blockRanges, length))
>>>>>
>>>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>       1     961   20710   55210   77680  759600
>>>>
>>>> Combining them takes 15 minutes:
>>>>
>>>>  system.time(allRanges <- do.call(c, blockRanges))
>>>>>
>>>> sessionInfo()
>>>>    user  system elapsed
>>>> 935.770  23.657 961.952
>>>>
>>>>  head(blockRanges[[1]])
>>>>>
>>>> GRanges with 6 ranges and 1 metadata column:
>>>>       seqnames         ranges strand | conservation
>>>>          <Rle>      <IRanges>  <Rle> |    <numeric>
>>>>   [1]     chr1 [10918, 10918]      * |        0.064
>>>>   [2]     chr1 [10919, 10919]      * |        0.056
>>>>   [3]     chr1 [10920, 10920]      * |        0.064
>>>>   [4]     chr1 [10921, 10921]      * |        0.056
>>>>   [5]     chr1 [10922, 10922]      * |        0.064
>>>>   [6]     chr1 [10923, 10923]      * |        0.064
>>>>   ---
>>>>   seqlengths:
>>>>    chr1
>>>>      NA
>>>>
>>>> Could this code be faster ?
>>>>
>>>>  sessionInfo()
>>>>>
>>>> R version 2.15.2 (2012-10-26)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>
>>>> other attached packages:
>>>> [1] GenomicRanges_1.10.5 IRanges_1.16.4       BiocGenerics_0.4.0
>>>>
>>>> ------------------------------**--------
>>>> Dario Strbenac
>>>> PhD Student
>>>> University of Sydney
>>>> Camperdown NSW 2050
>>>> Australia
>>>>
>>>> ______________________________**_________________
>>>> Bioc-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>>
>>>
>>> ______________________________**_________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>
>>>
>>         [[alternative HTML version deleted]]
>>
>>
>> ______________________________**_________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Combining Ordinary List of GRanges Optimisation

Reply via email to