Re: [Bioc-devel] Combining Ordinary List of GRanges Optimisation

Michael Lawrence Mon, 07 Jan 2013 09:08:56 -0800

Would be interesting to do some profiling. Could be the merging of the
sequence info, or the rbind on the meta DataFrames (even with one column,
could be some overhead here).


Michael


On Mon, Jan 7, 2013 at 12:31 AM, Hahne, Florian
<florian.ha...@novartis.com>wrote:

> Hi Dario,
> the most efficient way to transform between list-like structures of
> GRanges objects and single GRanges is to use the GRangesList class in the
> first place. Not sure how you came up with your initial list, but assuming
> that blockRanges is already a GRangesList object, unlist(blockRanges) will
> give you a unified GRanges object in matters of seconds, even for gigantic
> GRangesLists. Constructs like do.call are notoriously slow if your list is
> very long, and the c method for the GRanges class needs to do quite a lot
> of work on each element (e.g., checking seqnames and elementMetadata)
> which is not necessary when dealing with GRangesList where certain
> assumptions are being made up front. Also my understanding of the
> GRangesList class is that it essentially is a GRanges object with a bunch
> of pointers attached to it to deal with the individual subsets.
>
> Hope that helps,
> Florian
> --
>
>
>
>
>
>
> On 1/7/13 3:00 AM, "Dario Strbenac" <d.strbe...@garvan.org.au> wrote:
>
> >Hello,
> >
> >For a not so large list of GRanges:
> >
> >> length(blockRanges)
> >[1] 4029
> >> class(blockRanges)
> >[1] "list"
> >
> >Which don't have an unreasonable number of elements in them:
> >
> >> summary(sapply(blockRanges, length))
> >   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >      1     961   20710   55210   77680  759600
> >
> >Combining them takes 15 minutes:
> >
> >> system.time(allRanges <- do.call(c, blockRanges))
> >sessionInfo()
> >   user  system elapsed
> >935.770  23.657 961.952
> >
> >> head(blockRanges[[1]])
> >GRanges with 6 ranges and 1 metadata column:
> >      seqnames         ranges strand | conservation
> >         <Rle>      <IRanges>  <Rle> |    <numeric>
> >  [1]     chr1 [10918, 10918]      * |        0.064
> >  [2]     chr1 [10919, 10919]      * |        0.056
> >  [3]     chr1 [10920, 10920]      * |        0.064
> >  [4]     chr1 [10921, 10921]      * |        0.056
> >  [5]     chr1 [10922, 10922]      * |        0.064
> >  [6]     chr1 [10923, 10923]      * |        0.064
> >  ---
> >  seqlengths:
> >   chr1
> >     NA
> >
> >Could this code be faster ?
> >
> >> sessionInfo()
> >R version 2.15.2 (2012-10-26)
> >Platform: x86_64-pc-linux-gnu (64-bit)
> >
> >other attached packages:
> >[1] GenomicRanges_1.10.5 IRanges_1.16.4       BiocGenerics_0.4.0
> >
> >--------------------------------------
> >Dario Strbenac
> >PhD Student
> >University of Sydney
> >Camperdown NSW 2050
> >Australia
> >
> >_______________________________________________
> >Bioc-devel@r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Combining Ordinary List of GRanges Optimisation

Reply via email to