Hi Hervé Thanks for fixing this problem. It works fine now, however I did notice a drop in performance. For example reading in chromosome 1 can take about twice as long as previously (see below). Is this something that can be avoided?
Many thanks again Leonard -- GenomicAlignments 0.99.8 > param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621))) > > system.time(readGAlignmentPairs(file, param = param)) user system elapsed 161.282 7.812 169.442 > system.time(readGAlignmentPairs(file, param = param)) user system elapsed 91.614 7.256 99.065 > system.time(readGAlignmentPairs(file, param = param)) user system elapsed 89.285 7.461 96.940 > > sessionInfo() R Under development (unstable) (2013-12-03 r64376) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicAlignments_0.99.8 Rsamtools_1.15.15 Biostrings_2.31.5 [4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.17 [7] BiocGenerics_0.9.2 loaded via a namespace (and not attached): [1] bitops_1.0-6 stats4_3.1.0 zlibbioc_1.9.0 > GenomicAlignments 0.99.10 > param <- ScanBamParam(which = GRanges("1", IRanges(1, 249250621))) > > system.time(readGAlignmentPairs(file, param = param)) user system elapsed 265.624 8.812 274.990 > system.time(readGAlignmentPairs(file, param = param)) user system elapsed 249.724 7.177 257.399 > system.time(readGAlignmentPairs(file, param = param)) user system elapsed 247.476 6.648 254.621 > > sessionInfo() R Under development (unstable) (2013-12-03 r64376) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicAlignments_0.99.10 Rsamtools_1.15.15 [3] Biostrings_2.31.5 GenomicRanges_1.15.17 [5] XVector_0.3.5 IRanges_1.21.17 [7] BiocGenerics_0.9.2 loaded via a namespace (and not attached): [1] bitops_1.0-6 stats4_3.1.0 zlibbioc_1.9.0 > On Thu, Dec 19, 2013 at 11:52 AM, Hervé Pagès <hpa...@fhcrc.org> wrote: > Hi Leonard, > > This should be fixed in GenomicAlignments 0.99.10, which will become > available thru biocLite() in the next 24 hours or so. In the mean time > you can grab it directly from svn: > > > https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicAlignments > > Please let me know if you still run into problems with this. > Thanks! > > > H. > > On 12/19/2013 10:41 AM, Leonard Goldstein wrote: >> >> Hi Hervé >> >> You probably spotted this already but it looks like the problem is >> introduced between GenomicAlignments revisions r84052 (0.99.8) and >> r84106 (0.99.9) >> >> Best wishes >> >> Leonard >> >> >> On Wed, Dec 18, 2013 at 5:25 PM, Leonard Goldstein <golds...@gene.com> >> wrote: >>> >>> Dear list, >>> >>> There seems to be a problem with the readGAlignmentPairs function: >>> >>> Querying genomic regions without any alignments using the which >>> argument results in an error -- see (1) below. Reading in a whole >>> chromosome takes indefinitely (or at least much longer than in >>> previous versions) -- see (2) below. I suspect these issues are not >>> specific to the BAM files I am working with but can provide test data >>> if required. >>> >>> Many thanks for your help. >>> >>> Leonard >>> >>> >>> -- >>> (1) Attempts to read an empty region results in an error. >>> >>>> gr <- GRanges("22", IRanges(1000000, 2000000)) >>>> >>>> param <- ScanBamParam(which = gr) >>>> >>>> readGAlignmentPairs(file, param = param) >>> >>> Error in `elementMetadata<-`(x, ..., value = value) : >>> replacement 'elementMetadata' value must be a DataTable object or NULL >>>> >>>> >>>> sessionInfo() >>> >>> R Under development (unstable) (2013-12-03 r64376) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15 Biostrings_2.31.5 >>> [4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.16 >>> [7] BiocGenerics_0.9.2 BiocInstaller_1.13.3 >>> >>> loaded via a namespace (and not attached): >>> [1] bitops_1.0-6 stats4_3.1.0 tools_3.1.0 zlibbioc_1.9.0 >>>> >>>> >>> >>> ... but works fine with previous version >>> >>>> gr <- GRanges("22", IRanges(1000000, 2000000)) >>>> >>>> param <- ScanBamParam(which = gr) >>>> >>>> readGAlignmentPairs(file, param = param) >>> >>> GAlignmentPairs with 0 alignment pairs and 0 metadata columns: >>> seqnames strand : ranges -- ranges >>> <Rle> <Rle> : <IRanges> -- <IRanges> >>> --- >>> seqlengths: >>> 1 2 3 ... GL000247.1 GL000248.1 GL000249.1 >>> 249250621 243199373 198022430 ... 36422 39786 38502 >>>> >>>> >>>> sessionInfo() >>> >>> R version 3.0.0 (2013-04-03) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 >>> [4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] bitops_1.0-6 stats4_3.0.0 zlibbioc_1.8.0 >>> >>> >>> (2) Use of the which argument covering chromosome 22 takes under one >>> minute with an earlier version of readGAlignmentPairs >>> >>>> gr <- GRanges("22", IRanges(1, 51304566)) >>>> >>>> param <- ScanBamParam(which = gr) >>>> >>>> system.time(gap <- readGAlignmentPairs(file, param = param)) >>> >>> user system elapsed >>> 45.887 0.256 46.168 >>>> >>>> >>>> sessionInfo() >>> >>> R version 3.0.0 (2013-04-03) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 >>> [4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 >>> >>> loaded via a namespace (and not attached): >>> [1] bitops_1.0-6 stats4_3.0.0 zlibbioc_1.8.0 >>>> >>>> >>> >>> ... but at least twenty times as long with the current version. >>> >>>> gr <- GRanges("22", IRanges(1, 51304566)) >>>> >>>> param <- ScanBamParam(which = gr) >>>> >>>> system.time(gap <- readGAlignmentPairs(file, param = param)) >>> >>> >>> ^C >>> Timing stopped at: 1108.041 35.998 1144.006 >>>> >>>> >>>> sessionInfo() >>> >>> R Under development (unstable) (2013-12-03 r64376) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15 Biostrings_2.31.5 >>> [4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.16 >>> [7] BiocGenerics_0.9.2 >>> >>> loaded via a namespace (and not attached): >>> [1] bitops_1.0-6 stats4_3.1.0 zlibbioc_1.9.0 >>>> >>>> >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel