Dear list, There seems to be a problem with the readGAlignmentPairs function:
Querying genomic regions without any alignments using the which argument results in an error -- see (1) below. Reading in a whole chromosome takes indefinitely (or at least much longer than in previous versions) -- see (2) below. I suspect these issues are not specific to the BAM files I am working with but can provide test data if required. Many thanks for your help. Leonard -- (1) Attempts to read an empty region results in an error. > gr <- GRanges("22", IRanges(1000000, 2000000)) > > param <- ScanBamParam(which = gr) > > readGAlignmentPairs(file, param = param) Error in `elementMetadata<-`(x, ..., value = value) : replacement 'elementMetadata' value must be a DataTable object or NULL > > sessionInfo() R Under development (unstable) (2013-12-03 r64376) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15 Biostrings_2.31.5 [4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.16 [7] BiocGenerics_0.9.2 BiocInstaller_1.13.3 loaded via a namespace (and not attached): [1] bitops_1.0-6 stats4_3.1.0 tools_3.1.0 zlibbioc_1.9.0 > ... but works fine with previous version > gr <- GRanges("22", IRanges(1000000, 2000000)) > > param <- ScanBamParam(which = gr) > > readGAlignmentPairs(file, param = param) GAlignmentPairs with 0 alignment pairs and 0 metadata columns: seqnames strand : ranges -- ranges <Rle> <Rle> : <IRanges> -- <IRanges> --- seqlengths: 1 2 3 ... GL000247.1 GL000248.1 GL000249.1 249250621 243199373 198022430 ... 36422 39786 38502 > > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 [4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] bitops_1.0-6 stats4_3.0.0 zlibbioc_1.8.0 (2) Use of the which argument covering chromosome 22 takes under one minute with an earlier version of readGAlignmentPairs > gr <- GRanges("22", IRanges(1, 51304566)) > > param <- ScanBamParam(which = gr) > > system.time(gap <- readGAlignmentPairs(file, param = param)) user system elapsed 45.887 0.256 46.168 > > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsamtools_1.14.2 Biostrings_2.30.1 GenomicRanges_1.14.4 [4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 loaded via a namespace (and not attached): [1] bitops_1.0-6 stats4_3.0.0 zlibbioc_1.8.0 > ... but at least twenty times as long with the current version. > gr <- GRanges("22", IRanges(1, 51304566)) > > param <- ScanBamParam(which = gr) > > system.time(gap <- readGAlignmentPairs(file, param = param)) ^C Timing stopped at: 1108.041 35.998 1144.006 > > sessionInfo() R Under development (unstable) (2013-12-03 r64376) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicAlignments_0.99.9 Rsamtools_1.15.15 Biostrings_2.31.5 [4] GenomicRanges_1.15.17 XVector_0.3.5 IRanges_1.21.16 [7] BiocGenerics_0.9.2 loaded via a namespace (and not attached): [1] bitops_1.0-6 stats4_3.1.0 zlibbioc_1.9.0 > _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel