Hi, GenomicAlignments::readGAlignmentPairs() can take a while to (correctly) fail if the `which` parameter contains a "bad" seqlevel. It'd be great if it failed early in the following scenario (just experienced).
An example BAM is available from https://www.dropbox.com/sh/4avqxuqnhlv3r9c/AADqx-XqXV6c7Dc_SaSUq324a?dl=0 (110 MB; sorry, it needs to be large-ish in order to notice the problem). The following code ought to reproduce the problem. Here I am taking the example BAM of mouse data mapped to mm10 and using a `which` based on hg19 (it was mistakenly assuming all my data were human that led me to this problem). When a single "bad" seqlevel is provided via `which` then it errors fast and with a helpful error. However, if the `which` contains multiple seqlevels, some "good" and some "bad", then it seemingly just hangs. I initially thought it had just frozen indefinitely but it actually eventually returns the correct error. It'd be great if it failed fast in this situation. Thanks, Pete library(GenomicAlignments) library(BSgenome.Hsapiens.UCSC.hg19) si <- seqinfo(BSgenome.Hsapiens.UCSC.hg19) # mouse data mapped to mm10 (temporarily available from https://www.dropbox.com/sh/4avqxuqnhlv3r9c/AADqx-XqXV6c7Dc_SaSUq324a?dl=0) file <- "~/Desktop/tmp/SRR1781315.markdup.bam" # Errors fast and helpfully because chr20 doesn't exist in mouse readGAlignmentPairs(file, param = ScanBamParam(which = as(si, "GRanges")[20])) # Takes a long time to error if some seqlevels exist (chr19) and some don't exist (chr20) in sample readGAlignmentPairs(file, param = ScanBamParam(which = as(si, "GRanges")[19:20])) > sessionInfo() R Under development (unstable) (2016-03-11 r70310) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.11.3 (El Capitan) locale: [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base other attached packages: [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.39.4 rtracklayer_1.31.7 [4] GenomicAlignments_1.7.20 Rsamtools_1.23.3 Biostrings_2.39.12 [7] XVector_0.11.7 SummarizedExperiment_1.1.22 Biobase_2.31.3 [10] GenomicRanges_1.23.24 GenomeInfoDb_1.7.6 IRanges_2.5.40 [13] S4Vectors_0.9.42 BiocGenerics_0.17.3 repete_0.0.0.9000 [16] devtools_1.10.0 loaded via a namespace (and not attached): [1] XML_3.98-1.4 digest_0.6.9 bitops_1.0-6 zlibbioc_1.17.0 BiocParallel_1.5.20 [6] tools_3.3.0 RCurl_1.95-4.8 memoise_1.0.0 _______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
