Hi Herve, Thanks a lot for the fix - it runs smoothly now on both release and devel.
Cheers, Michael On 06.05.2015 10:57, Hervé Pagès wrote: > Hi Michael, > > I finally got to fix this in Biostrings 2.36.1 (release) and 2.37.2 > (devel). Both should become available via biocLite() on Friday around > 11am (Seattle time). Thanks for your patience. > > Cheers, > H. > > > On 04/30/2015 10:50 AM, Hervé Pagès wrote: >> Hi Michael, >> >> Thanks for the reminder. I must confess this slipped out of my radar. >> I'll look into it today and will let you know. >> >> H. >> >> On 04/30/2015 08:42 AM, Michael Stadler wrote: >>> Hi Herve, >>> >>> I stumbled again over the 'memory not mapped' issue in trimLRpatterns >>> using updated versions of R and BioC-devel. I guess it does not hit >>> people very often, but I would highly appreciate if it could be fixed. >>> >>> Many thanks, >>> Michael >>> >>> PS: I can reproduce the issue using the code below under: >>> R version 3.2.0 (2015-04-16) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] stats4 parallel stats graphics grDevices utils datasets >>> [8] methods base >>> >>> other attached packages: >>> [1] ShortRead_1.27.1 GenomicAlignments_1.5.1 Rsamtools_1.21.3 >>> [4] GenomicRanges_1.21.5 GenomeInfoDb_1.5.1 BiocParallel_1.3.4 >>> [7] Biostrings_2.37.0 XVector_0.9.0 IRanges_2.3.6 >>> [10] S4Vectors_0.7.0 BiocGenerics_0.15.0 RColorBrewer_1.1-2 >>> >>> loaded via a namespace (and not attached): >>> [1] lattice_0.20-31 bitops_1.0-6 grid_3.2.0 >>> [4] futile.options_1.0.0 zlibbioc_1.15.0 hwriter_1.3.2 >>> [7] latticeExtra_0.6-26 futile.logger_1.4.1 lambda.r_1.1.7 >>> [10] Biobase_2.29.0 >>> >>> >>> On 25.04.2014 13:11, Hervé Pagès wrote: >>>> Hi Michael, >>>> >>>> Thanks for the report. I'll look into this. >>>> >>>> H. >>>> >>>> On 04/22/2014 08:29 AM, Michael Stadler wrote: >>>>> Dear Herve, >>>>> >>>>> We are hitting a 'memory not mapped' problem when using trimLRpatterns >>>>> as detailed below. I did not manage to reproduce it with few >>>>> sequences, >>>>> so I have to refer to a publicly available sequence file with many >>>>> reads. Even then, it occasionally runs through without problems. >>>>> >>>>> Also, our use-case may not be typical and be part of the problem - >>>>> maybe >>>>> the solution will be to change our use of trimLRpatterns. >>>>> >>>>> Here is some code to illustrate/reproduce the problem: >>>>> >>>>> library(Biostrings) >>>>> library(ShortRead) >>>>> >>>>> Rpat <- "TGGAATTCTCGGGTGCCAAGG" >>>>> maxRmm <- rep(0:2, c(6,3,nchar(Rpat)-9)) >>>>> >>>>> fq1 <- DNAStringSet(c("AAAATGGAATTCTCGGGTGCCAAGG", >>>>> "AAAATGGAATTCTCGGGTGCCAAGGTTTT")) >>>>> >>>>> # the second read is not trimmed because it runs through the adaptor >>>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) >>>>> # A DNAStringSet instance of length 2 >>>>> # width seq >>>>> #[1] 4 AAAA >>>>> #[2] 28 AAAATGGAATTCTCGGGTGCCAAGGTTT >>>>> >>>>> # as a workaround, we pad the adaptor with Ns and >>>>> # increase the mismatch tolerance >>>>> numNs <- 90 >>>>> maxRmm <- c(maxRmm, 1:numNs+max(maxRmm)) >>>>> Rpat <- paste(c(Rpat, rep("N", numNs)), collapse="") >>>>> >>>>> # now, also the second read gets trimmed >>>>> trimLRPatterns(subject=fq1, Rpattern=Rpat, max.Rmismatch=maxRmm) >>>>> # A DNAStringSet instance of length 2 >>>>> # width seq >>>>> #[1] 4 AAAA >>>>> #[2] 4 AAAA >>>>> >>>>> # to trigger the segmentation fault, many reads are needed >>>>> download.file("ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/ERR000916_1.fastq.gz", >>>>> >>>>> >>>>> >>>>> "ERR000916_1.fastq.gz") >>>>> fq2 <- readFastq("ERR000916_1.fastq.gz") >>>>> >>>>> fq3 <- trimLRPatterns(subject=fq2, Rpattern=Rpat, >>>>> max.Rmismatch=maxRmm) >>>>> # *** caught segfault *** >>>>> #address 0x7f5109be4fed, cause 'memory not mapped' >>>>> # >>>>> #Traceback: >>>>> # 1: .Call(.NAME, ..., PACKAGE = PACKAGE) >>>>> # 2: .Call2("XStringSet_vmatch_pattern_at", pattern, subject, at, >>>>> at.type, max.mismatch, min.mismatch, with.indels, fixed, ans.type, >>>>> auto.reduce.pattern, PACKAGE = "Biostrings") >>>>> # 3: .matchPatternAt(pattern, subject, ending.at, 1L, max.mismatch, >>>>> min.mismatch, with.indels, fixed, .to.ans.type(follow.index), >>>>> auto.reduce.pattern) >>>>> # 4: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, >>>>> ending.at = subject_width, max.mismatch = max.Rmismatch, >>>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = >>>>> TRUE) >>>>> # 5: which.isMatchingEndingAt(pattern = Rpattern, subject = subject, >>>>> ending.at = subject_width, max.mismatch = max.Rmismatch, >>>>> with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = >>>>> TRUE) >>>>> # 6: .computeTrimEnd(Rpattern, subject, max.Rmismatch, with.Rindels, >>>>> Rfixed) >>>>> # 7: .XStringSet.trimLRPatterns(Lpattern, Rpattern, subject, >>>>> max.Lmismatch, max.Rmismatch, with.Lindels, with.Rindels, Lfixed, >>>>> Rfixed, ranges) >>>>> # 8: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, >>>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, >>>>> ranges >>>>> = TRUE) >>>>> # 9: trimLRPatterns(Lpattern, Rpattern, sread(subject), max.Lmismatch, >>>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, >>>>> ranges >>>>> = TRUE) >>>>> #10: eval(expr, envir, enclos) >>>>> #11: eval(call, sys.frame(sys.parent())) >>>>> #12: callGeneric(Lpattern, Rpattern, sread(subject), max.Lmismatch, >>>>> max.Rmismatch, with.Lindels, with.Rindels, Lfixed, Rfixed, >>>>> ranges = >>>>> TRUE) >>>>> #13: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = >>>>> maxRmm, with.Rindels = FALSE) >>>>> #14: trimLRPatterns(subject = fq2, Rpattern = Rpat, max.Rmismatch = >>>>> maxRmm, with.Rindels = FALSE) >>>>> >>>>> The problem did not occur in R 3.0.3 with BioC 2.13. >>>>> Do you have an idea what's wrong? >>>>> >>>>> Thanks for your help, >>>>> Michael >>>>> >>>>> >>>>> >>>>> sessionInfo()R version 3.1.0 (2014-04-10) >>>>> #Platform: x86_64-unknown-linux-gnu (64-bit) >>>>> # >>>>> #locale: >>>>> #[1] C >>>>> # >>>>> #attached base packages: >>>>> #[1] parallel stats graphics grDevices utils datasets >>>>> methods >>>>> #[8] base >>>>> # >>>>> #other attached packages: >>>>> # [1] ShortRead_1.22.0 GenomicAlignments_1.0.0 BSgenome_1.32.0 >>>>> >>>>> # [4] Rsamtools_1.16.0 GenomicRanges_1.16.0 >>>>> GenomeInfoDb_1.0.1 >>>>> >>>>> # [7] BiocParallel_0.6.0 Biostrings_2.32.0 XVector_0.4.0 >>>>> >>>>> #[10] IRanges_1.22.2 BiocGenerics_0.10.0 >>>>> RColorBrewer_1.0-5 >>>>> >>>>> # >>>>> #loaded via a namespace (and not attached): >>>>> # [1] BBmisc_1.5 BatchJobs_1.2 Biobase_2.24.0 >>>>> # [4] DBI_0.2-7 RSQLite_0.11.4 Rcpp_0.11.1 >>>>> # [7] bitops_1.0-6 brew_1.0-6 codetools_0.2-8 >>>>> #[10] compiler_3.1.0 digest_0.6.4 fail_1.2 >>>>> #[13] foreach_1.4.2 grid_3.1.0 hwriter_1.3 >>>>> #[16] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 >>>>> #[19] plyr_1.8.1 sendmailR_1.1-2 stats4_3.1.0 >>>>> #[22] stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0 >>>>> >>>>> _______________________________________________ >>>>> Bioc-devel@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>> >>>> >> > _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel