Michael,
For easier testing, on my Mac OS X machine I can lower the limit on the number of
open files with the shell command `ulimit -n 30`, but YMMV depending on OS support.
In any case, your suspicions were on target. The Rsamtools function bgzip seems to
be the culprit, so I am changing the subject and cc:ing Martin and Herve accordingly.
Martin xor Herve,
The problem can be reproduced simply by calling bgzip repeatedly; how many calls
it takes depends on your value for `ulimit -n`:
library(Rsamtools)
bed <- system.file("doc", "example.bed", package = "rtracklayer")
replicate(2000, bgzip(bed, "delme.now", overwrite = TRUE))
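One way to watch the leak without waiting for the limit to be hit is to count the process's open file descriptors before and after repeated calls. This is a minimal sketch, assuming a system that lists the current process's descriptors under /proc/self/fd (Linux) or /dev/fd (Mac OS X); `count_open_fds` and `fd_leak` are hypothetical helpers, not part of Rsamtools. Because it counts kernel-level descriptors, it may catch handles opened from C code that never show up as R connections.

```r
# Count file descriptors currently open in this R process.
# Assumes /proc/self/fd (Linux) or /dev/fd (Mac OS X) lists them; the
# directory handle used for the listing appears in every count, so it
# cancels out when two counts are compared.
count_open_fds <- function() {
  fd_dir <- if (file.exists("/proc/self/fd")) "/proc/self/fd" else "/dev/fd"
  length(list.files(fd_dir))
}

# Call `f` n times and return how many descriptors were gained overall.
# A result that grows with n suggests each call leaves a descriptor open.
fd_leak <- function(f, n = 100) {
  before <- count_open_fds()
  for (i in seq_len(n)) f()
  count_open_fds() - before
}
```

With `bed` defined as in the reproduction, something like `fd_leak(function() bgzip(bed, "delme.now", overwrite = TRUE), n = 50)` would be expected to return 0 if bgzip closes everything it opens.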
My workaround for now is to make system calls to do the zipping and tabix
indexing, so there is no urgency. My sessionInfo() is below.
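A system-call workaround of this kind might look something like the following. This is a minimal sketch, not the exact code used here: it assumes the command-line bgzip and tabix programs are installed and on the PATH and that the BED input is sorted by chromosome and start; `bgzip_tabix_cli` is a hypothetical name. Because the external programs run as separate processes, any descriptors they open are released when they exit.

```r
# Compress a sorted BED file with the external bgzip program and index it
# with tabix, instead of calling Rsamtools::bgzip() inside the R process.
# Hypothetical helper; assumes bgzip and tabix are on the PATH.
bgzip_tabix_cli <- function(bed, dest = paste0(bed, ".gz")) {
  # bgzip -c writes the compressed stream to stdout; system2 redirects it to dest
  status <- system2("bgzip", c("-c", shQuote(bed)), stdout = dest)
  if (status != 0) stop("bgzip failed on ", bed)
  # -p bed selects the BED preset; -f overwrites an existing index
  status <- system2("tabix", c("-f", "-p", "bed", shQuote(dest)))
  if (status != 0) stop("tabix failed on ", dest)
  # tabix writes the index next to the compressed file
  invisible(paste0(dest, ".tbi"))
}
```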
Thanks,
~Malcolm
*From:* Michael Lawrence [mailto:lawrence.mich...@gene.com]
*Sent:* Friday, November 09, 2012 5:52 AM
*To:* Cook, Malcolm
*Cc:* bioc-devel@r-project.org; Michael Lawrence <lawrence.mich...@gene.com>
(lawrence.mich...@gene.com); Vincent Carey (st...@channing.harvard.edu)
*Subject:* Re: rtracklayer BUG: `export(x, path, index=TRUE)` appears not to
close filehandle on tabix files produced
Hi Malcolm,
I am not sure why this is happening. I haven't been able to reproduce it on my
system (which I think has a limit of 1024, so I had to increase your test case
to exceed that). Does this happen when calling bgzip + indexTabix on a file 256
times? That would help rule out the complicated wrappers.
Thanks,
Michael
On Thu, Nov 8, 2012 at 2:32 PM, Cook, Malcolm <m...@stowers.org> wrote:
rtracklayer developers (Michael/Vincent/Robert),
I find that exporting too many BED files with tabix indexing causes an error.
The session following my signature reproduces the error.
It shows the sessionInfo() details before the code that causes the error, because
sessionInfo() FAILS with 'too many open files' after running this code (as does
anything that opens files).
The error does NOT occur with index=FALSE, only with index=TRUE.
I suspect that the tabix calls are not closing their file handles correctly.
On my Mac OS X machine, `ulimit -a` reports a limit of 256 open files.
The failure happens on the 253rd BED file.
openConnections() returns nothing.
closeAllConnections() does not clean them up.
Listing open files with lsof at the command line does NOT show them either.
Michael(?), you resolved a similar issue I once reported with rtracklayer when
creating bigBed files:
https://lists.soe.ucsc.edu/pipermail/genome/2012-February/028343.html
Any suggestions for workarounds? Any possibility of a quick patch to the released
rtracklayer?
Thanks for rtracklayer!
~Malcolm Cook
-----------------------------------------------------------
bash-3.2$ R
R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(rtracklayer)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following object(s) are masked from 'package:stats':
xtabs
The following object(s) are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, cbind, colnames,
duplicated, eval, get, intersect, lapply, mapply, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, rbind, rep.int, rownames, sapply, setdiff, table,
tapply, union, unique
Loading required package: IRanges
Warning message:
package 'GenomicRanges' was built under R version 2.15.2
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rtracklayer_1.18.0 GenomicRanges_1.10.4 IRanges_1.16.4
BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] BSgenome_1.26.1 Biostrings_2.26.2 RCurl_1.95-3 Rsamtools_1.10.1
XML_3.95-0.1 bitops_1.0-4.2 parallel_2.15.1 stats4_2.15.1
tools_2.15.1 zlibbioc_1.4.0
> x<-sapply(sprintf('deleteme_%s.bed',1:1000), function(conn)
{export(GRanges('X',IRanges(1,2)),conn,index=TRUE);1})
Error in value[[3L]](cond) : index build failed
file: /Volumes/SAN1/Users/mec/deleteme/253.bed.gz
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
[ti_index_build2] fail to create the index file.
> sessionInfo()
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file
'/Library/Frameworks/R.framework/Versions/2.15/Resources/library/rtracklayer/Meta/package.rds',
probable reason 'Too many open files'