Hi Martin,

On 8 Jan 2013, at 19:53, Martin Morgan wrote:

> On 01/07/2013 12:32 PM, Nicolas Delhomme wrote:
>> Hi Martin, Marc,
>>
>> I'm now implementing the use of BamFile objects in easyRNASeq and I like
>> them. I think it would be very useful if when constructing a BamFile the
>> existence of the path and index could be tested; i.e. this works:
>> BamFile("test.bam","test.bam.bai") although these files do not exist. Is
>> there a reason that this validation is not done? If there is, could a
>> validation parameter be added (set to FALSE by default to keep the current
>> behavior) that would check for the files' existence? The same goes for the
>> yieldSize argument, i.e. this works
>> BamFile("test.bam","test.bam.bai",yieldSize=-1), although I'm not sure what a
>> -1 yieldSize means. I can of course do these validations within easyRNASeq,
>> but anyone else building packages on top of BamFile would probably want to do
>> the same...
>
> I want to be able to specify a BAM file without opening it, and then open it
> later, e.g., in mclapply or after distributing to a cluster. Also,
> conceptually, I want to distinguish between processing an entire BAM file --
> provide me with something for which isOpen(BamFile("foo")) == FALSE -- versus
> reading a chunk of a BamFile, i.e., already open. So I separated BamFile
> creation from open().
>
> I focus on open() in the above because opening the BAM file is a cheap way to
> validate that the BAM file exists -- it could be local or remote (http or
> ftp, so file.exists isn't sufficient) and even if the file 'exists' as Ryan
> mentions it needs to actually be a BAM file so should, e.g., have a header.
> open() allows for all of these possibilities. Also, the consequences of
> trying to open a non-existent file results in a clear enough error
>
> > open(BamFile("sdfs"))
> Error in value[[3L]](cond) :
>   failed to open BamFile: file(s) do not exist:
>   'sdfs'
>
> So against the votes of the other contributors to this thread, I haven't made
> a change. Sorry about that.

No need to.
I hadn't thought of use cases like those you presented above, where not checking makes perfect sense. I'll use open() for validating.

>
> I added a check that yieldSize is a non-negative scalar integer, or NA.

Great, thanks.

>
>>
>> A related point unclear at the moment in the documentation is what the index
>> filename should be: i.e. scanBam expects as the index the same value as for
>> the bam filename (that assumes the user has not renamed his bam.bai file and
>> you never know what users might be doing... :-S ... ) but the BamFile Rd page
>> says:
>>
>> file: A character vector of BAM file paths
>> index: A character vector of indices (for BamFile);
>>
>> so it's unclear to me what the index character vector should contain.
>
> Tried to clarify that, it's just a character vector containing the path to
> the index file. Generally, the code tries not to care about whether the index
> file is specified with a '.bai' extension, or without.

That was my perception :-) I just wanted to be sure. A related question: could you detail which functions require the bai index to be present and which ones "just" benefit from it?

Cheers,

Nico

>
> Martin
>
>>
>> Thanks again for this set of class, they're really handy!
>>
>> Here's my sessionInfo:
>>
>> R Under development (unstable) (2012-10-02 r60861)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] Rsamtools_1.11.14     Biostrings_2.27.8     GenomicRanges_1.11.21
>> [4] IRanges_1.17.24       BiocGenerics_0.5.6    BiocInstaller_1.9.6
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-5   stats4_2.16.0  tools_2.16.0   zlibbioc_1.5.0
>>
>> Cheers,
>>
>> Nico
>>
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>>
>> Genome Biology Computational Support
>>
>> European Molecular Biology Laboratory
>>
>> Tel: +49 6221 387 8310
>> Email: nicolas.delho...@embl.de
>> Meyerhofstrasse 1 - Postfach 10.2209
>> 69102 Heidelberg, Germany
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793