Re: [Bioc-devel] BamFile validation

Martin Morgan Tue, 08 Jan 2013 10:54:05 -0800

On 01/07/2013 12:32 PM, Nicolas Delhomme wrote:

Hi Martin, Marc,


I'm now implementing the use of BamFile objects in easyRNASeq and I like
them. I think it would be very useful if when constructing a BamFile the
existence of the path and index could be tested; i.e. this works:
BamFile("test.bam","test.bam.bai") although these files do not exist. Is
there a reason that this validation is not done? If there is, could a
validation parameter be added (set to FALSE by default to keep the current
behavior) that would check for the files' existence? The same goes for the
yieldSize argument, i.e. this works
BamFile("test.bam","test.bam.bai",yieldSize=-1), although I'm not sure what a
-1 yieldSize means. I can of course do these validations within easyRNASeq,
but anyone else building packages on top of BamFile would probably want to do
the same...

I want to be able to specify a BAM file without opening it, and then open it later, e.g., in mclapply or after distributing to a cluster. Also, conceptually, I want to distinguish between processing an entire BAM file -- provide me with something for which isOpen(BamFile("foo")) == FALSE -- versus reading a chunk of a BamFile, i.e., already open. So I separated BamFile creation from open().
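
A minimal sketch of that pattern, assuming the example BAM file shipped with Rsamtools; the helper name countOne is made up for illustration, and mc.cores = 1 is used only to keep the sketch portable:

```r
library(Rsamtools)
library(parallel)

# Example BAM file shipped with Rsamtools; substitute your own paths.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Creating the object only records the path; the file is not opened yet.
bf <- BamFile(fl)
stopifnot(!isOpen(bf))

# Hypothetical worker: open on arrival, close on exit, count the records.
countOne <- function(bf) {
    open(bf)
    on.exit(close(bf))
    countBam(bf)$records
}

# Increase mc.cores on platforms that support forking.
counts <- mclapply(list(bf), countOne, mc.cores = 1L)
```

The BamFile stays closed until a worker actually needs it, which is the separation described above.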

I focus on open() in the above because opening the BAM file is a cheap way to validate that the BAM file exists -- it could be local or remote (http or ftp, so file.exists isn't sufficient), and even if the file 'exists', as Ryan mentions, it needs to actually be a BAM file, so it should, e.g., have a header. open() allows for all of these possibilities. Also, trying to open a non-existent file results in a clear enough error:


> open(BamFile("sdfs"))
Error in value[[3L]](cond) :
  failed to open BamFile: file(s) do not exist:
  'sdfs'

So against the votes of the other contributors to this thread, I haven't made a change. Sorry about that.
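
A package that wants an up-front check can layer one on top of open(). The following is only a sketch: canOpenBam is a hypothetical helper, not part of Rsamtools, and it treats any error from constructing or opening the BamFile as "not valid".

```r
library(Rsamtools)

# Hypothetical helper (not part of Rsamtools): TRUE if a BamFile for 'path'
# can be constructed and opened, FALSE if either step signals an error.
# The file is closed again before returning.
canOpenBam <- function(path, ...) {
    tryCatch({
        bf <- BamFile(path, ...)
        open(bf)
        close(bf)
        TRUE
    }, error = function(e) FALSE)
}

# A path that does not exist, and the example file shipped with Rsamtools.
missingOk <- canOpenBam("sdfs")
exampleOk <- canOpenBam(system.file("extdata", "ex1.bam", package = "Rsamtools"))
```

Because the check goes through open(), it covers the same cases as above: a missing file or a file that is not really a BAM file both end up as FALSE.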


I added a check that yieldSize is a non-negative scalar integer, or NA.
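
Roughly, the rule amounts to the following. This is an illustrative base-R sketch, not the actual Rsamtools code; isValidYieldSize is a made-up name.

```r
# Hypothetical helper mirroring the rule stated above:
# a single value that is either NA or a non-negative whole number.
isValidYieldSize <- function(x) {
    if (length(x) != 1L)
        return(FALSE)
    if (!is.numeric(x) && !is.logical(x))
        return(FALSE)
    if (is.na(x))
        return(TRUE)
    is.numeric(x) && is.finite(x) && x >= 0 && x == trunc(x)
}

isValidYieldSize(100000L)   # TRUE
isValidYieldSize(NA)        # TRUE
isValidYieldSize(-1)        # FALSE
```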


A related point unclear at the moment in the documentation is what the index
filename should be: i.e. scanBam expects as the index the same value as for
the bam filename (that assumes the user has not renamed his bam.bai file and
you never know what users might be doing... :-S ... ) but the BamFile Rd page
says:

file: A character vector of BAM file paths

index: A character vector of indices (for BamFile);


so it's unclear to me what the index character vector should contain.

I tried to clarify that; it's just a character vector containing the path to the index file. Generally, the code tries not to care about whether the index file is specified with a '.bai' extension, or without.


Martin


Thanks again for this set of classes, they're really handy!

Here's my sessionInfo:

R Under development (unstable) (2012-10-02 r60861) Platform:
x86_64-apple-darwin10.8.0 (64-bit)

locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages: [1] parallel  stats     graphics  grDevices utils
datasets  methods [8] base

other attached packages: [1] Rsamtools_1.11.14     Biostrings_2.27.8
GenomicRanges_1.11.21 [4] IRanges_1.17.24       BiocGenerics_0.5.6
BiocInstaller_1.9.6

loaded via a namespace (and not attached): [1] bitops_1.0-5   stats4_2.16.0
tools_2.16.0   zlibbioc_1.5.0

Cheers,

Nico

--------------------------------------------------------------- Nicolas
Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310 Email: [email protected] Meyerhofstrasse 1 -
Postfach 10.2209 69102 Heidelberg, Germany

_______________________________________________ [email protected]
mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

