On 05/12/2014 12:23 PM, Michael Lawrence wrote:
> On Mon, May 12, 2014 at 11:41 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>
>> Hi Michael,
>>
>> On 05/09/2014 04:39 PM, Michael Lawrence wrote:
>>> What would be the fastest way to do this with a DNAString? Just an alphabetFrequency?
>>
>> That would do it.
>>
>> A couple of other issues I ran into with the 2bit code:
>>
>> (1) It fails on empty sequences:
>>
>>   > export(DNAStringSet(c("AA", "", "CC")), "ww.2bit")
>>   Warning message:
>>   In (function (object, seqname) :
>>     needLargeMem: trying to allocate 0 bytes (limit: 17179869184)
>>   Error in sapply(object, function(x) typeof(x) == "externalptr" && is(x, :
>>     error in evaluating the argument 'X' in selecting a method for function 'sapply': Error in (function (object, seqname) :
>>     UCSC library operation failed
>
> Thanks for catching this one.
>
>> (2) Could be that internal helper rtracklayer:::.DNAString_to_twoBit() is introducing a memory leak as it doesn't seem that the memory the returned external pointer is pointing to (a struct twoBit) is ever released. The memory leak is minor if the sequence passed via 'object' has no masks but can be important if there are masks and if the masks are made of hundreds of thousands of ranges.
>
> Right now it is the responsibility of the caller to free that memory. Probably should have used a finalizer on the externalptr, but the way it works now is that the write function frees the object. So it's not leaking (as far as I know), but the design could be improved.
I see. So we're probably OK as long as the loop containing the calls to .DNAString_to_twoBit() is successful and nothing goes wrong after that (e.g. no user interrupt).

Thanks,
H.
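To illustrate the concern about errors or interrupts between allocation and the final write, here is a minimal base-R sketch of one way a caller-frees design can be made robust with on.exit(). It is not rtracklayer's code: alloc_handle(), free_handle() and convert_all() are hypothetical stand-ins that only record which handles are live, so the control flow can be observed without any C-level memory.

```r
# Hypothetical stand-ins for a C-level allocate/free pair. They only
# record which handles are currently "allocated" in the 'live' environment.
live <- new.env()
alloc_handle <- function(id) {
    assign(id, TRUE, envir = live)
    id
}
free_handle <- function(id) rm(list = id, envir = live)

# Hypothetical converter: allocates one handle per id and releases all of
# them when the function exits, whether it returns normally or fails.
convert_all <- function(ids, fail_at = NULL) {
    handles <- character(0)
    # on.exit() code runs on normal return, on error, and on user interrupt.
    on.exit(for (h in handles) free_handle(h), add = TRUE)
    for (id in ids) {
        if (identical(id, fail_at)) stop("conversion failed for ", id)
        handles <- c(handles, alloc_handle(id))
    }
    length(handles)
}

n <- convert_all(c("a", "b", "c"))
res <- try(convert_all(c("a", "b", "c"), fail_at = "c"), silent = TRUE)
remaining <- ls(live)  # handles still allocated after both calls
```

In both the successful and the failing call the cleanup code runs, so `remaining` should be empty. A finalizer registered with reg.finalizer() on the external pointer, as suggested earlier in the thread, would be the alternative that does not rely on the caller at all.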
>>> On Fri, May 9, 2014 at 4:07 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>>   library(rtracklayer)
>>>>   library(Biostrings)
>>>>   x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
>>>>
>>>> Then:
>>>>
>>>>   > x
>>>>     A DNAStringSet instance of length 1
>>>>       width seq
>>>>   [1]    23 AAA-CCC-GGG-TTT-NNN-KKK
>>>>
>>>>   > export(x, "x.2bit")
>>>>   > import("x.2bit")
>>>>     A DNAStringSet instance of length 1
>>>>       width seq                       names
>>>>   [1]    23 AAATCCCTGGGTTTTTNNNTTTT   1
>>>>
>>>> What about having the "export" method for TwoBitFile raise an error (or at least issue a warning) instead of silently turning everything that is not A, C, G, T, or N into a T?
>>>>
>>>> Thanks,
>>>> H.
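The alphabetFrequency-based check discussed in this thread could look roughly like the sketch below. It is not what rtracklayer does; unsupported_letters() is a hypothetical helper name, and the sketch assumes (as the example in the thread suggests) that only A, C, G, T and N survive a round trip through 2bit. It handles a DNAStringSet only, for which alphabetFrequency() returns a matrix with one row per sequence and one column per letter of the DNA alphabet.

```r
library(Biostrings)

# Hypothetical helper: count the letters in a DNAStringSet that are not
# A, C, G, T or N, i.e. the ones assumed here to be unsupported by 2bit.
unsupported_letters <- function(x) {
    freq <- alphabetFrequency(x)  # one row per sequence, one column per letter
    extra <- setdiff(colnames(freq), c("A", "C", "G", "T", "N"))
    counts <- colSums(freq[, extra, drop = FALSE])
    counts[counts > 0]            # keep only letters that actually occur
}

x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
bad <- unsupported_letters(x)
if (length(bad) > 0L) {
    warning("letters not representable in 2bit: ",
            paste(names(bad), collapse = ", "))
}
```

For the sequence from the thread, `bad` reports the gap character and K. An export method could call stop() instead of warning() at this point if silent conversion should never happen.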
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel