Hi Michael,

On 05/09/2014 04:39 PM, Michael Lawrence wrote:
What would be the fastest way to do this with a DNAString?  Just an
alphabetFrequency?

That would do it.
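
A minimal sketch of such a check, assuming the goal is to detect letters other than A, C, G, T, or N before writing to 2bit. The helper name is_twobit_safe() is made up for illustration and is not part of Biostrings or rtracklayer:

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings or rtracklayer): returns TRUE
# if the DNAString 'x' contains only the letters A, C, G, T, and N.
is_twobit_safe <- function(x) {
  # For a DNAString, alphabetFrequency() returns a named integer vector
  # with one count per letter of the DNA alphabet.
  freq <- alphabetFrequency(x)
  unsupported <- setdiff(names(freq), c("A", "C", "G", "T", "N"))
  sum(freq[unsupported]) == 0L
}

ok_a <- is_twobit_safe(DNAString("AAA-CCC-GGG-TTT-NNN-KKK"))  # has '-' and 'K'
ok_b <- is_twobit_safe(DNAString("AACCGGTTNN"))               # only A, C, G, T, N
```

An export method could call something like this on each sequence and raise an error or a warning when it returns FALSE.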

A couple of other issues I ran into with the 2bit code:

(1) It fails on empty sequences:

    > export(DNAStringSet(c("AA", "", "CC")), "ww.2bit")
    Warning message:
    In (function (object, seqname)  :
      needLargeMem: trying to allocate 0 bytes (limit: 17179869184)
    Error in sapply(object, function(x) typeof(x) == "externalptr" && is(x, :
      error in evaluating the argument 'X' in selecting a method for
      function 'sapply': Error in (function (object, seqname)  : UCSC
      library operation failed

(2) The internal helper rtracklayer:::.DNAString_to_twoBit() may be
    introducing a memory leak: the memory that the returned external
    pointer points to (a struct twoBit) never seems to be released.
    The leak is minor if the sequence passed via 'object' has no masks,
    but it can be significant if there are masks made of hundreds of
    thousands of ranges.
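
Regarding (1), a minimal sketch of a caller-side guard, assuming it is acceptable to report and drop zero-width sequences before exporting. This is only a workaround on the user side, not a fix for the export method itself:

```r
library(Biostrings)

x <- DNAStringSet(c("AA", "", "CC"))

# Positions of the empty (zero-width) sequences that trigger the failure.
empty_idx <- which(width(x) == 0L)

# Keep only the non-empty sequences.
nonempty <- x[width(x) > 0L]

# The export call is left commented out because it needs rtracklayer and
# writes a file:
# rtracklayer::export(nonempty, "ww.2bit")
```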

Thanks,
H.



On Fri, May 9, 2014 at 4:07 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

    Hi Michael,

       library(rtracklayer)
       library(Biostrings)
       x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")

    Then:

       > x
         A DNAStringSet instance of length 1
           width seq
       [1]    23 AAA-CCC-GGG-TTT-NNN-KKK

       > export(x, "x.2bit")

       > import("x.2bit")
         A DNAStringSet instance of length 1
           width seq                                               names
       [1]    23 AAATCCCTGGGTTTTTNNNTTTT                           1

    What about having the "export" method for TwoBitFile raise an error
    (or at least issue a warning) instead of silently turning everything
    that is not A, C, G, T, or N into a T?

    Thanks,
    H.

    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fhcrc.org
    Phone: (206) 667-5791
    Fax: (206) 667-1319

    _________________________________________________
    Bioc-devel@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel



