On 05/12/2014 12:23 PM, Michael Lawrence wrote:



On Mon, May 12, 2014 at 11:41 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

    Hi Michael,


    On 05/09/2014 04:39 PM, Michael Lawrence wrote:

        What would be the fastest way to do this with a DNAString?  Just an
        alphabetFrequency?


    That would do it.
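
The check discussed here (rejecting letters that 2bit cannot store before exporting) could be sketched roughly as follows. This is only an illustration: `check_twobit_letters()` is a hypothetical helper, not something rtracklayer or Biostrings provides, and it assumes the input is a DNAStringSet.

```r
library(Biostrings)

# Hypothetical helper (not part of rtracklayer or Biostrings): refuse
# sequences containing letters other than A, C, G, T or N.
check_twobit_letters <- function(x) {
    stopifnot(is(x, "DNAStringSet"))
    # One row per sequence, one column per letter of the DNA alphabet.
    freq <- alphabetFrequency(x)
    # Total count of each letter across all sequences.
    totals <- colSums(freq)
    unsupported <- setdiff(names(totals)[totals > 0],
                           c("A", "C", "G", "T", "N"))
    if (length(unsupported) > 0L)
        stop("letters not representable in 2bit: ",
             paste(unsupported, collapse = " "))
    invisible(x)
}

x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")
# Capture the error message instead of aborting the script.
res <- tryCatch(check_twobit_letters(x), error = conditionMessage)
```

Whether this should be an error or only a warning is the open question raised in the original report; swapping `stop()` for `warning()` gives the softer variant.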

    A couple of other issues I ran into with the 2bit code:

    (1) It fails on empty sequences:

         > export(DNAStringSet(c("AA", "", "CC")), "ww.2bit")
         Warning message:
         In (function (object, seqname)  :
           needLargeMem: trying to allocate 0 bytes (limit: 17179869184)
         Error in sapply(object, function(x) typeof(x) == "externalptr" && is(x,  :
           error in evaluating the argument 'X' in selecting a method for
           function 'sapply': Error in (function (object, seqname)  : UCSC
           library operation failed


Thanks for catching this one.

    (2) Could be that internal helper rtracklayer:::.DNAString_to_twoBit()
         is introducing a memory leak as it doesn't seem that the memory
         the returned external pointer is pointing to (a struct twoBit) is
         ever released. The memory leak is minor if the sequence passed via
         'object' has no masks but can be important if there are masks and
         if the masks are made of hundreds of thousands of ranges.


Right now it is the responsibility of the caller to free that memory.
Probably should have used a finalizer on the externalptr, but the way it
works now is that the write function frees the object. So it's not
leaking (as far as I know), but the design could be improved.

I see. So we're probably OK as long as the loop containing the calls
to .DNAString_to_twoBit() is successful and nothing goes wrong after
that (e.g. no user interrupt).
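
One way to make the cleanup robust against errors and interrupts at the R level is to register it with `on.exit()` before the loop starts. The sketch below is not the rtracklayer code and is a different technique from the finalizer mentioned above: `make_resource()`, `free_resource()` and `convert_all()` are hypothetical names, and a plain environment stands in for the external pointer. It also assumes nothing else frees the objects, unlike the current design where the write function does.

```r
# Records which stand-in resources were released (for illustration only).
freed <- character(0)

# Hypothetical stand-ins for the C-backed objects: an environment plays
# the role of the external pointer.
make_resource <- function(id) {
    res <- new.env()
    res$id <- id
    res
}
free_resource <- function(res) freed <<- c(freed, res$id)

convert_all <- function(ids, fail_at = NA) {
    resources <- list()
    # Registered before the loop, so it runs on normal return, on error,
    # and on user interrupt alike.
    on.exit(for (r in resources) free_resource(r), add = TRUE)
    for (id in ids) {
        if (identical(id, fail_at)) stop("conversion failed for ", id)
        resources[[length(resources) + 1L]] <- make_resource(id)
    }
    length(resources)
}

# The third conversion fails; the two objects created so far are still freed.
try(convert_all(c("a", "b", "c"), fail_at = "c"), silent = TRUE)
```

A finalizer on the external pointer would cover the same cases without the caller having to remember the `on.exit()` call, at the cost of the memory being released only when the garbage collector runs.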

Thanks,
H.


    Thanks,
    H.



        On Fri, May 9, 2014 at 4:07 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

             Hi Michael,

                library(rtracklayer)
                library(Biostrings)
                x <- DNAStringSet("AAA-CCC-GGG-TTT-NNN-KKK")


             Then:

                > x
                  A DNAStringSet instance of length 1
                    width seq
                [1]    23 AAA-CCC-GGG-TTT-NNN-KKK

                > export(x, "x.2bit")

                > import("x.2bit")
                  A DNAStringSet instance of length 1
                    width seq                     names
                [1]    23 AAATCCCTGGGTTTTTNNNTTTT 1

             What about having the "export" method for TwoBitFile raise an error
             (or at least issue a warning) instead of silently turning everything
             that is not A, C, G, T, or N into a T?

             Thanks,
             H.

             --
             Hervé Pagès

             Program in Computational Biology
             Division of Public Health Sciences
             Fred Hutchinson Cancer Research Center
             1100 Fairview Ave. N, M1-B514
             P.O. Box 19024
             Seattle, WA 98109-1024

             E-mail: hpa...@fhcrc.org
             Phone: (206) 667-5791
             Fax: (206) 667-1319

             _______________________________________________
             Bioc-devel@r-project.org mailing list
             https://stat.ethz.ch/mailman/listinfo/bioc-devel






