On 03/19/14 10:24, Michael Love wrote:
hi Valerie,

If the Bam is not sorted by name, isn't it possible that readGAlignment*
will load > yieldSize number of reads in order to find the mate?

Sorry, our emails keep criss-crossing.

Because the mate-pairing is now done in C yieldSize is no longer a constraint.

When yieldSize is given, say 100, then 100 mates are returned. The algo moves through the file until 100 records are successfully paired. These 100 are yielded to the user and the code picks up where it left off. A related situation is the 'which' in a param. In this case you want the mates in a particular range. The algo moves through the range and pairs what it can. If it's left with unmated records it goes outside the range looking for them. readGAlignmentsList() will return all records in this range, mated or not. The metadata column of 'mate_status' indicates the different groupings.

Valerie



Mike


On Wed, Mar 19, 2014 at 1:04 PM, Valerie Obenchain <voben...@fhcrc.org
<mailto:voben...@fhcrc.org>> wrote:

    Hi Mike,

    You no longer need to sort Bam files to use the pairing algo or
    yieldSize. The readGAlignment* functions now work with both
    constraints out of the box.

    Create a BamFile with yieldSize and indicate you want mates.
    bf <- BamFile(fl, yieldSize=10000, asMates=TRUE)

    Maybe set some specifications in a param:
    param <- ScanBamParam(what = c("qname", "flag"))

    Then call either readGAlignment* method that handles pairs:
    readGAlignmentsList(bf, param=param)
    readGAlignmentPairs(bf, param=param)

    For summarizeOverlaps():
    summarizeOverlaps(annotation, bf, param=param, singleEnd=FALSE)

    We've considered removing the 'obeyQname' arg and documentation but
    thought the concept may be useful in another application. I'll
    revisit the summarizeOverlaps() documentation to make sure
    'obeyQname' is downplayed and 'asMates' is encouraged.

    Valerie




    On 03/19/14 07:39, Michael Love wrote:

        hi,

             From last year, in order to use yieldSize with paired-end
            BAMs, I

        should sort the BAMs by qname and then use the following call to
        BamFile:

        library(pasillaBamSubset)
        fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
        bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE)

        https://stat.ethz.ch/__pipermail/bioconductor/2013-__March/051490.html
        <https://stat.ethz.ch/pipermail/bioconductor/2013-March/051490.html>

        If I want to use GenomicAlignments::__readGAlignmentsList with
        asMates=TRUE and respecting the yieldSize, what is the proper
        construction? (in the end, I want to use summarizeOverlaps on
        paired-end BAMs while respecting the yieldSize)

        library(pasillaBamSubset)
        fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
        bf <- BamFile(fl, index=character(0), yieldSize=3,
        obeyQname=TRUE, asMates=TRUE)
        x <- readGAlignmentsList(bf)
        Warning message:
           In scanBam(bamfile, ..., param = param) :
             'obeyQname=TRUE' ignored when 'asMates=TRUE'
           Calls: readGAlignmentsList ... .matesFromBam ->
        .load_bamcols_from_bamfile -> scanBam -> scanBam

        I see in the man pages for summarizeOverlaps it has:

        "In Bioconductor > 2.12 it is not
        necessary to sort paired-end BAM files by ‘qname’. When
        counting with ‘summarizeOverlaps’, setting ‘singleEnd=FALSE’
        will trigger paired-end reading and counting."

        but I don't see how this can respect the specified yieldSize,
        because
        readGAlignmentsList has to read in as many reads as necessary to
        find
        the mate.

        Sorry in advance if I am missing something in the documentation!

        Mike

        _________________________________________________
        Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
        mailing list
        https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>




_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to