Hi Valerie,

If the BAM file is not sorted by name, isn't it possible that readGAlignment* will load more than yieldSize reads in order to find the mate?

Mike

On Wed, Mar 19, 2014 at 1:04 PM, Valerie Obenchain <voben...@fhcrc.org> wrote:

> Hi Mike,
>
> You no longer need to sort Bam files to use the pairing algo or yieldSize.
> The readGAlignment* functions now work with both constraints out of the box.
>
> Create a BamFile with yieldSize and indicate you want mates.
> bf <- BamFile(fl, yieldSize=10000, asMates=TRUE)
>
> Maybe set some specifications in a param:
> param <- ScanBamParam(what = c("qname", "flag"))
>
> Then call either readGAlignment* method that handles pairs:
> readGAlignmentsList(bf, param=param)
> readGAlignmentPairs(bf, param=param)
>
> For summarizeOverlaps():
> summarizeOverlaps(annotation, bf, param=param, singleEnd=FALSE)
>
> We've considered removing the 'obeyQname' arg and documentation but
> thought the concept may be useful in another application. I'll revisit the
> summarizeOverlaps() documentation to make sure 'obeyQname' is downplayed
> and 'asMates' is encouraged.
>
> Valerie
>
> On 03/19/14 07:39, Michael Love wrote:
>
>> hi,
>>
>> From last year, in order to use yieldSize with paired-end BAMs, I
>> should sort the BAMs by qname and then use the following call to
>> BamFile:
>>
>> library(pasillaBamSubset)
>> fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
>> bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE)
>>
>> https://stat.ethz.ch/pipermail/bioconductor/2013-March/051490.html
>>
>> If I want to use GenomicAlignments::readGAlignmentsList with
>> asMates=TRUE and respecting the yieldSize, what is the proper
>> construction? (in the end, I want to use summarizeOverlaps on
>> paired-end BAMs while respecting the yieldSize)
>>
>> library(pasillaBamSubset)
>> fl <- sortBam(untreated3_chr4(), tempfile(), byQname=TRUE)
>> bf <- BamFile(fl, index=character(0), yieldSize=3, obeyQname=TRUE,
>> asMates=TRUE)
>> x <- readGAlignmentsList(bf)
>> Warning message:
>> In scanBam(bamfile, ..., param = param) :
>> 'obeyQname=TRUE' ignored when 'asMates=TRUE'
>> Calls: readGAlignmentsList ... .matesFromBam ->
>> .load_bamcols_from_bamfile -> scanBam -> scanBam
>>
>> I see in the man pages for summarizeOverlaps it has:
>>
>> "In Bioconductor > 2.12 it is not
>> necessary to sort paired-end BAM files by ‘qname’. When
>> counting with ‘summarizeOverlaps’, setting ‘singleEnd=FALSE’
>> will trigger paired-end reading and counting."
>>
>> but I don't see how this can respect the specified yieldSize, because
>> readGAlignmentsList has to read in as many reads as necessary to find
>> the mate.
>>
>> Sorry in advance if I am missing something in the documentation!
>>
>> Mike
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
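
One way to look at the yieldSize question at the top of this thread empirically is to read the file in chunks and record how many list elements each call returns. The following is only a sketch, not a statement of how the pairing code works internally: it assumes Rsamtools, GenomicAlignments and pasillaBamSubset are installed, it reuses the BamFile and ScanBamParam calls quoted above, and `chunk_sizes` is a hypothetical name introduced here. It shows what each call returns, not how many reads are buffered while mates are being located.

```r
# Sketch only: assumes Rsamtools, GenomicAlignments and pasillaBamSubset
# are installed. 'chunk_sizes' is a hypothetical helper variable.
library(Rsamtools)
library(GenomicAlignments)
library(pasillaBamSubset)

# Paired-end BAM used in the thread, here NOT re-sorted by qname.
fl <- untreated3_chr4()

# Same construction as in the quoted reply.
bf <- BamFile(fl, yieldSize=10000, asMates=TRUE)
param <- ScanBamParam(what=c("qname", "flag"))

# Read chunk by chunk until an empty result signals the end of the file.
chunk_sizes <- integer(0)
open(bf)
while (length(chunk <- readGAlignmentsList(bf, param=param)) > 0) {
    chunk_sizes <- c(chunk_sizes, length(chunk))
}
close(bf)

# Number of list elements returned by each call.
print(chunk_sizes)
```

Comparing the printed values with the yieldSize given to BamFile() shows how the chunking behaves on a file that is not sorted by name.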