Maybe Rsamtools would want to follow this precedent. I think there might be a difference between fishing out alignments from a SAM/BAM, and deriving a summary (tallyVariants) from a BAM. It seems like an argument could be made for a tally set to not contain duplicates.
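
To make the two interpretations of 'which' concrete, here is a toy sketch in plain Python. It is not the Rsamtools or samtools API: `query_regions`, `reduce_regions`, and the read/region coordinates are all made up for illustration, and `reduce_regions` only approximates what reducing a set of ranges does (merging overlapping or adjacent ranges). It shows that querying overlapping regions one at a time returns a read once per region it overlaps, while reducing the regions first returns each read once.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def query_regions(reads: List[Tuple[str, Interval]],
                  regions: List[Interval]) -> List[str]:
    """Hypothetical per-region query: each region is searched independently
    and the hits are concatenated, so a read overlapping two regions
    appears twice."""
    hits: List[str] = []
    for region in regions:
        hits.extend(name for name, span in reads if overlaps(span, region))
    return hits


def reduce_regions(regions: List[Interval]) -> List[Interval]:
    """Hypothetical stand-in for reducing ranges: merge overlapping or
    adjacent intervals into a minimal sorted set."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up example data: three reads and two overlapping query regions.
reads = [("r1", (90, 120)), ("r2", (180, 210)), ("r3", (300, 330))]
regions = [(100, 200), (150, 250)]

print(query_regions(reads, regions))                  # ['r1', 'r2', 'r2']
print(query_regions(reads, reduce_regions(regions)))  # ['r1', 'r2']
```

Read r2 overlaps both regions, so the per-region query reports it twice; after reducing the regions to (100, 250) it is reported once. For a tally, the duplicated read would be counted twice at every base it covers.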

On Mon, Feb 23, 2015 at 11:05 AM, Leonard Goldstein <goldstein.leon...@gene.com> wrote:

> Hi Michael and Thomas,
>
> I ran into the same problem in the past (i.e. when I started working
> with functions like scanBam I expected them not to return the same
> alignment multiple times).
>
> One thing to consider might be that returning alignments multiple
> times is consistent with the behavior of the samtools view command.
> Quoting from the samtools manual:
>
> “Important note: when multiple regions are given, some alignments may
> be output multiple times if they overlap more than one of the
> specified regions.”
>
> Maybe there is an argument for keeping things consistent with
> samtools? As you said, if documented properly, the user can decide
> whether to reduce regions specified in which or not.
>
> Leonard
>
>
> On Mon, Feb 23, 2015 at 10:52 AM, Michael Lawrence
> <lawrence.mich...@gene.com> wrote:
> > We should at least try to avoid surprising the user. Seems like most
> > people expect "which" to be a simple restriction, so I think for now
> > I will just reduce the which, and if someone has a use case for
> > separate queries, we can address it in the future.
> >
> > On Mon, Feb 23, 2015 at 10:41 AM, Thomas Sandmann
> > <sandmann.tho...@gene.com> wrote:
> >
> >> Personally, I don't have a use case with "meaningful loci" worth
> >> tracking, so keeping it simple would work for me.
> >>
> >> Incidentally, would it be good to deal with the 'which' parameter in
> >> a consistent way across different methods? I just saw this recent
> >> post on the mailing list in which a user got confused by duplicate
> >> counts returned after passing 'which' to ScanBamParam:
> >>
> >> https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html
> >>
> >> ---
> >>
> >> Thomas Sandmann, PhD
> >> Computational biologist
> >> Genentech, Inc.
> >>
> >>
> >> On Mon, Feb 23, 2015 at 10:37 AM, Michael Lawrence
> >> <lawrence.mich...@gene.com> wrote:
> >>
> >>> We just have to decide which is the more useful interpretation of
> >>> which -- as a simple restriction, or as a vector of meaningful loci,
> >>> which will be analyzed individually? I would actually favor the
> >>> first one (the same as yours), just because it's simpler. To keep
> >>> track of the query ranges, we would need to add a new column to the
> >>> returned object, which will more often than not just be clutter. I
> >>> guess we could introduce a new parameter, "reduceWhich", which
> >>> defaults to TRUE and reduces the which. If FALSE, it instead adds
> >>> the column mapping back to the original which ranges.
> >>>
> >>>
> >>> On Sun, Feb 22, 2015 at 2:36 PM, Thomas Sandmann
> >>> <sandmann.tho...@gene.com> wrote:
> >>>
> >>>> Hi Michael,
> >>>>
> >>>> ah, I see. I hadn't realized that returning the pileups separately
> >>>> for each region could be a desired feature, but that makes sense.
> >>>> I agree, as it is easy for the user to 'reduce' the ranges
> >>>> beforehand, your first option (e.g. returning the ID of the range)
> >>>> would be more flexible.
> >>>>
> >>>> Perhaps you would consider adding a sentence to the documentation
> >>>> of 'which' on BamTallyParam's help page explaining that users might
> >>>> want to 'reduce' their ranges beforehand if they are only
> >>>> interested in a single tally for each base?
> >>>>
> >>>> Thanks a lot!
> >>>> Thomas

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel