On Mon, Oct 13, 2014 at 9:44 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
> Hi,
>
> On 10/11/2014 02:25 PM, Vincent Carey wrote:
>
>> On Sat, Oct 11, 2014 at 5:17 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>>
>> But what it would do exactly?
>>
>>> Probably would want to be able to extract a gene list from a TxDb, then
>>> extract the desired type of structure from the TxDb.
>>>
>>> Not too bad right now, but it would be nice to leverage the identifier
>>> type information on the gene list object.
>>>
>>> Currently:
>>> tx <- transcripts(txdb, vals=list(gene_id=genes))
>>>
>>> Proposed:
>>> tx <- transcripts(txdb[GeneList])
>>
>> yes, that makes sense. i don't go to txdb's as naturally as i should.
>
> Also coming a little late to the party, but I also have a preference
> for Kasper's proposal of using subsetByXXX.
>
> Supporting 'txdb[GeneList]' is arbitrarily making gene ids special,
> when a TxDb contains other ids (transcript and exon ids).

My proposal was in the context of having formal vectors of IDs, as Gabe has
done (internally as of yet). Basically, extending a character vector to track
the type of ID. GSEABase has something similar. I agree plain old character
vectors make no sense here.

> Also if you push a little bit this concept, you quickly run into
> some semantic headaches:
>
> - First, let's keep in mind that for a common track like the
>   "UCSC Genes" track, a lot of transcripts are not linked to any
>   gene.
>
> - Then, allowing subsetting a TxDb by a character vector means
>   a TxDb has names. At least conceptually. So it's tempting to
>   also support 'names(txdb)' (would return all the gene ids).
>
> - Finally, the names being unique, it seems natural to expect that
>   'txdb[names(txdb)]' is a no-op. But it won't because
>   'txdb[names(txdb)]' will drop all the transcripts that are not
>   linked to a gene.
>
> But before any TxDb subsetting can happen (via [ or subsetByXXX), we
> need to bring back the classic (and healthier) pass-by-value semantic
> on these objects. (Right now TxDb is a reference class and thus TxDb
> objects have a pass-by-reference semantic.)
>
> H.
>
>>> On Sat, Oct 11, 2014 at 10:49 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
>>>
>>>> On 10/11/2014 08:41 AM, Vincent Carey wrote:
>>>>
>>>>> Is there anything on the order of as([GeneSet], "GRanges") around?
>>>>
>>>> no, I don't think so; obviously of use and following a common theme.
>>>> Martin
>>>>
>>>>> On Sat, Sep 20, 2014 at 11:34 PM, Gabe Becker <becker.g...@gene.com> wrote:
>>>>>
>>>>>> Sean and Vincent,
>>>>>>
>>>>>> The goal of what we are doing builds off of what Martin has in GSEABase.
>>>>>> We were looking to see how much benefit we can get with something
>>>>>> lighter-weight that lies between indistinguishable character vectors and
>>>>>> the full machinery of GeneSets.
>>>>>>
>>>>>> Either way, it seems like formalizing the semantic information is a way to
>>>>>> do what you want. Furthermore, these classed id objects can be created
>>>>>> automatically when there is contextual information e.g. during queries to
>>>>>> databases (or db-like objects), and then simply added to metadata
>>>>>> DataFrames and re-used.
>>>>>>
>>>>>> ~G
>>>>>>
>>>>>> On Sat, Sep 20, 2014 at 12:19 PM, Sean Davis <sdav...@mail.nih.gov> wrote:
>>>>>>
>>>>>>> On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.g...@gene.com> wrote:
>>>>>>>
>>>>>>>> Hey all,
>>>>>>>>
>>>>>>>> We are in the (very) early stages of experimenting with something that
>>>>>>>> seems relevant here: classed identifiers. We are using them for
>>>>>>>> database/mart queries, but the same concept could be useful for the cases
>>>>>>>> you're describing I think.
>>>>>>>>
>>>>>>>> E.g.
>>>>>>>>
>>>>>>>> > mysyms = GeneSymbol(c("BRAF", "BRCA1"))
>>>>>>>> > mysyms
>>>>>>>> An object of class "GeneSymbol"
>>>>>>>> [1] "BRAF"  "BRCA1"
>>>>>>>>
>>>>>>>> > yourSE[mysyms, ]
>>>>>>>> ...
>>>>>>>
>>>>>>> This approach has the flavor of some of the functionality that Martin put
>>>>>>> together for the GSEABase package (EntrezIdentifier, etc.).
>>>>>>>
>>>>>>> Sean
>>>>>>>
>>>>>>>> This approach has the benefit of being declarative instead of heuristic
>>>>>>>> (people won't be able to accidentally invoke it), while still giving most
>>>>>>>> of the convenience I believe you are looking for.
>>>>>>>>
>>>>>>>> The object classes inherit directly from character, so should "just work"
>>>>>>>> most of the time, but as I said it's early days; lots more testing for
>>>>>>>> functionality and usefulness is needed.
>>>>>>>>
>>>>>>>> ~G
>>>>>>>>
>>>>>>>> On Sat, Sep 20, 2014 at 11:38 AM, Vincent Carey <st...@channing.harvard.edu> wrote:
>>>>>>>>
>>>>>>>>> OK by me to leave [ alone. We could start with subsetByEntrez,
>>>>>>>>> subsetByKEGG, subsetBySymbol, subsetByGOTERM, subsetByGOID.
>>>>>>>>>
>>>>>>>>> Utilities to generate GRanges for queries in each of these vocabularies
>>>>>>>>> should, perhaps, be in the OrganismDb space? Once those are in place
>>>>>>>>> no additional infrastructure is necessary?
>>>>>>>>>
>>>>>>>>> On Sat, Sep 20, 2014 at 12:49 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Agreed with Sean, having tried implementing the "magical" alternative
>>>>>>>>>>
>>>>>>>>>> --t
>>>>>>>>>>
>>>>>>>>>> On Sep 20, 2014, at 9:31 AM, Sean Davis <sdav...@mail.nih.gov> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, Vince.
>>>>>>>>>>>
>>>>>>>>>>> I'm coming a little late to the party, but I agree with Kasper's sentiment
>>>>>>>>>>> that the less "magical" approach of using subsetByXXX might be the cleaner
>>>>>>>>>>> way to go for the time being.
>>>>>>>>>>>
>>>>>>>>>>> Sean
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Sep 20, 2014 at 10:42 AM, Vincent Carey <st...@channing.harvard.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw
>>>>>>>>>>>>
>>>>>>>>>>>> shows some modifications to [ that allow subsetting of SE by
>>>>>>>>>>>> gene or pathway name
>>>>>>>>>>>>
>>>>>>>>>>>> it may be premature to work at the [ level. Kasper suggested defining
>>>>>>>>>>>> a suite of subsetBy operations that would accomplish this
>>>>>>>>>>>>
>>>>>>>>>>>> i think we could get something along these lines into the release without
>>>>>>>>>>>> too much more work. votes?
>>>>>>>> --
>>>>>>>> Computational Biologist
>>>>>>>> Genentech Research
>>>>
>>>> --
>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: Arnold Building M1 B861
>>>> Phone: (206) 667-2793
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319

[[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
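
For readers following the thread, here is a minimal sketch of what "extending a character vector to track the type of ID" could look like with plain S4. It is only an illustration under stated assumptions: the class definitions, the toy_map table, and the lookupRows generic are all hypothetical names made up for this example. It is not Gabe's internal implementation, and it does not use the GSEABase identifier classes.

```r
library(methods)

# Hypothetical classed identifiers: each class extends "character", so the
# objects still behave like ordinary character vectors in most code.
setClass("GeneSymbol", contains = "character")
setClass("EntrezId", contains = "character")

# Hypothetical constructors, mirroring the GeneSymbol(...) call quoted above.
GeneSymbol <- function(x) new("GeneSymbol", as.character(x))
EntrezId <- function(x) new("EntrezId", as.character(x))

# Toy annotation table standing in for a real TxDb/OrganismDb lookup.
toy_map <- data.frame(
  symbol = c("BRAF", "BRCA1", "TP53"),
  entrez = c("673", "672", "7157"),
  stringsAsFactors = FALSE
)

# Hypothetical generic: the class of 'ids' (not a guess based on what the
# strings look like) decides which column is matched.
setGeneric("lookupRows", function(ids, map) standardGeneric("lookupRows"))

setMethod("lookupRows", "GeneSymbol", function(ids, map) {
  map[map$symbol %in% ids@.Data, , drop = FALSE]
})

setMethod("lookupRows", "EntrezId", function(ids, map) {
  map[map$entrez %in% ids@.Data, , drop = FALSE]
})

mysyms <- GeneSymbol(c("BRAF", "BRCA1"))
hits <- lookupRows(mysyms, toy_map)

# A plain character vector has no method here, so dispatch fails instead of
# silently guessing the identifier type.
plain_result <- tryCatch(lookupRows(c("BRAF", "BRCA1"), toy_map),
                         error = function(e) e)
```

Since no method is defined for bare character, the lookup cannot be invoked by accident, which is the declarative (rather than heuristic) behaviour described in the thread. The same dispatch idea would apply to a subsetByXXX-style function or to a `[` method.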