Please ask follow-up questions about Bioconductor packages on the Bioconductor mailing list.
  http://bioconductor.org/help/mailing-list/mailform/

If you are interested in organisms rather than chips, use the organism
package, e.g., for Homo sapiens

  library(org.Hs.eg.db)
  df0 = select(org.Hs.eg.db, keys(org.Hs.eg.db), "GO")

giving

  > head(df0)
    ENTREZID         GO EVIDENCE ONTOLOGY
  1        1 GO:0003674       ND       MF
  2        1 GO:0005576      IDA       CC
  3        1 GO:0008150       ND       BP
  4       10 GO:0004060      IEA       MF
  5       10 GO:0005829      TAS       CC
  6       10 GO:0006805      TAS       BP

from which you might

  df = unique(df0[df0$ONTOLOGY == "BP", c("ENTREZID", "GO")])
  len = tapply(df$ENTREZID, df$GO, length)
  keep = len[len < 1000]

to get a vector of counts, with names being GO ids. Remember that the GO is
a directed acyclic graph, so terms are nested; you'll likely want to give
some thought to what you actually want. The vignettes in the AnnotationDbi
and Category packages

  http://bioconductor.org/packages/release/bioc/html/AnnotationDbi.html
  http://bioconductor.org/packages/release/bioc/html/Category.html

are two useful sources of information, as is the annotation workflow

  http://bioconductor.org/help/workflows/annotation/

Martin
----- Chirag Gupta <cxg...@email.uark.edu> wrote:
> Hi
> I think I asked the wrong question. Apologies.
>
> Actually I want all the GO BP annotations for my organism and from them I
> want to retain only those annotations which annotate less than a specified
> number of genes. (say <1000 genes)
>
> I hope I have put it clearly.
>
> sorry again.
>
> Thanks!
>
>
> On Sun, Jul 7, 2013 at 6:55 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
>
> > In Bioconductor, install the annotation package
> >
> >   http://bioconductor.org/packages/release/BiocViews.html#___AnnotationData
> >
> > corresponding to your chip, e.g.,
> >
> >   source("http://bioconductor.org/biocLite.R")
> >   biocLite("hgu95av2.db")
> >
> > then load it and select the GO terms corresponding to your probes
> >
> >   library(hgu95av2.db)
> >   lkup <- select(hgu95av2.db, rownames(dat), "GO")
> >
> > then use standard R commands to find the probesets that have the GO id
> > you're interested in
> >
> >   keep = lkup$GO %in% "GO:0006355"
> >   unique(lkup$PROBEID[keep])
> >
> > Ask follow-up questions about Bioconductor packages on the Bioconductor
> > mailing list
> >
> >   http://bioconductor.org/help/mailing-list/mailform/
> >
> > Martin
> > ----- Rui Barradas <ruipbarra...@sapo.pt> wrote:
> > > Hello,
> > >
> > > Your question is not very clear; it would help if you posted a data
> > > example. To do so, use ?dput. If your data frame is named 'dat', use
> > > the following.
> > >
> > > dput(head(dat, 50))  # paste the output of this in a post
> > >
> > > If you want to get the rownames matching a certain pattern, maybe
> > > something like the following.
> > >
> > > idx <- grep("GO:0006355", rownames(dat))
> > > dat[idx, ]
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > > On 07-07-2013 07:01, Chirag Gupta wrote:
> > > > Hello everyone
> > > >
> > > > I have a dataframe with rows as probeset ID and columns as samples.
> > > > I want to check the rownames and find which of those probes are
> > > > transcription factors. (GO:0006355)
> > > >
> > > > Any suggestions?
> > > >
> > > > Thanks!
> > > >
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> *Chirag Gupta*
> Department of Crop, Soil, and Environmental Sciences,
> 115 Plant Sciences Building, Fayetteville, Arkansas 72701
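
The counting-and-filtering step from Martin's reply can be tried without installing any annotation package. What follows is only a minimal sketch in base R: the small data frame stands in for the result of select(org.Hs.eg.db, ...), the row for ENTREZID "100" is made up for illustration, and `max_genes` and `bp_small` are hypothetical names not used in the thread (which uses a cutoff of 1000).

```r
# Toy stand-in for the data frame returned by select(org.Hs.eg.db, ..., "GO").
# The first six rows mirror the sample output in the thread; the seventh row
# is invented so that one BP term is shared by two genes.
df0 <- data.frame(
    ENTREZID = c("1", "1", "1", "10", "10", "10", "100"),
    GO       = c("GO:0003674", "GO:0005576", "GO:0008150",
                 "GO:0004060", "GO:0005829", "GO:0006805", "GO:0008150"),
    EVIDENCE = c("ND", "IDA", "ND", "IEA", "TAS", "TAS", "IEA"),
    ONTOLOGY = c("MF", "CC", "BP", "MF", "CC", "BP", "BP"),
    stringsAsFactors = FALSE
)

max_genes <- 2  # hypothetical cutoff for the toy data; the thread uses 1000

# Keep BP rows only, dropping gene/term pairs duplicated across evidence codes
bp <- unique(df0[df0$ONTOLOGY == "BP", c("ENTREZID", "GO")])

# Number of distinct genes annotated to each BP term (names are GO ids)
len <- tapply(bp$ENTREZID, bp$GO, length)

# Terms that annotate fewer than max_genes genes
keep <- len[len < max_genes]

# Gene-to-term annotations restricted to the retained terms
bp_small <- bp[bp$GO %in% names(keep), ]
print(keep)
print(bp_small)
```

These counts cover direct annotations only. Because GO terms are nested, a parent term also applies to genes annotated to its descendants, so the numbers can change substantially if annotations are propagated up the graph first.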