Marc,

This sounds like a great resource, and it could help make Bioconductor more useful!

As for which species to include, I would suggest checking the full list of KEGG species:
http://www.genome.jp/kegg/catalog/org_list.html

These are all complete genomes, so they should generally be more relevant than species without complete genomes. Hopefully many of them are well annotated; at the very least, their pathway annotations are readily available.

Just my 2 cents,
Weijun
--------------------------------------------
On Tue, 5/6/14, Marc Carlson <mcarl...@fhcrc.org> wrote:

Subject: [Bioc-devel] Question about which new organism resources to create
To: "bioc-devel@r-project.org" <bioc-devel@r-project.org>
Date: Tuesday, May 6, 2014, 1:14 PM

Hi everyone,

As many of you already know, we have long provided organism annotation packages that give gene-based annotations for selected organisms, and we intend to keep doing that. But these days there is also a lot of other data at NCBI that could be used to make gene-based databases for other organisms, and at the same time there is growing demand for annotations for those organisms too. So I aim to make organism-based gene databases for a wider range of organisms.

However, instead of just making more packages, I intend to put these DBs into the AnnotationHub. You can get an idea of what access will be like by looking at the inparanoid8 objects that were added for the last release:

  library(AnnotationHub)
  ah = AnnotationHub()
  hs8 = ah$inparanoid8.Orthologs.hom.Homo_sapiens.inp8.sqlite
  hs8
  columns(hs8)
  k = head(keys(hs8, 'TOXOPLASMA_GONDII'))
  ## map these T. gondii keys to their human (HOMO_SAPIENS) orthologs
  select(hs8, k, 'HOMO_SAPIENS', 'TOXOPLASMA_GONDII')
  ## etc.

Anyhow, my reason for posting is that I am now looking at all the NCBI data that could be used for annotation packages and trying to decide what to include. About half of the 14 thousand potential critters in the NCBI dataset have only about one gene annotated. I am guessing that it is not worth anyone's time to pre-process the organisms that have only one gene. Or is it? If you think it might be, now would be a good time to speak up. How many annotations do you guys want or expect in an organism package before it becomes annoying that you even downloaded it?
Thanks in advance for your opinions,

Marc

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel