Does anyone else map between NCBI taxids, names, and ranks? We do this in curatedMetagenomicData, and will soon do so in other packages, but we currently rely on external files that lack provenance and versioning, so Ludwig Geistlinger went looking for Bioconductor annotation resources. The closest he found was specData in GenomeInfoDbData <https://bioconductor.org/packages/GenomeInfoDbData>, but it has only genus and species columns, and some quirks such as Bacteria being listed as a genus:

> library(GenomeInfoDbData)
> data(specData)
> head(specData)
  tax_id        genus     species
1      1          all        <NA>
2      1         root        <NA>
3      2     Bacteria        <NA>
4      6 Azorhizobium        <NA>
5      7 Azorhizobium caulinodans
6      9     Buchnera  aphidicola
> dim(specData)
[1] 2521271       3
> subset(specData, c(genus == "Escherichia" & species == "coli"))$tax_id
[1] 562

Any thoughts from the GenomeInfoDbData maintainer ("Bioconductor Maintainer <maintainer at bioconductor.org>") about a pull request to either a) update specData to add columns for all taxonomic levels, or b) create a new object? Or another approach altogether?

See https://github.com/waldronlab/curatedMetagenomicData/issues/245.

-- 
Levi Waldron
Associate Professor
Department of Epidemiology and Biostatistics
CUNY Graduate School of Public Health and Health Policy
Institute for Implementation Science in Population Health
55 W 125th St, New York NY 10035
https://waldronlab.io
Join the microbiome Virtual International Forum: https://microbiome-vif.org

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel