The bioc-devel list is intended for questions pertaining to package development, not questions/remarks about existing packages. For that sort of thing, please use the support site, https://support.bioconductor.org.
To your point, a bug is something that happens that wasn't intended by the developer. The developers of the TxDb infrastructure (and of pretty much all of the annotation packages) intend for all identifiers to be character. On the other hand, biomaRt, which is a contributed package that queries and returns data from an online database, intends for the Gene IDs to be numeric, as that is what the database returns. It's not a bug for one package to do one thing and another to do another thing! Different people do different things when they develop packages. It is not possible for all of the ~1700 packages in Bioconductor to be set up such that whatever one package returns will be seamlessly useful as input to another, and you shouldn't assume that they are.

On Thu, Oct 3, 2019 at 6:46 AM Michael Shapiro <si...@earthlink.net> wrote:
>
> Apologies for a previous email that seems content free.
>
> I've run into a cosmic mis-match between biomaRt and TxDb which is either
> a bug or a bug waiting to happen. In brief, biomaRt reports entrezgene_id
> as a numeric, but TxDb wants it as a character. What's deadly in this is
> that TxDb doesn't fail from being supplied with the numeric, it simply
> accesses the wrong gene. Here is a minimal example where I am trying to
> get from gene name (Kcnj12) to gene location:
>
> ## Resolve the gene name:
> ensembl = useMart('ensembl', dataset='mmusculus_gene_ensembl')
> geneNames=getBM(c('entrezgene_id', 'external_gene_name'), mart= ensembl)
> idx = geneNames$external_gene_name == 'Kcnj12'
> entrezGeneId = geneNames$entrezgene_id[idx]
>
> ## Get gene locations:
> txdb = TxDb.Mmusculus.UCSC.mm10.knownGene
> tbg = transcriptsBy(txdb,by='gene')
>
> ## Shoot self in foot:
> WRONG_LOCATION = tbg[[entrezGeneId]]
>
> ## Get email from biologist pointing out you've got the wrong gene:
> ACTUAL_LOCATION = tbg[[as.character(entrezGeneId)]]
>
> I would argue that if entrezgene_id is used in some places as a numeric
> and others as a character, it's safer if biomaRt returns it as a
> character. If your code is wrong, you want it to fail, not quietly
> mis-perform. A vector or list will always let you access it using a
> numeric even when this is wrong. You will probably get an error if you try
> to access something with a character when you should be using a numeric.
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
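
The indexing behavior described in the quoted message can be reproduced in base R without either package. What follows is a minimal sketch, not the TxDb or biomaRt implementation: the list `locations`, its made-up IDs and values, and the helper `lookup_by_id` are hypothetical names introduced purely for illustration. It assumes only that the object being indexed is a list whose names are gene IDs stored as character.

```r
# Hypothetical stand-in for a list keyed by gene ID (names are character).
# The IDs and values are invented for illustration.
locations <- list("10" = "location of gene 10",
                  "3"  = "location of gene 3",
                  "2"  = "location of gene 2")

entrez_id <- 2  # a numeric ID, the type the quoted message reports from biomaRt

# A numeric subscript selects by position: this is the second element,
# which belongs to gene "3", and no error is raised.
by_position <- locations[[entrez_id]]

# A character subscript selects by name: this is the element for gene "2".
by_name <- locations[[as.character(entrez_id)]]

# Hypothetical defensive helper: always look up by name, and fail loudly
# when the ID is not present instead of returning something else.
lookup_by_id <- function(x, id) {
  key <- as.character(id)
  if (!key %in% names(x)) {
    stop("no element named '", key, "'")
  }
  x[[key]]
}

print(by_position)
print(by_name)
print(lookup_by_id(locations, entrez_id))
```

Converting the ID once, at the point where it leaves biomaRt, or routing every lookup through a helper of this kind keeps the numeric form from ever reaching `[[`.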