Apologies for my previous email, which seems to have been content-free.
I've run into a mismatch between biomaRt and TxDb which is either a bug or a bug waiting to happen. In brief, biomaRt reports entrezgene_id as a numeric, but TxDb wants it as a character. What makes this dangerous is that TxDb doesn't fail when supplied with the numeric. The numeric is treated as a position in the list rather than as a name, so it simply accesses the wrong gene.

Here is a minimal example where I am trying to get from gene name (Kcnj12) to gene location:

    library(biomaRt)
    library(TxDb.Mmusculus.UCSC.mm10.knownGene)

    ## Resolve the gene name:
    ensembl = useMart('ensembl', dataset='mmusculus_gene_ensembl')
    geneNames = getBM(c('entrezgene_id', 'external_gene_name'), mart=ensembl)
    idx = geneNames$external_gene_name == 'Kcnj12'
    entrezGeneId = geneNames$entrezgene_id[idx]

    ## Get gene locations:
    txdb = TxDb.Mmusculus.UCSC.mm10.knownGene
    tbg = transcriptsBy(txdb, by='gene')

    ## Shoot self in foot (a numeric index is read as a position, not a name):
    WRONG_LOCATION = tbg[[entrezGeneId]]

    ## Get email from biologist pointing out you've got the wrong gene:
    ACTUAL_LOCATION = tbg[[as.character(entrezGeneId)]]

I would argue that if entrezgene_id is used as a numeric in some places and as a character in others, it's safer for biomaRt to return it as a character. If your code is wrong, you want it to fail, not quietly misbehave. A vector or list will accept any in-range numeric index, even when this is wrong. You will probably get an error if you try to access something with a character when you should be using a numeric.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel