Apologies for my previous email, which seems to have arrived with no content.
I've run into a cosmic mismatch between biomaRt and TxDb which is either a bug
or a bug waiting to happen. In brief, biomaRt reports entrezgene_id as a
numeric, but the list returned by transcriptsBy() on a TxDb is named by Entrez
ID as a character. What's deadly in this is that the lookup doesn't fail when
supplied with the numeric: the number is treated as a position in the list, so
it simply accesses the wrong gene. Here is a minimal example where I am trying
to get from gene name (Kcnj12) to gene location:
## Load the packages used below:
library(biomaRt)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
## Resolve the gene name:
ensembl = useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
geneNames = getBM(c('entrezgene_id', 'external_gene_name'), mart = ensembl)
idx = geneNames$external_gene_name == 'Kcnj12'
entrezGeneId = geneNames$entrezgene_id[idx]
## Get gene locations:
txdb = TxDb.Mmusculus.UCSC.mm10.knownGene
tbg = transcriptsBy(txdb,by='gene')
## Shoot self in foot:
WRONG_LOCATION = tbg[[entrezGeneId]]
## Get email from biologist pointing out you've got the wrong gene:
ACTUAL_LOCATION = tbg[[as.character(entrezGeneId)]]
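The same effect can be reproduced without any Bioconductor packages. The sketch below uses a plain named list as a stand-in for tbg; the names and values are made up for illustration and are not real annotations.

```r
# Toy stand-in for the list returned by transcriptsBy(): names are
# Entrez-style IDs, values are made-up labels (not real annotations).
tbg_toy <- list("100" = "gene 100", "16515" = "gene 16515", "2" = "gene 2")

entrez_id <- 2L  # an integer ID, as it would come back from getBM()

# A number is read as a position: this returns the 2nd element.
by_position <- tbg_toy[[entrez_id]]

# A character is read as a name: this returns the element named "2".
by_name <- tbg_toy[[as.character(entrez_id)]]

print(by_position)  # "gene 16515"
print(by_name)      # "gene 2"
```

Both lookups succeed without a warning, which is why the mistake goes unnoticed.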
I would argue that if entrezgene_id is used as a numeric in some places and as
a character in others, it's safer for biomaRt to return it as a character. If
your code is wrong, you want it to fail, not quietly misbehave. A vector or
list will always let you index it with a numeric, even when that is the wrong
thing to do. Indexing with a character where you should have used a numeric
will probably give you an error, or at least no result.
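Until something changes upstream, one defensive option is to go through a small wrapper that always looks up by name and stops if the ID is missing. This is only a sketch: lookup_by_id is a hypothetical helper, not part of biomaRt or GenomicFeatures, and it is shown on a plain list with made-up contents. Note that on a plain list, [[ with an unknown name returns NULL rather than an error, hence the explicit check.

```r
# Hypothetical helper (not from any package): look up exactly one element
# by name, and stop instead of silently falling back to a position.
lookup_by_id <- function(x, id) {
  # Convert numbers to text without scientific notation; leave text as-is.
  key <- if (is.character(id)) id else format(id, scientific = FALSE, trim = TRUE)
  stopifnot(length(key) == 1, key %in% names(x))
  x[[key]]
}

# Made-up data for illustration only.
ids_toy <- list("100" = "gene 100", "16515" = "gene 16515")

found <- lookup_by_id(ids_toy, 16515)  # matched by name, not by position

# An ID that is not a name in the list now stops with an error,
# even though position 2 exists.
missing_id <- try(lookup_by_id(ids_toy, 2), silent = TRUE)
```

Converting the column once, right after getBM(), with as.character() on the integer IDs would achieve much the same thing for the example above.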
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel