Apologies for my previous email, which seems to have arrived content free.

I've run into a mismatch between biomaRt and TxDb which is either a bug 
or a bug waiting to happen.  In brief, biomaRt reports entrezgene_id as a 
numeric, but TxDb wants it as a character.  What's deadly is that TxDb 
doesn't fail when supplied with the numeric; it treats the number as a 
position rather than a gene ID and quietly returns the wrong gene.  Here is 
a minimal example where I am trying to get from gene name (Kcnj12) to gene 
location:

  library(biomaRt)
  library(TxDb.Mmusculus.UCSC.mm10.knownGene)  # also attaches GenomicFeatures

  ## Resolve the gene name:
  ensembl = useMart('ensembl', dataset='mmusculus_gene_ensembl')
  geneNames = getBM(c('entrezgene_id', 'external_gene_name'), mart=ensembl)
  idx = geneNames$external_gene_name == 'Kcnj12'
  entrezGeneId = geneNames$entrezgene_id[idx]   # comes back as a number

  ## Get gene locations (a list named by gene ID, as character):
  txdb = TxDb.Mmusculus.UCSC.mm10.knownGene
  tbg = transcriptsBy(txdb, by='gene')

  ## Shoot self in foot (a number indexes by position, not by name):
  WRONG_LOCATION = tbg[[entrezGeneId]]

  ## Get email from biologist pointing out you've got the wrong gene:
  ACTUAL_LOCATION = tbg[[as.character(entrezGeneId)]]

I would argue that if entrezgene_id is used in some places as a numeric and 
in others as a character, it's safer if biomaRt returns it as a character.  
If your code is wrong, you want it to fail, not quietly misbehave.  A vector 
or list will always accept a numeric index and treat it as a position, even 
when you meant it as an ID.  Going the other way, you will probably get an 
error if you index with a character where you should have used a numeric.
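
To show the indexing behaviour without needing biomaRt or a TxDb, here is a 
small base-R sketch.  The list, its IDs and its values are made up, and 
lookup_by_id() is a hypothetical helper of my own, not part of any package; 
it is only one way a caller might guard against the problem.

```r
## Toy stand-in for a list keyed by gene ID (IDs and values are made up)
genes <- list("16515" = "gene A", "7" = "gene B", "2" = "gene C")

id <- 2L   # an ID held as an integer rather than a character

## An integer is taken as a position: this is the 2nd element (named "7")
by_position <- genes[[id]]

## A character is taken as a name: this is the element named "2"
by_name <- genes[[as.character(id)]]

## Hypothetical guard: look up by name only, and fail loudly if the ID
## is not among the names instead of returning something else
lookup_by_id <- function(x, id) {
  key <- as.character(id)
  stopifnot(length(key) == 1L, key %in% names(x))
  x[[key]]
}

guarded <- lookup_by_id(genes, id)
```

Here by_position and by_name are different elements even though both come 
from the same id, which is exactly the silent failure described above.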

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel