Question regarding gene name conversions. Once upon a time, I was doing a lot of gene name conversions, particularly from RefSeq NM_#### accessions to HGNC symbols or Entrez GeneIDs. I used biomaRt successfully, and built a cache matrix so I could quickly merge() against it instead of calling out to a web service repeatedly. Later the complexity of keeping the cache updated became overwhelming, and carrying around a few megabytes of possibly outdated identifiers seemed like a bad idea. Per Bioconductor guidelines, I switched to the built-in annotation packages. Now I'm using org.Hs.eg.db's lookup lists org.Hs.egREFSEQ2EG and org.Hs.egSYMBOL.
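
For concreteness, here is a minimal sketch of the kind of per-transcript lookup chain described above. It assumes org.Hs.eg.db is installed; the helper name nm_to_symbol and the example accession are illustrative only, not the actual code in use, and the error checking is left out.

```r
library(org.Hs.eg.db)  # assumes the Bioconductor annotation package is installed

# Hypothetical helper: one RefSeq transcript accession -> one HGNC symbol.
# No handling of unknown or unmapped keys here; this only shows the chain.
nm_to_symbol <- function(nm) {
  eg <- org.Hs.egREFSEQ2EG[[nm]][1]  # RefSeq accession -> Entrez GeneID (first hit)
  org.Hs.egSYMBOL[[eg]][1]           # Entrez GeneID -> HGNC symbol (first hit)
}

nm_to_symbol("NM_000546")  # example accession, chosen only for illustration
```
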
These lookups sometimes map to multiple values and sometimes map to nothing, which caused errors in my code. To clean that up, I wrapped the accessors with some error checking. Things work again, assigning one human-readable name per transcript ID.

The problem is that this method is very slow. I thought it could be the error-checking code, but streamlining that doesn't help. A profiler showed that most of my time is spent in .Call; it turns out that each access to the "list", such as org.Hs.egSYMBOL[[eg]][1], issues a SQLite query. Since I am nesting these calls in a loop (NM to EG to HGNC, a few thousand times), these copious calls out to SQLite are killing me.

I need a way to batch query these lookup tables, or to preload them into memory. I tried using a hash, but checking whether a value is already loaded into the hash cache is equally time-consuming, and preloading the whole of org.Hs.eg.db takes a few hours. I could do it once and cache the .RData object, but then we're back to the outdated-local-cache problem. So I think the only solution is to access the SQLite database underlying org.Hs.eg.db myself, so that I can use batch queries. Except that the database is hidden under the R API of AnnDbBimap objects like org.Hs.egSYMBOL.

I assume this problem has been handled before, and ask for your guidance.

Thanks

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel