Ah. That's the problem. The script in getdb.sh has

R --slave < /home/ubuntu/cpb_anno/AnnotationBuildPipeline/annosrc/uniprot/script/uniprot.ws/inst/script/processDataForBuild.R

which is a modification of what is in svn (to match the directory
structure of the AMI), which calls on a script in a local version of the
UniProt.ws package. The local version doesn't have any code for yeast, but
the 'real' version (UniProt.ws) does. I assumed the local version was
special, and that I should be using that because you were specifically
using that one rather than an actually installed package.

annosrc$ grep -i yeast uniprot/script/uniprot.ws/inst/script/processDataForBuild.R
annosrc$
annosrc$ grep -i yeast ~/R/x86_64-pc-linux-gnu-library/3.2/UniProt.ws/script/processDataForBuild.R
## Now for special treatment for missing stuff from yeast.
getYeastData <- function(dbFile, db){
doYeastInserts <- function(db, table, data){
## just one more run through to just do what is needed to get pfam into yeast.
species <- 'chipsrc_yeast.sqlite'
res <- getYeastData(species, db)
doYeastInserts(db, "pfam", res[["pfam"]])
doYeastInserts(db, "smart", res[["smart"]])

Thanks!

Jim

On Mon, Oct 5, 2015 at 10:16 AM, Marc Carlson <mrj...@gmail.com> wrote:

> You need to scroll down that script a ways... Look for 'yeast'.
>
> On Mon, Oct 5, 2015 at 6:11 AM, James W. MacDonald <jmac...@uw.edu> wrote:
>
>> Hi Marc,
>>
>> That script has this in it:
>>
>> ## For now just get data for the ones that we have traditionally supported
>> ## I don't even know if the other species are available...
>> speciesList = c("chipsrc_human.sqlite",
>>                 "chipsrc_rat.sqlite",
>>                 "chipsrc_chicken.sqlite",
>>                 "chipsrc_zebrafish.sqlite",
>>                 # "chipsrc_worm.sqlite",
>>                 # "chipsrc_fly.sqlite",
>>                 "chipsrc_mouse.sqlite",
>>                 "chipsrc_bovine.sqlite"
>>                 # "chipsrc_arabidopsis.sqlite" ## this is available and could be "activated"
>>                 ## But to activate arabidopsis, remember you have to pre-add the tables...
>>                 # "chipsrc_canine.sqlite",
>>                 # "chipsrc_rhesus.sqlite",
>>                 # "chipsrc_chimp.sqlite",
>>                 # "chipsrc_anopheles.sqlite"
>>                 )
>>
>> And there is no mention of yeast anywhere.
>> If I search all the scripts for, say, 'INSERT INTO pfam', I get
>>
>> custom_anno/script/bindb.sql
>> 328:INSERT INTO pfam
>>
>> pfam/script/srcdb_pfam.sql
>> 202:-- INSERT INTO pfamb
>>
>> organism_annotation/script/bindb_yeast.sql
>> 441:-- INSERT INTO pfam
>>
>> yeast/script/bindb.sql
>> 241:-- INSERT INTO pfam
>>
>> The first one is just doing all the metadata tables, and the other three
>> are in code blocks that are commented out. Is it possible that you used a
>> script that didn't make it into svn?
>>
>> Jim
>>
>> On Sun, Oct 4, 2015 at 2:36 PM, Marc Carlson <mrj...@gmail.com> wrote:
>>
>>> Hi Jim,
>>>
>>> You asked me on Friday where the PFAM IDs for yeast came from and I
>>> couldn't recall because at the moment I was at Seattle Children's (and
>>> thus nowhere near my copy of my source code). But I also said I would
>>> look into it for you later (and I have). Here is what my code tells me:
>>> ever since IPI shut down, we have been getting the PFAM and IPI data
>>> from UniProt. There is a script in the UniProt.ws package called
>>> processDataForBuild.R that is supposed to be called by the script
>>> "src_build.sh" (it's the last thing that script does). That code should
>>> get the pfam data from yeast for you. Please note that yeast required a
>>> lot of special code to get it processed. Nothing with yeast annotations
>>> is ever easy. It's like karmic accounting to compensate for all the
>>> bread and beer. ;)
>>>
>>> Let me know if you need any more explanations about what is in there.
>>> Because of the crazy timing, before I left I pushed into devel a fresh
>>> set of .DB0s and core packages (in late August) just in case it was too
>>> crazy to do a refresh right now. But it sounds like you won't need that.
>>>
>>> Marc
>>>
>>> On Sun, Oct 4, 2015 at 6:27 AM, James W.
>>> MacDonald <jmac...@uw.edu> wrote:
>>>
>>>> I am building the annotation db0 packages for the upcoming
>>>> Bioconductor release, which are used to generate all the orgDb and
>>>> chip annotation packages that we distribute. Up to the previous
>>>> release we have always included IPI identifiers (as part of the table
>>>> containing the PROSITE and PFAM IDs). Unfortunately, IPI
>>>> <https://www.ebi.ac.uk/IPI> is no longer maintained (since 2011), and
>>>> UniProt, which is where we got data for the last few releases, has now
>>>> dropped support as well.
>>>>
>>>> Given that this annotation source is no longer maintained, I decided
>>>> to exclude these IDs from the current build of the following db0
>>>> packages:
>>>>
>>>> - rat.db0
>>>> - chicken.db0
>>>> - zebrafish.db0
>>>> - mouse.db0
>>>> - bovine.db0
>>>> - human.db0
>>>>
>>>> In addition, it is not clear to me (nor can Marc recall) where the
>>>> data for PFAM in the yeast.db0 package comes from. Given that we are
>>>> pretty far behind schedule for these packages, I have excluded that
>>>> table as well.
>>>>
>>>> If this will break anybody's package, or if there are people who rely
>>>> on these IDs, I can just parse them out of the last release and
>>>> deprecate them, so you will have the IDs for one more release.
>>>> However, if nobody cares about such things, I will just go with what
>>>> we have. Please speak up if this will affect you.
>>>>
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
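
For anyone repeating the comparison made at the top of this thread, here is a minimal shell sketch, assuming both copies of processDataForBuild.R are readable from where it is run. The name has_yeast_code is a hypothetical helper, not part of the annotation build pipeline. It wraps the same grep -i yeast shown in the thread and also flags a path that does not exist, so that a mistyped path is not mistaken for a script with no yeast code.

```shell
#!/bin/sh
# Hypothetical helper (not part of the build pipeline): for each path given,
# report whether the file mentions "yeast" (case-insensitive), does not
# mention it, or is missing altogether.
has_yeast_code() {
    for script in "$@"; do
        if [ ! -f "$script" ]; then
            printf 'missing  %s\n' "$script"
        elif grep -qi yeast "$script"; then
            printf 'yeast    %s\n' "$script"
        else
            printf 'no-yeast %s\n' "$script"
        fi
    done
}

# Check every path passed on the command line.
has_yeast_code "$@"
```

Saved under an assumed name such as check_yeast.sh, it could be run from annosrc with the local and the installed copy of the script as its two arguments. A "no-yeast" result for the local copy next to a "yeast" result for the installed one would match what the grep output in the thread shows.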