Rainer M Krug <rai...@krugs.de> writes:

> Andreas Leha <andreas.l...@med.uni-goettingen.de> writes:
>
>> Hi Rainer,
>
> Hi Andreas,
>
>>
>> Rainer M Krug <rai...@krugs.de> writes:
>>> "Charles C. Berry" <ccbe...@ucsd.edu> writes:
>>>
>>>> On Wed, 17 Jun 2015, William Denton wrote:
>>>>
>>>>> On 17 June 2015, Xebar Saram wrote:
>>>>>
>>>>>> I do a lot of modeling work that involves using huge datasets and
>>>>>> running process-intensive R processes (such as complex mixed models,
>>>>>> GAMMs, etc.). In RStudio all works well, yet when I use the org-mode
>>>>>> eval on R code blocks it works well for small, simple processes, but
>>>>>> 90% of the time when dealing with complex models and big data (up to
>>>>>> 256GB) it will just freeze Emacs/ESS. Sometimes I can C-c or C-g it,
>>>>>> and other times I need to physically kill Emacs.
>>>>>
>>>>> I've been having the same problem for a while, but wasn't able to
>>>>> isolate it any more than large data sets, lack of memory, and heavy
>>>>> CPU usage. Sometimes everything hangs and I need to power cycle the
>>>>> computer. :(
>>>>>
>>>>
>>>> And you (both) have `ess-eval-visibly' set to nil, right?
>>>>
>>>> I do statistical genomics, which can be compute intensive. Sometimes
>>>> processes need to run for a while, and I get impatient having to wait.
>>>>
>>>> I wrote (and use) ox-ravel[1] to speed up my write-run-revise cycle in
>>>> org-mode.
>>>>
>>>> Basically, ravel will export Org mode to a format that knitr (and the
>>>> like) can run - turning src blocks into `code chunks'. That allows me
>>>> to set the cache=TRUE chunk option, etc. I run knitr on the exported
>>>> document to initialize objects for long-running computations or to
>>>> produce a finished report.
>>>>
>>>> When I start a session, I run knitr in the R session; then all the
>>>> cached objects are loaded in and ready to use.
>>>>
>>>> If I write a src block I know will take a long time to export, I
>>>> export from org mode to update the knitr document and re-knit it to
>>>> refresh the cache.
>>>
>>> I have a similar workflow, only that I use a package-like approach,
>>> i.e. I tangle function definitions into a folder ./R, data into ./data
>>> (which makes it possible to share org-defined variables with R running
>>> outside org), and scripts, i.e. the things which do an analysis, import
>>> data, and so on, and which might take long, into a folder ./scripts/.
>>> I then add the usual R package infrastructure files (DESCRIPTION,
>>> NAMESPACE, ...).
>>>
>>> Then I have one file tangled into ./scripts/init.R:
>>>
>>> #+begin_src R :tangle ./scripts/init.R
>>> library(devtools)
>>> load_all()
>>> #+end_src
>>>
>>> and one for the analysis:
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> ## Do some really time intensive and horribly complicated and important
>>> ## stuff here
>>> save(
>>>     fileNames,
>>>     bw,
>>>     cols,
>>>     labels,
>>>     fit,
>>>     dens,
>>>     gof,
>>>     gofPerProf,
>>>     file = "./cache/results.myAnalysis.rds"
>>> )
>>> #+end_src
>>>
>>> Now after tangling, I have my code easily available in a new R session:
>>>
>>> 1) start R in the directory in which the DESCRIPTION file is,
>>> 2) run source("./scripts/init.R")
>>>
>>> and I have all my functions and data available.
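>>>
>>> Just to sketch the data part as well (the object and file names here
>>> are only placeholders, not from my actual project): an org-defined
>>> object can be saved into ./data, and load_all() then makes it
>>> available too, also to an R session running outside org:
>>>
>>> #+begin_src R
>>> ## placeholder example: write an org-defined object into ./data as an
>>> ## .rda file, so that load_all() also exposes it outside of org
>>> siteInfo <- data.frame(id = 1:3, area = c(10.2, 3.5, 7.1))
>>> save(siteInfo, file = "./data/siteInfo.rda")
>>> #+end_src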
>>>
>>> To run an analysis, I do
>>>
>>> 3) source("./scripts/myAnalysis.R")
>>>
>>> and the results are saved in the cache file
>>> ("./cache/results.myAnalysis.rds").
>>>
>>> To analyse the data further, I can then simply use
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> fitSing <- attach("./cache/results.myAnalysis.rds")
>>> #+end_src
>>>
>>> so the results are attached in their own environment and won't
>>> interfere with my environment in R.
>>>
>>> I can finally remove the attached environment by doing
>>>
>>> #+begin_src R :tangle ./scripts/myAnalysis.R
>>> detach(
>>>     name = attr(fitSing, "name"),
>>>     character.only = TRUE
>>> )
>>> #+end_src
>>>
>>> Through this caching and compartmentalizing, I can easily do some
>>> things outside org and some inside, and easily combine all the data.
>>>
>>> Further advantage: I can actually create the package and send it to
>>> somebody for testing and review, and it should run out of the box, as
>>> all dependencies are defined in the DESCRIPTION file.
>>>
>>> I am using this approach at the moment for a project which will also
>>> result in a paper. By executing all the scripts, one will be able to
>>> import the raw data, do the analysis, and create all the graphs used
>>> in the paper.
>>>
>>> Hope this gives you another idea of how one can handle long-running
>>> analyses in R in org,
>>>
>>> Cheers,
>>>
>>> Rainer
>>>
>>
>> That is a cool workflow. I especially like the fact that you end up
>> with an R package.
>
> Thanks. Yes - the idea of having a package at the end was one main
> reason why I am using this approach.
>
>
>>
>> So, I'll try again: is there any chance to see a working example of
>> this? I'd love to see that.
>
> Let's say I am working on it. I am working on a project which is using
> this workflow, and when it is finished, the package will be available
> as an electronic appendix to the paper.
>
> But I will see if I can condense an example and blog it - I'll let you
> know when it is done.
>
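Just to check that I follow, this is roughly how I picture a fresh
session with your setup; an untested sketch that only reuses the paths
from your blocks above, the rest is my guess at how the pieces fit
together:

#+begin_src R
## untested sketch of the cycle described above; paths come from the
## earlier blocks, everything else is my own guesswork
library(devtools)

## steps 1) and 2): start R in the package root, load functions and data
load_all()

## step 3): run the long computation once; it saves its results to ./cache
source("./scripts/myAnalysis.R")

## in later sessions, skip step 3) and just attach the cached results;
## fit, dens, gof, ... then sit on the search path in their own environment
fitSing <- attach("./cache/results.myAnalysis.rds")
ls(fitSing)

## ... work with the cached objects ...

## clean up when done
detach(name = attr(fitSing, "name"), character.only = TRUE)
#+end_src
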
Thanks! Either way, I am really looking forward to this.

Regards,
Andreas