Dear R-Help group: I have been tinkering with how I want my personal standard library functions to look. They are not designed to be professional and heavyweight, but lightweight. There are probably dozens of little bugs, because I don't know about, or have not properly taken care of, a variety of internal R code issues. Still, I like how this ended up, and there is no learning curve, so I thought I would share it.
I have put all my functions into a directory ~/src/iaw/R/ . In my standard .Rprofile, I thus first added a list of my libraries (well, I have just one) and invoke it:

    options(strict="very")
    Libdirs <- c("~/src/iaw/R/")
    for (libdir in Libdirs) {
      source( Sys.glob(paste0(libdir, "Rprofile")) )
    }

(I prefer mnemonics to numbers. It is 2013. Why is it text(..., pos=4), when it should be text(..., pos="east")? "East" and "E" should be abbreviations for it. Then we could also use NNW... but I am getting distracted...)

In each of my "light" libraries, I now have an Rprofile file that looks like this:

    library(compiler)

    cached <- paste0(libdir, "/library.Rdata")

    if (file.exists(cached) &&
        all( file.info(cached)$mtime > file.info(Sys.glob(paste0(libdir, "/*.R")))$mtime )) {
      ## the cache is newer than every source file, so just load it
      load(cached)
      cat("Loaded", cached, "\n")
    } else {
      Rprofile <- Sys.glob(paste0(libdir, "/*.R"))
      for (n in Rprofile) { source(n) }

      ## libraries that I need to have in order to be able to compile
      library(utils)
      library(parallel)
      library(stats)
      library(graphics)
      library(grDevices)

      for (n in ls()) {
        (is.function(.GlobalEnv[[n]])) %or% next
        ##if ((n %in% c("n", "Rprofile", "cached"))) next
        cat("['", n, "']\n", sep="")
        .GlobalEnv[[n]] <- cmpfun(.GlobalEnv[[n]])
      }

      save.image(file=cached)
      cat("Saved", cached, "\n")
    }

The basic organizational idea now is to stick each R function into its own .R file. The Rprofile code makes sure that whenever I change a function in the ~/src/iaw/R/ directory, the library is rebuilt (all functions are recompiled and then saved into an .Rdata file). This is very fast in my case; if it were not, I could add some intelligence. All of this could/should be stuck into a universal function ("library.light(directoryname)"), but because I only have one library for now, it can just live in the Rprofile. With this organization, it is now also easy to keep vignettes, LaTeX text, other code, etc., in the same directory. They will just be ignored because they do not end with .R.
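The post notes that all of this could be wrapped in a universal library.light(directoryname) function. A minimal sketch of what that might look like follows. Only the name comes from the post; the envir and verbose arguments, the scratch environment, and the TRUE/FALSE return value are assumptions made for illustration.

```r
library(compiler)

## Hypothetical sketch of library.light(); not the author's implementation.
## Returns (invisibly) TRUE if the cache was loaded, FALSE if it was rebuilt.
library.light <- function(libdir, envir = globalenv(), verbose = TRUE) {
  libdir  <- path.expand(libdir)
  cached  <- file.path(libdir, "library.Rdata")
  sources <- Sys.glob(file.path(libdir, "*.R"))

  ## The cache counts as fresh only if it is newer than every source file.
  fresh <- file.exists(cached) &&
    all(file.info(cached)$mtime > file.info(sources)$mtime)

  if (fresh) {
    load(cached, envir = envir)
    if (verbose) cat("Loaded", cached, "\n")
    return(invisible(TRUE))
  }

  ## Source into a scratch environment so that only library objects get cached.
  scratch <- new.env(parent = envir)
  for (f in sources) sys.source(f, envir = scratch)

  for (n in ls(scratch, all.names = TRUE)) {
    obj <- scratch[[n]]
    if (is.function(obj) && !is.primitive(obj)) {
      ## Re-home the function so it finds its helpers in envir. Assumption:
      ## library functions do not rely on a private enclosing environment.
      environment(obj) <- envir
      obj <- cmpfun(obj)
    }
    assign(n, obj, envir = scratch)
    assign(n, obj, envir = envir)
  }

  save(list = ls(scratch, all.names = TRUE), envir = scratch, file = cached)
  if (verbose) cat("Saved", cached, "\n")
  invisible(FALSE)
}
```

With this in place, the loop in .Rprofile would reduce to something like for (libdir in Libdirs) library.light(libdir). Because the cache is built from a scratch environment, unrelated objects in the global workspace are not swept into library.Rdata, which save.image() would do.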
The actual functions follow a format that is different from existing documentation systems, including Hadley's roxygen, but designed to plug in (eventually) to the standard R manual and help system.

    PREAMBLE <- c(
    doc = '
    @TITLE lagseries
    @AUTHOR ivo.we...@gmail.com
    @DATE Feb 25, 2013
    @DESCRIPTION
    "lagseries" takes a vector and shifts its contents numlags items to the right
    (toward later positions), filling in missing values at the start to retain the
    length of the vector. If panelid is given, then a value from another panel id
    will not be assigned to be the lag. (Usually, the panelid will be the firm id,
    and the panel must be sorted by firm id. Naturally, it makes little sense to use
    this unless the observations are also sorted by the time of the observation.
    This is, after all, a lagseries function.)
    @USAGE
    lagseries( seriesin, numlags=1, panelid=NULL )
    @ARGUMENTS
    seriesin: a numeric vector
    numlags: an integer, can be negative
    panelid: an optional panel id
    @DETAILS None
    @SEEALSO leadseries, chgseries, pchgseries, compoundseries
    @EXAMPLES
    x <- rnorm(10)
    xlag <- lagseries(x, 2)
    lm( x ~ xlag )
    d <- data.frame( x = c( rnorm(20), runif(30), rcauchy(40) ),
                     who = c( rep("firm1",20), rep("firm2",30), rep("firm3",40) ),
                     year = c( 1961:1980, 1971:2000, 1971:2010 ) )
    lagd <- data.frame( x = lagseries(d$x, panelid=d$who),
                        who = d$who,
                        year = lagseries(d$year, panelid=d$who) )
    ',
    test = '
    all( lagseries( 1:6, 2, c(1,1,2,2,2,2) ) == c(NA,NA,NA,NA,3,4), na.rm=TRUE )
    ')

    ################################################################################

    lagseries <- function (seriesin, numlags = 1, panelid = NULL) {
      if (!is.null(getOption("strict"))) {
        (is.null(seriesin)) %and% "Looks like you are trying to calc a lagseries from a NULL or non-existing series"
        (is.vector(seriesin, mode="any")) %or% "Your series is not a vector, but a {{class(seriesin)}}."
        (length(seriesin) > 1) %or% "Need more observations than {{length(seriesin)}}"
        (is.vector(numlags, mode="numeric", length=1)) %or% "numlags must be a simple integer, not {{numlags}}."
        ## check panelid itself (the original draft tested seriesin here by mistake)
        (is.null(panelid) | (is.vector(panelid, mode="any", length=length(seriesin)))) %or% "panel id must be NULL or a vector of same length as seriesin. right now it is {{class(panelid)}}"
      }

      (numlags == 0) %and% return(seriesin)
      if (numlags < 0) return(leadseries(seriesin, -numlags, panelid))

      rv <- c(rep(NA, numlags), seriesin[1:(length(seriesin) - numlags)])
      if (is.null(panelid)) return(rv)

      (all(panelid >= lagseries(panelid), na.rm = TRUE)) %or% "The panel is not sorted upwards by panel id"
      ## blank out lags that would reach across a panel boundary
      ifelse(panelid != lagseries(panelid, numlags), NA, rv)
    }

I think this code looks nicer than Hadley Wickham's way of marking up the docs, but this is obviously a matter of taste. Unlike me, Hadley knows what he is doing. Still, I wouldn't mind if Hadley adopted a second optional format like this in roxygen3.

Writing an R function that parses this preamble for every .R file should be easy. Writing an ESS parser with some more intelligence to understand that any R file that begins with PREAMBLE <- c(doc='') is documentation is probably doable as well, but emacs hacking is way beyond me. Writing code to step through the tests is also easy. Moreover, the preamble could hold other useful info (such as a minimum version number) if need be.

Note that, in my lagseries function, I am trying to be fussy about the input checking, but I am careless about output checking. I wish getOption("strict") would enable internal R checking, too, but c'est la vie.
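The remark that parsing the preamble and stepping through the tests should be easy can be made concrete with a short sketch. Both function names below (parse.preamble.doc and run.preamble.test) are made up for illustration and are not part of the library in the post. The sketch assumes that every .R file assigns a character vector named PREAMBLE with doc and test entries, and that each tag is an upper-case word starting a line with @.

```r
## Hypothetical helper: split a PREAMBLE doc string into a named list,
## one entry per @TAG (tag names are assumed to be upper-case letters).
parse.preamble.doc <- function(doc) {
  lines   <- strsplit(doc, "\n", fixed = TRUE)[[1]]
  is.tag  <- grepl("^[[:space:]]*@[A-Z]+", lines)
  section <- cumsum(is.tag)                  # 0 = text before the first tag
  keep    <- section > 0
  pieces  <- split(lines[keep], section[keep])
  tags    <- sub("^[[:space:]]*@([A-Z]+).*$", "\\1", lines[is.tag])
  bodies  <- vapply(pieces, function(p) {
    p[1] <- sub("^[[:space:]]*@[A-Z]+[[:space:]]*", "", p[1])  # drop the tag
    trimws(paste(p, collapse = "\n"))
  }, character(1))
  setNames(as.list(bodies), tags)
}

## Hypothetical helper: source one library file into a scratch environment
## and evaluate its PREAMBLE["test"] entry there.
## Returns TRUE or FALSE, or NA if the file has no test entry.
run.preamble.test <- function(file, envir = globalenv()) {
  scratch <- new.env(parent = envir)
  sys.source(file, envir = scratch)
  if (!exists("PREAMBLE", envir = scratch, inherits = FALSE)) return(NA)
  preamble <- get("PREAMBLE", envir = scratch, inherits = FALSE)
  if (!("test" %in% names(preamble))) return(NA)
  isTRUE(eval(parse(text = preamble[["test"]]), envir = scratch))
}
```

Stepping through a whole library would then be something like sapply(Sys.glob("~/src/iaw/R/*.R"), run.preamble.test). For a file such as the lagseries one, helpers like %or%, %and% and leadseries would have to be visible from envir before the test is run.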
In case someone from r-help wants to try this out, here is a sketch of my background routines:

    original.is.vector <- is.vector

    ## like the built-in is.vector, but can also check the length
    is.vector <- function( x, mode="any", length=(-1) ) {
      (original.is.vector(x, mode=mode)) %or% return(FALSE)
      ((length < 0) | (length(x) == length)) %or% return(FALSE)
      TRUE
    }

    ## abort.estring should:
    ##   [a] add the name of the preceding invoking function at the start of the
    ##       error message, preferably with source line number, so the user
    ##       would see an error like
    ##          * lagseries:52:: Need more observations than 1*
    ##   [b] evaluate every {{ }} construct and insert output into the string
    ##   [c] abort

    "%or%" <- function (e1, e2) {
      if (!e1) { if (is.character(e2)) abort.estring(e2) else eval(e2) }
    }

    ## "%and%" (not shown) is the mirror image: it acts when e1 is TRUE

I do not know whether it is possible to build an abort.estring function that does what I want, but R seems flexible enough to do almost anything. I have a sketch of [b], thanks to Neal Fultz, but not of [a].

I hope this organizational design helps some others.

regards,

/iaw
----
Ivo Welch (ivo.we...@gmail.com)
http://www.ivo-welch.info/
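For goals [a] to [c] of abort.estring, one possible starting point is sketched below. It is not the sketch by Neal Fultz mentioned in the post, which is not shown there. It assumes that abort.estring is always reached through %or% or %and%, so that the function of interest sits two frames up the call stack. The helper name interpolate.estring and the definition of %and% are assumptions, and the sketch does not attempt the source line number.

```r
## Hypothetical helper for [b]: replace every {{ expr }} in estring with
## the value of expr, evaluated in envir.
interpolate.estring <- function(estring, envir) {
  m     <- gregexpr("\\{\\{(.*?)\\}\\}", estring, perl = TRUE)
  exprs <- regmatches(estring, m)[[1]]
  if (length(exprs) == 0) return(estring)
  values <- vapply(exprs, function(e) {
    code <- substr(e, 3, nchar(e) - 2)       # strip the surrounding braces
    paste(as.character(eval(parse(text = code), envir = envir)), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  regmatches(estring, m) <- list(values)
  estring
}

## Sketch of abort.estring: [a] prefix the name of the function that used
## %or% / %and%, [b] interpolate, [c] abort.
abort.estring <- function(estring) {
  caller.env  <- parent.frame(2)             # frame of the function being checked
  caller.call <- if (sys.nframe() >= 3) sys.call(-2) else NULL
  prefix <- if (is.null(caller.call)) "" else
    paste0(deparse(caller.call[[1]])[1], ": ")
  stop(prefix, interpolate.estring(estring, caller.env), call. = FALSE)
}

## %or% as given in the post; %and% is assumed to be its mirror image.
"%or%" <- function(e1, e2) {
  if (!e1) { if (is.character(e2)) abort.estring(e2) else eval(e2) }
}
"%and%" <- function(e1, e2) {
  if (e1) { if (is.character(e2)) abort.estring(e2) else eval(e2) }
}
```

A check such as (length(x) > 1) %or% "Need more observations than {{length(x)}}" inside a function f would then stop with a message that starts with "f: " and has the {{ }} part filled in from the variables of f. If abort.estring were called directly rather than through the operators, the two-frame assumption would point at the wrong function.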