That's a bit better, but you're still creating an object in the global environment, when you should be returning it from your function.
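A minimal sketch of what returning the object could look like, assuming the same X layout (ID, DATE, VALUE) as in the quoted code. The helper names buildOne and buildAll are hypothetical, and base split/lapply stands in for daply here so that each piece keeps its timeSeries class. The pieces are bound pairwise with cbind, as in the original code, but nothing is written to the global environment.

```r
library(timeSeries)

# Build one single-column timeSeries from the rows of one ID and return it.
# (buildOne is a hypothetical name, not part of any package.)
buildOne <- function(x) {
  timeSeries(x$VALUE, x$DATE,
             format = "%Y-%m-%d %H:%M:%S",
             zone = "GMT", units = as.character(x$ID[1]))
}

# Split by ID, build each piece, then bind the pieces pairwise.
# The combined object is the return value; no <<- or assign() is needed.
buildAll <- function(df) {
  pieces <- lapply(split(df, df$ID), buildOne)
  Reduce(function(a, b) cbind(a, b), pieces)
}

# Small test data in the same shape as the quoted example.
set.seed(123)
N <- 100
stamps <- format(as.POSIXct("2000-01-01", tz = "GMT") + 0:(N - 1),
                 "%Y-%m-%d %H:%M:%S", tz = "GMT")
X <- data.frame(
  ID = rep(1:4, each = N),
  DATE = rep(stamps, 4),
  VALUE = runif(N * 4), stringsAsFactors = FALSE)

ATS <- buildAll(X)
```

Because buildAll takes the data frame as an argument and returns its result, it can be timed with system.time(buildAll(X)) without resetting any global state between runs.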
Hadley

On Mon, Mar 14, 2011 at 8:54 AM, Daniele Amberti <daniele.ambe...@ors.it> wrote:
> Thanks Hadley for your interest, below is some code that does not use
> environments (using timeSeries); I also made some experiments with
> .parallel = TRUE in daply to create timeSeries objects and then bind them
> together, but I have some problems.
>
> Thank you in advance,
> Daniele Amberti
>
> set.seed(123)
> N <- 10000
> X <- data.frame(
>   ID = c(rep(1,N), rep(2,N,), rep(3,N), rep(4,N)),
>   DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1), 4)),
>   VALUE = runif(N*4), stringsAsFactors = FALSE)
> X <- X[sample(1:(N*4), N*4),]
> str(X)
> head(X)
>
> #define a variable in global env
> ATS <- NULL
>
> buildTimeSeriesFromDataFrame <- function(x)
> {
>   library(timeSeries)
>   if(!is.null(ATS)) # in global env
>   {
>     # assign in global env
>     ATS <<- cbind(ATS,
>       timeSeries(x$VALUE, x$DATE,
>         format = '%Y-%m-%d %H:%M:%S',
>         zone = 'GMT', units = as.character(x$ID[1])))
>   } else
>   {
>     # assign in global env
>     ATS <<- timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
>       zone = 'GMT', units = as.character(x$ID[1]))
>   }
>   return(TRUE)
> }
>
> tsDaply <- function(...)
> {
>   # assign in global env, to clean previous run
>   ATS <<- NULL
>   library(plyr)
>   res <- daply(X, "ID", buildTimeSeriesFromDataFrame)
>   return(res)
> }
>
> tsDaply(X, X$ID)
> head(ATS)
>
> #performance tests
> Time <- replicate(100,
>   system.time(tsDaply(X, X$ID))[[1]])
> median(Time)
> hist(Time)
>
> ###
> #some multithread tests:
> ###
>
> library(doSMP)
> w <- startWorkers(workerCount = 2)
> registerDoSMP(w)
>
> # do not cbind ts, just create
> buildTimeSeriesFromDataFrame2 <- function(x)
> {
>   library(timeSeries )
>   xx <- timeSeries:::timeSeries(x$VALUE, x$DATE,
>     format = '%Y-%m-%d %H:%M:%S',
>     zone = 'GMT', units = as.character(x$ID[1]))
>   return(xx)
> }
>
> #tsDaply2 <- function(...)
> #{
> #  library(plyr)
> #  res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = TRUE)
> #  return(res)
> #}
>
> # tsDaply2 with .parallel = TRUE returns an error:
> #Error in do.ply(i) : task 4 failed - "subscript out of bounds"
> #In addition: Warning messages:
> #1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
> #2: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
>
>
> tsDaply2 <- function(...)
> {
>   library(plyr)
>   res <- daply(X, "ID", buildTimeSeriesFromDataFrame2, .parallel = FALSE)
>   return(res)
> }
> # tsDaply2 with .parallel = FALSE works, but the list discards the timeSeries class
>
> # bind after ts creation
> res <- tsDaply2(X, X$ID)
> # list is not a timeSeries object
> str(cbind(t(res)))
> res <- as.timeSeries(cbind(t(res)))
>
> stopWorkers(w)
>
>
> -----Original Message-----
> From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of Hadley
> Wickham
> Sent: 14 March 2011 12:48
> To: Daniele Amberti
> Cc: r-help@r-project.org
> Subject: Re: [R] dataframe to a timeseries object - [ ] Message is from an
> unknown sender
>
> Well, I'd start by removing all explicit use of environments, which
> makes your code very hard to follow.
>
> Hadley
>
> On Monday, March 14, 2011, Daniele Amberti <daniele.ambe...@ors.it> wrote:
>> I found that plyr:::daply is more efficient than base:::by (am I doing
>> something wrong?), below is updated code for comparison (I also fixed a
>> couple of things).
>> Function daply from the plyr package also has a .parallel argument and I
>> wonder if creating timeseries objects in parallel and then combining them
>> would be faster (Windows XP platform); does anyone have experience with
>> this topic? I found only very simple examples about plyr and parallel
>> computations and I do not have a working example for this kind of
>> implementation (daply that returns a list of timeseries objects).
>>
>> Thanks in advance,
>> Daniele Amberti
>>
>>
>> set.seed(123)
>>
>> N <- 10000
>> X <- data.frame(
>>   ID = c(rep(1,N), rep(2,N,), rep(3,N), rep(4,N)),
>>   DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1), 4)),
>>   VALUE = runif(N*4), stringsAsFactors = FALSE)
>> X <- X[sample(1:(N*4), N*4),]
>> str(X)
>>
>> library(timeSeries)
>> buildTimeSeriesFromDataFrame <- function(x, env)
>> {
>>   {
>>     if(exists("xx", envir = env))
>>       assign("xx",
>>         cbind(get("xx", env), timeSeries(x$VALUE, x$DATE,
>>           format = '%Y-%m-%d %H:%M:%S',
>>           zone = 'GMT', units = as.character(x$ID[1]))),
>>         envir = env)
>>     else
>>       assign("xx",
>>         timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
>>           zone = 'GMT', units = as.character(x$ID[1])),
>>         envir = env)
>>
>>     return(TRUE)
>>   }
>> }
>>
>> tsBy <- function(...)
>> {
>>   e1 <- new.env(parent = baseenv())
>>   res <- by(X, X$ID, buildTimeSeriesFromDataFrame,
>>     env = e1, simplify = TRUE)
>>   return(get("xx", e1))
>> }
>>
>> Time01 <- replicate(100,
>>   system.time(tsBy(X, X$ID, simplify = TRUE))[[1]])
>> median(Time01)
>> hist(Time01)
>> ATS <- tsBy(X, X$ID, simplify = TRUE)
>>
>>
>> library(xts)
>> buildXtsFromDataFrame <- function(x, env)
>> {
>>   {
>>     if(exists("xx", envir = env))
>>       assign("xx",
>>         cbind(get("xx", env), xts(x$VALUE,
>>           as.POSIXct(x$DATE, tz = "GMT",
>>             format = '%Y-%m-%d %H:%M:%S'),
>>           tzone = 'GMT')),
>>         envir = env)
>>     else
>>       assign("xx",
>>         xts(x$VALUE, as.POSIXct(x$DATE, tz = "GMT",
>>             format = '%Y-%m-%d %H:%M:%S'),
>>           tzone = 'GMT'),
>>         envir = env)
>>
>>     return(TRUE)
>>   }
>> }
>>
>> xtsBy <- function(...)
>> {
>>   e1 <- new.env(parent = baseenv())
>>   res <- by(X, X$ID, buildXtsFromDataFrame,
>>     env = e1, simplify = TRUE)
>>   return(get("xx", e1))
>> }
>>
>> Time02 <- replicate(100,
>>   system.time(xtsBy(X, X$ID,simplify = TRUE))[[1]])
>> median(Time02)
>> hist(Time02)
>> AXTS <- xtsBy(X, X$ID, simplify = TRUE)
>>
>> plot(density(Time02), col = "red",
>>   xlim = c(min(c(Time02, Time01)), max(c(Time02, Time01))))
>> lines(density(Time01), col = "blue")
>> #check equal, still a problem with names
>> AXTS2 <- as.timeSeries(AXTS)
>> names(AXTS2) <- names(ATS)
>> identical(getDataPart(ATS), getDataPart(AXTS2))
>> identical(time(ATS), time(AXTS2))
>>
>> # with plyr library and daply instead of by:
>> library(plyr)
>>
>> tsDaply <- function(...)
>> {
>>   e1 <- new.env(parent = baseenv())
>>   res <- daply(X, "ID", buildTimeSeriesFromDataFrame,
>>     env = e1)
>>   return(get("xx", e1))
>> }
>>
>> Time03 <- replicate(100,
>>   system.time(tsDaply(X, X$ID))[[1]])
>> median(Time03)
>> hist(Time03)
>>
>> xtsDaply <- function(...)
>> {
>>   e1 <- new.env(parent = baseenv())
>>   res <- daply(X, "ID", buildXtsFromDataFrame,
>>     env = e1)
>>   return(get("xx", e1))
>> }
>>
>> Time04 <- replicate(100,
>>   system.time(xtsDaply(X, X$ID))[[1]])
>>
>> median(Time04)
>> hist(Time04)
>>
>> plot(density(Time04), col = "red",
>>   xlim = c(
>>     min(c(Time02, Time01, Time03, Time04)),
>>     max(c(Time02, Time01, Time03, Time04))),
>>   ylim = c(0,100))
>> lines(density(Time03), col = "blue")
>> lines(density(Time02))
>> lines(density(Time01))
>>
>>
>> -----Original Message-----
>> From: Daniele Amberti
>> Sent: 11 March 2011 14:44
>> To: r-help@r-project.org
>> Subject: dataframe to a timeseries object
>>
>> I'm wondering which is the most efficient (time, then memory usage) way to
>> obtain a multivariate time series object from a data frame (the easiest data
>> structure to get data from a database through RODBC).
>> I have a starting point using the timeSeries or xts library (these libraries
>> can handle time zones), below you can find code to test.
>> Parallelizing the merge (cbind) is something I'm thinking about (suggestions
>> from users with experience on this topic are highly appreciated), any
>> suggestion is welcome.
>> My platform is Windows XP, R 2.12.1, latest available packages on CRAN for
>> timeSeries and xts.
>>
>>
>> set.seed(123)
>>
>> N <- 9000
>> X <- data.frame(
>>   ID = c(rep(1,N), rep(2,N,), rep(3,N), rep(4,N)),
>>   DATE = rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1), 4),
>>   VALUE = runif(N*4))
>>
>> library(timeSeries)
>> buildTimeSeriesFromDataFrame <- function(x, env)
>> {
>>   {
>>     if(exists("xx", envir = env))
>>       assign("xx",
>>         cbind(get("xx", env), timeSeries(x$VALUE, x$DATE,
>>           format = '%Y-%m-%d %H:%M:%S',
>>           zone = 'GMT', units = as.character(x$ID[1]))),
>>         envir = env)
>>     else
>>       assign("xx",
>>         timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
>>           zone = 'GMT', units = as.character(x$ID[1])),
>>         envir = env)
>>
>>     return(TRUE)
>>   }
>> }
>>
>>
>> fooBy <- function(...)
>> {
>>   e1 <- new.env(parent = baseenv())
>>   res <- by(X, X$ID, buildTimeSeriesFromDataFrame,
>>     env = e1, simplify = TRUE)
>>   return(get("xx", e1))
>> }
>>
>> Time01 <- replicate(100,
>>   system.time(fooBy(X,
>>     X$ID, buildTimeSeriesFromDataFrame,
>>     simplify = TRUE))[[1]])
>>
>> median(Time01)
>> hist(Time01)
>>
>> library(xts)
>>
>> buildXtsFromDataFrame <- function(x, env)
>> {
>>   {
>>     if(exists("xx", envir = env))
>>       assign("xx",
>>         cbind(get("xx", env), xts(x$VALUE,
>>           as.POSIXct(x$DATE, format = '%Y-%m-%d %H:%M:%S'),
>>           tzone = 'GMT')),
>>         envir = env)
>>     else
>>       assign("xx",
>>         xts(x$VALUE, as.POSIXct(x$DATE, format = '%Y-%m-%d %H:%M:%S'),
>>           tzone = 'GMT'),
>>         envir = env)
>>
>>     return(TRUE)
>>   }
>> }
>>
>> fooBy <- function(...)
>> {
>>   e1 <- new.env(parent = baseenv())
>>   res <- by(X, X$ID, buildXtsFromDataFrame,
>>     env = e1, simplify = TRUE)
>>   return(get("xx", e1))
>> }
>>
>> Time02 <- replicate(100,
>>   system.time(fooBy(X,
>>     X$ID, buildTimeSeriesFromDataFrame,
>>     simplify = TRUE))[[1]])
>>
>> median(Time02)
>> hist(Time02)
>>
>> plot(density(Time02), xlim = c(min(c(Time02, Time01)), max(c(Time02,
>>   Time01))))
>> lines(density(Time01))
>>
>>
>> Best regards,
>> Daniele Amberti
>>
>> ORS Srl
>>
>> Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
>> Tel. +39 0173 620211
>> Fax. +39 0173 620299 / +39 0173 433111
>> Web Site www.ors.it
>>
>> ------------------------------------------------------------------------
>> Any unauthorized use of this message and its attachments is prohibited
>> and may constitute a criminal offence.
>> If you have received this message in error, we would be grateful if you
>> would destroy it and any attachments.
>> Opinions, conclusions or other information contained in this e-mail that
>> do not relate to the activities and/or corporate mission of O.R.S. Srl
>> are not to be attributed to the company, nor do they bind it in any way.
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/