I’m wondering which is the most efficient way (in terms of time first, then 
memory usage) to obtain a multivariate time series object from a data frame 
(the easiest data structure to get from a database through RODBC).
I have a starting point using the timeSeries or xts packages (both can 
handle time zones); the code to test is below.
Parallelizing the merge (cbind) step is something I’m thinking about 
(suggestions from users with experience on this topic are highly appreciated); 
any suggestion is welcome.
My platform is Windows XP, R 2.12.1, with the latest versions of timeSeries 
and xts available on CRAN.
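
As a rough sketch of the merging idea, separate from the test code that follows: instead of growing the result with one cbind per ID, the columns could be collected in a list and bound in a single call. This uses base R only and assumes every ID has the same timestamps in the same order, which may not hold for real data; the names long, cols, wide and dates are made up for the illustration.

```r
# Sketch only: toy data in the long layout (ID, DATE, VALUE); values are made up.
long <- data.frame(
  ID    = rep(1:3, each = 4),
  DATE  = rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:3, times = 3),
  VALUE = seq_len(12) / 10
)

# Assumption: every ID has the same timestamps in the same order.
cols  <- split(long$VALUE, long$ID)        # list with one numeric vector per ID
wide  <- do.call(cbind, cols)              # bind all columns in a single call
dates <- long$DATE[long$ID == long$ID[1]]  # shared time index, taken from the first ID

# Guard against IDs with a different number of observations.
stopifnot(nrow(wide) == length(dates))
```

Under that assumption, wide and dates could then go to a single constructor call, for example xts(wide, order.by = dates), so the time series object is built only once.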


set.seed(123)

N <- 9000
X <- data.frame(
  ID = rep(1:4, each = N),
  DATE = rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1), 4),
  VALUE = runif(N*4))

library(timeSeries)
# Called once per ID by by(): builds a one-column timeSeries and appends it
# to the object "xx" stored in env (creating "xx" on the first call).
buildTimeSeriesFromDataFrame <- function(x, env)
{
  ts1 <- timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
    zone = 'GMT', units = as.character(x$ID[1]))

  # inherits = FALSE: look only in env, not in its enclosing environments
  if(exists("xx", envir = env, inherits = FALSE))
    assign("xx", cbind(get("xx", envir = env, inherits = FALSE), ts1),
      envir = env)
  else
    assign("xx", ts1, envir = env)

  return(TRUE)
}


# No arguments needed: the function reads X from the workspace.
fooBy <- function()
{
  e1 <- new.env(parent = baseenv())
  by(X, X$ID, buildTimeSeriesFromDataFrame, env = e1, simplify = TRUE)
  return(get("xx", envir = e1))
}

# User CPU time of 100 runs
Time01 <- replicate(100, system.time(fooBy())[[1]])

median(Time01)
hist(Time01)

library(xts)

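# Called once per ID by by(): builds a one-column xts object and appends it
# to the object "xx" stored in env (creating "xx" on the first call).
buildXtsFromDataFrame <- function(x, env)
{
  # DATE is already POSIXct here, so no conversion or format string is needed
  xts1 <- xts(x$VALUE, order.by = x$DATE, tzone = 'GMT')

  # inherits = FALSE: look only in env, not in its enclosing environments
  if(exists("xx", envir = env, inherits = FALSE))
    assign("xx", cbind(get("xx", envir = env, inherits = FALSE), xts1),
      envir = env)
  else
    assign("xx", xts1, envir = env)

  return(TRUE)
}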

# Same wrapper as above, redefined to use the xts builder.
fooBy <- function()
{
  e1 <- new.env(parent = baseenv())
  by(X, X$ID, buildXtsFromDataFrame, env = e1, simplify = TRUE)
  return(get("xx", envir = e1))
}

# User CPU time of 100 runs
Time02 <- replicate(100, system.time(fooBy())[[1]])

median(Time02)
hist(Time02)

plot(density(Time02), xlim = c(min(c(Time02, Time01)), max(c(Time02, Time01))))
lines(density(Time01))
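
One alternative I have not benchmarked, given only as a sketch: build one xts object per ID in a list and merge them all on the time index in a single call, rather than growing "xx" inside an environment. Unlike a plain cbind of raw columns, this does not require the IDs to share identical timestamps. The toy data and the names long, pieces and wide_xts are illustrative, and the requireNamespace() guard is only there so the snippet does nothing when xts is not installed.

```r
# Sketch only: toy data in the same long layout (ID, DATE, VALUE).
long <- data.frame(
  ID    = rep(1:3, each = 5),
  DATE  = rep(as.POSIXct("2000-01-01", tz = "GMT") + 0:4, times = 3),
  VALUE = seq_len(15)
)

if (requireNamespace("xts", quietly = TRUE)) {
  # One single-column xts object per ID, kept in a list.
  pieces <- lapply(split(long, long$ID), function(d) {
    xts::xts(d$VALUE, order.by = d$DATE, tzone = "GMT")
  })

  # Merge all pieces on their time index in one call. The list is unnamed
  # first so its elements are matched positionally to merge()'s arguments.
  wide_xts <- do.call(merge, unname(pieces))

  # Label each column with the ID it came from.
  colnames(wide_xts) <- names(pieces)
}
```

The same pattern should carry over to timeSeries by swapping the constructor, but I have not checked how its merge or cbind methods behave when called through do.call().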


Best regards,
Daniele Amberti



