Hi:

As you mentioned at the outset, you have a very irregular time series.
David has given you one reasonable suggestion; another is the zoo package.
Those are the standard R packages for dealing with irregular time series,
though there may be others of which I am unaware - something in the
Rmetrics suite may pertain, for example. Check the Time Series task view
at CRAN for possible alternatives:
http://cran.r-project.org/web/views/
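
As a minimal sketch of the zoo approach - the data frame 'dat' below is a
hypothetical stand-in for your data, using the column names from your
snapshot:

library(zoo)

## hypothetical stand-in for the posted data (only three rows shown)
dat <- data.frame(
  queued_time = as.POSIXct(c("2010-06-15 21:50:42.443",
                             "2010-06-29 14:53:26.073",
                             "2010-08-05 11:29:56.713")),
  processTime = c(6.399989e-02, 3.011863e+06, 6.155674e+06)
)

## an irregular series indexed by the actual timestamps -
## no frequency or start date required, unlike ts()
z <- zoo(dat$processTime, order.by = dat$queued_time)
plot(z)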

ARIMA modeling, OTOH, assumes that the data are equally spaced and
stationary (perhaps after suitable differencing or detrending).
Consequently, I think you may need to rethink your strategy for modeling
these data. One possibility is to aggregate the data, but you're the one
who has to decide on an appropriate interval of aggregation and what
difficulties might ensue (e.g., unequal sample sizes
per time interval). This is not a simple problem, and the best strategy may
be to start with description and gradually work your way to a reasonable,
scientifically plausible model.
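
For a concrete sketch of such an aggregation, continuing from the zoo
series 'z' above (the daily interval and the object names here are purely
illustrative, not a recommendation):

library(zoo)

## collapse the irregular series to one value per day (mean), then
## expand onto a complete daily grid so the result is equally spaced;
## days with no observations become NA
zd   <- aggregate(z, as.Date, mean)
grid <- seq(start(zd), end(zd), by = "day")
zreg <- merge(zd, zoo(, grid))

arima() can cope with some NAs in a series, but if most of the grid ends
up NA, that is a sign the interval is too fine.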

A sensible question to ask is: what is the largest time unit I can use
without losing vital information? That might be a place to start...

Trying to model a time series with very large time gaps is a little like
trying to reconstruct a movie and its plot from a handful of stills,
without having seen the film. You'll need to use every bit of
knowledge you have about the underlying process to aid in the analysis.

HTH,
Dennis

On Thu, Dec 16, 2010 at 5:35 PM, Mike Williamson <this.is....@gmail.com> wrote:

> Hi All,
>
>    First let me state that I searched for a while on r-help, Google, and
> with the "sos" package inside of 'R', without much luck.  I want to know
> how to create a univariate time series from a set of data that will have
> huge time gaps in it.  For instance, here is a snapshot of a piece of data
> that I would like to analyze:
>
> Row             queued_time       processTime
> 50  2010-06-15 21:50:42.443 6.399989e-02 secs
> 63  2010-06-15 21:51:57.347 6.300020e-02 secs
> 156 2010-06-29 14:53:26.073 3.011863e+06 secs
> 175 2010-07-22 10:14:57.503 4.334879e+06 secs
> 278 2010-08-05 11:29:56.713 6.155674e+06 secs
> 509 2010-08-05 11:29:57.443 3.120779e+06 secs
> 531 2010-08-05 11:29:57.543 3.120779e+06 secs
> 555 2010-08-05 11:29:57.647 3.120779e+06 secs
> 190 2010-08-05 11:29:57.943 3.120778e+06 secs
> 230 2010-08-05 11:29:58.047 3.120778e+06 secs
> 211 2010-08-05 11:29:58.917 3.120777e+06 secs
> 251 2010-08-05 11:29:59.077 3.120777e+06 secs
> 298 2010-08-05 11:29:59.297 3.120777e+06 secs
> 320 2010-08-05 11:29:59.397 3.120777e+06 secs
> 366 2010-08-05 11:29:59.707 3.120777e+06 secs
> 342 2010-08-05 11:30:00.987 3.120775e+06 secs
> 380 2010-08-05 11:30:01.200 3.120775e+06 secs
> 120 2010-08-19 09:31:47.207 2.358866e+06 secs
> 141 2010-08-19 09:31:47.500 2.358866e+06 secs
> 842 2010-09-03 13:58:21.463 3.641194e+06 secs
>
>    I would like to be able to take the second column, the "processTime",
> and put it into a time series using the first column as the key to say when
> it occurred.  But everything I could find, such as ts(), assumed that I
> already had regularly spaced univariate data, and all I needed to do was
> set the frequency & start date (in the case of ts()).
>    I can adjust the "queued_time" arbitrarily as needed, so that if, for
> instance, the data set would end up far too sparse & empty at the current
> precision, I could cut the "queued_time" precision down to just the year,
> month, day, and hour.  But in that case, how would the time series handle
> the fact that there would be several (a varying number of) entries with
> the same time value?
>
>    The reason I want to do this is because I next want to be able to use
> all the very nice modeling capabilities that a univariate time series
> allows, such as arima, etc.
>
>                                                Thanks in advance!
>                                                            Mike
>
> "Telescopes and bathyscaphes and sonar probes of Scottish lakes,
> Tacoma Narrows bridge collapse explained with abstract phase-space maps,
> Some x-ray slides, a music score, Minard's Napoleonic war:
> The most exciting frontier is charting what's already here."
>  -- xkcd
>