Thank you for your reply. It is a Scala and Python library. Is similar
library exists for Java?

On Wed, Dec 9, 2015 at 10:26 PM, Sean Owen <[email protected]> wrote:

> CC Sandy as his https://github.com/cloudera/spark-timeseries might be
> of use here.
>
> On Wed, Dec 9, 2015 at 4:54 PM, Arun Verma <[email protected]>
> wrote:
> > Hi all,
> >
> > We have RDD(main) of sorted time-series data. We want to split it into
> > different RDDs according to window size and then perform some aggregation
> > operation like max, min etc. over each RDD in parallel.
> >
> > If window size is w then ith RDD has data from (startTime + (i-1)*w) to
> > (startTime + i*w) where startTime is time-stamp of 1st entry in main RDD
> and
> > (startTime + (i-1)*w) is greater then last entry of main RDD.
> >
> > For now, I am using DataFrame and Spark version 1.5.2. Below code is
> running
> > sequentially on the data, so execution time is high and resource
> utilization
> > is low. Code snippet is given below:
> > /*
> > * aggragator is max
> > * df - Dataframe has sorted timeseries data
> > * start - first entry of DataFrame
> > * end - last entry of DataFrame df
> > * bucketLengthSec - window size
> > * stepResults - has particular block/window output(JSON)
> > * appendResults - has output till this block/window(JSON)
> > */
> > while (start <= end) {
> >     row = df.filter(df.col("timeStamp")
> >             .between(start, nextStart))
> >             .agg(max(df.col("timeStamp")), max(df.col("value")))
> >             .first();
> >     if (row.get(0) != null) {
> >         stepResults = new JSONObject();
> >         stepResults.put("x", Long.parseLong(row.get(0).toString()));
> >         stepResults.put("y", row.get(1));
> >         appendResults.add(stepResults);
> >     }
> >     start = nextStart;
> >     nextStart = start + bucketLengthSec;
> > }
> >
> >
> > --
> > Thanks and Regards,
> > Arun Verma
>



-- 
Thanks and Regards,
Arun Verma

Reply via email to