Thank you for your reply. It is a Scala and Python library. Is similar library exists for Java?
On Wed, Dec 9, 2015 at 10:26 PM, Sean Owen <[email protected]> wrote: > CC Sandy as his https://github.com/cloudera/spark-timeseries might be > of use here. > > On Wed, Dec 9, 2015 at 4:54 PM, Arun Verma <[email protected]> > wrote: > > Hi all, > > > > We have RDD(main) of sorted time-series data. We want to split it into > > different RDDs according to window size and then perform some aggregation > > operation like max, min etc. over each RDD in parallel. > > > > If window size is w then ith RDD has data from (startTime + (i-1)*w) to > > (startTime + i*w) where startTime is time-stamp of 1st entry in main RDD > and > > (startTime + (i-1)*w) is greater then last entry of main RDD. > > > > For now, I am using DataFrame and Spark version 1.5.2. Below code is > running > > sequentially on the data, so execution time is high and resource > utilization > > is low. Code snippet is given below: > > /* > > * aggragator is max > > * df - Dataframe has sorted timeseries data > > * start - first entry of DataFrame > > * end - last entry of DataFrame df > > * bucketLengthSec - window size > > * stepResults - has particular block/window output(JSON) > > * appendResults - has output till this block/window(JSON) > > */ > > while (start <= end) { > > row = df.filter(df.col("timeStamp") > > .between(start, nextStart)) > > .agg(max(df.col("timeStamp")), max(df.col("value"))) > > .first(); > > if (row.get(0) != null) { > > stepResults = new JSONObject(); > > stepResults.put("x", Long.parseLong(row.get(0).toString())); > > stepResults.put("y", row.get(1)); > > appendResults.add(stepResults); > > } > > start = nextStart; > > nextStart = start + bucketLengthSec; > > } > > > > > > -- > > Thanks and Regards, > > Arun Verma > -- Thanks and Regards, Arun Verma
