Hi, It seems that previously I understood reduceByWindow wrongly. But now for me reduceByWindow means that after this operation all the elements in each window reduces to one RDD. In that case, the code will be as fallows:
val individualpoints = ssc.socketTextStream(args(1), args(2).toInt, StorageLevel.MEMORY_AND_DISK_SER) //Assume there are no keys just values val squareandcount = individualpoints.map(x => (math.pow(x.toDouble, 2), x.toDouble, 1)).reduceByWindow((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3), Seconds(2), Seconds(2)) squareandcount.foreachRDD(rdd => { rdd.collect.foreach({ case (sumofsquare, sum,n) => { println("********************************************") val mean = sum/n println("The mean is --> " + mean) val varience = sumofsquare/n - math.pow(mean,2) println("The varience is --> " + varience) println("The SD is --> " + math.sqrt(varience)) println("*********************************************") } }) }) Please correct me if I have understood something wrong. Also I think that Spark output operations are somewhat limited specially with streaming. Its always easier to understand the code after looking the output. Both outputs either on console or saveAsTextFile are not user friendly. Regards, Laeeq On Wednesday, May 21, 2014 8:06 PM, Sean Owen <so...@cloudera.com> wrote: Are you not just looking for the window() function that creates the sliding-window RDDs in the first place? That DStreams' RDDs give you all elements in the sliding window, and you can compute a mean or variance as you like. You should be able to do this quite efficiently without recomputing each time by using reduceByWindow and a running mean / stdev formula. On Wed, May 21, 2014 at 1:42 PM, Laeeq Ahmed <laeeqsp...@yahoo.com> wrote: > Hi, > > I want to do union of all RDDs in each window of DStream. I found > Dstream.union and haven't seen anything like DStream.windowRDDUnion. > > Is there any way around it? > > I want to find mean and SD of all values which comes under each sliding > window for which I need to union all the RDDs in each window. This is not a > running mean and SD. > > Regards, > Laeeq >