Hi,
How can I measure the time an RDD takes to execute?
In particular, I want to do it for the following piece of code:
«
val ssc = new StreamingContext(sparkConf, Seconds(5))
val distFile = ssc.textFileStream("/home/myuser/twitter-dump")
val words = distFile.flatMap(_.split(" ")).filter(_.length > 3)
val wordCharValues = words.map(word => {
var sum = 0
word.toCharArray.foreach(sum += _.toInt)
val value = sum.toDouble / word.length.toDouble
val average = 1892.162961
(math.pow(value - average, 2), 1)
})
.reduceByWindow({ case ((sum1, count1), (sum2, count2)) => (sum1 +
sum2, count1 + count2)}, Seconds(10), Seconds(10))
wordCharValues.foreachRDD(rdd => {
val result = rdd.take(1)
println("Result array size: " + result.size)
if(result.size > 0)
println("STDEV: %f".format(math.sqrt(result(0)._1.toDouble /
result(0)._2.toDouble)))
})
»
Thanks.