Re: Spark-Streaming collect/take functionality.

2014-08-26 Thread Chris Fregly
good suggestion, td. and i believe the optimization that jon.burns is referring to - from the big data mini course - is a step earlier: the sorting mechanism that produces sortedCounts. you can use mapPartitions() to get a top k locally on each partition, then shuffle only (k * # of partitions)

Re: Spark-Streaming collect/take functionality.

2014-07-15 Thread jon.burns
It works perfect, thanks!. I feel like I should have figured that out, I'll chalk it up to inexperience with Scala. Thanks again. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-collect-take-functionality-tp9670p9772.html Sent from the Apach

Re: Spark-Streaming collect/take functionality.

2014-07-14 Thread Tathagata Das
Why doesnt something like this work? If you want a continuously updated reference to the top counts, you can use a global variable. var topCounts: Array[(String, Int)] = null sortedCounts.foreachRDD (rdd => val currentTopCounts = rdd.take(10) // print currentTopCounts it or watever top