Re: combineByKey at ShuffledDStream.scala

2014-07-23 Thread Bill Jay
Spark Streaming program, which consumes data from Kafka and performs a group-by operation on the data. I am trying to optimize the running time of the program because it looks slow to me. It seems the stage named: * combineByKey at ShuffledDStream.scala:42 *

Re: combineByKey at ShuffledDStream.scala

2014-07-22 Thread Tathagata Das
operation on the data. I am trying to optimize the running time of the program because it looks slow to me. It seems the stage named: * combineByKey at ShuffledDStream.scala:42 * always takes the longest running time. And if I open this stage, I only see two executors on th

combineByKey at ShuffledDStream.scala

2014-07-22 Thread Bill Jay
Hi all, I am currently running a Spark Streaming program, which consumes data from Kafka and performs a group-by operation on the data. I am trying to optimize the running time of the program because it looks slow to me. It seems the stage named: * combineByKey at ShuffledDStream.scala:42 * always