[spark-streaming] can shuffle write to disk be disabled?

2015-03-17 Thread Darren Hoo
I use spark-streaming to read messages from Kafka; the producer creates about 1500 messages per second.

  def hash(x: String): Int = {
    MurmurHash3.stringHash(x)
  }

  val stream = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap, StorageLevel.MEMORY_ONLY_SER).map(_._2

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
> sliding window is just 3 seconds, so you will
> process each 60 seconds' data in 3 seconds. If processing latency is larger
> than the sliding window, maybe your computation power cannot reach the
> qps you wanted.
>
> I think you need to identify the bottleneck

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
Thanks, Shao

On Wed, Mar 18, 2015 at 3:34 PM, Shao, Saisai wrote:
> Yeah, as I said your job processing time is much larger than the sliding
> window, and streaming job is executed one by one in sequence, so the next
> job will wait until the first job is finished, so the total latency will be
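
The point in the quoted reply can be illustrated with a little arithmetic. This is only a sketch and not Spark code: the 3-second slide interval comes from the thread, while the 5-second processing time and the names `schedulingDelays`, `slideSec`, and `processingSec` are hypothetical, chosen for illustration. It assumes jobs run strictly one after another, as the reply describes.

```scala
object SchedulingDelaySketch {
  // Waiting time (in seconds) of each of the first `jobs` jobs, when a new job
  // is generated every `slideSec` seconds and jobs run strictly one at a time.
  def schedulingDelays(slideSec: Double, processingSec: Double, jobs: Int): Seq[Double] = {
    var previousFinish = 0.0
    (0 until jobs).map { i =>
      val submitted = i * slideSec
      // A job cannot start before the previous one has finished.
      val start = math.max(submitted, previousFinish)
      previousFinish = start + processingSec
      start - submitted
    }
  }

  def main(args: Array[String]): Unit = {
    // 3 s slide interval from the thread; 5 s processing time is a made-up value.
    schedulingDelays(3.0, 5.0, 5).zipWithIndex.foreach { case (delay, i) =>
      println(s"job $i waits $delay s before it starts")
    }
  }
}
```

With these numbers every job waits 2 seconds longer than the one before it, so the delay grows without bound. If the processing time were at or below the slide interval, every delay would stay at zero.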

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
I've already done that. From SparkUI Environment, Spark properties has:

  spark.shuffle.spill    false

On Wed, Mar 18, 2015 at 6:34 PM, Akhil Das wrote:
> I think you can disable it with spark.shuffle.spill=false
>
> Thanks
> Best Regards
>
> On Wed, Mar 18, 2015 at
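
For reference, a minimal sketch of setting the property discussed here in code rather than on the command line. It assumes a Spark 1.x build on the classpath, where `spark.shuffle.spill` was a recognized property; the application name is a placeholder and not taken from the thread.

```scala
import org.apache.spark.SparkConf

// Assumption: Spark 1.x is on the classpath. The app name is a placeholder.
val conf = new SparkConf()
  .setAppName("shuffle-spill-example")
  // Ask Spark not to spill to disk during shuffle aggregation.
  .set("spark.shuffle.spill", "false")
```

The setting can then be checked on the Environment tab of the Spark UI, which is where the value above was read from.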

Re: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Darren Hoo
On Wed, Mar 18, 2015 at 8:31 PM, Shao, Saisai wrote:
> From the log you pasted I think this (-rw-r--r-- 1 root root 80K Mar
> 18 16:54 shuffle_47_519_0.data) is not shuffle spilled data, but the
> final shuffle result.
>
why the shuffle result is written to disk?

> As I said, did you think

Can the distinct transform be applied on a DStream?

2015-03-20 Thread Darren Hoo
  val aDstream = ...
  val distinctStream = aDstream.transform(_.distinct())

but the elements in distinctStream are not distinct. Did I use it wrong?
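
One thing worth checking (an assumption about the cause, since the post does not show any output): `transform` applies its function to the RDD of each batch separately, so `distinct()` removes duplicates within one batch only, and the same element can still appear again in a later batch. The sketch below models batches as plain Scala collections rather than RDDs; `PerBatchDistinctSketch`, `Batch`, and `transformEachBatch` are hypothetical names used only for illustration.

```scala
object PerBatchDistinctSketch {
  // Each inner Seq stands in for the RDD of one batch interval.
  type Batch = Seq[String]

  // Mirrors the shape of aDstream.transform(f):
  // the function is applied to every batch independently.
  def transformEachBatch(batches: Seq[Batch])(f: Batch => Batch): Seq[Batch] =
    batches.map(f)

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq("a", "b", "a"), Seq("a", "c"))
    val distinctPerBatch = transformEachBatch(batches)(_.distinct)
    // "a" is deduplicated inside the first batch but still shows up in the second.
    distinctPerBatch.foreach(batch => println(batch.mkString(", ")))
  }
}
```

If uniqueness across batches is what is needed, deduplicating over a window or keeping state between batches would be the direction to look at; which one fits depends on details the post does not give.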