Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-12 Thread p pathiyil
Thanks Sebastian. I was indeed trying out FAIR scheduling with a high value for concurrentJobs today. It does improve the latency seen by the non-hot partitions, even if it does not provide complete isolation. So it might be an acceptable middle ground. On 12 Feb 2016 12:18, "Sebastian Piu" wrot
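For readers following along, the combination described above might be expressed roughly as the settings below. This is a sketch under assumptions, not the poster's actual configuration: the value 8 is purely illustrative ("a high value" is all the message says), and spark.streaming.concurrentJobs is an undocumented setting, so its behavior may vary between Spark versions.

```
# spark-defaults.conf style fragment (values are illustrative assumptions)
# Share executors between concurrently running jobs instead of FIFO ordering.
spark.scheduler.mode            FAIR
# Let more than one streaming batch job run at a time (default is 1).
spark.streaming.concurrentJobs  8
```

The same pairs can be passed with --conf on spark-submit. As the message notes, this improves latency for the non-hot partitions without fully isolating them.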

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Sebastian Piu
Have you tried using the fair scheduler and queues? On 12 Feb 2016 4:24 a.m., "p pathiyil" wrote: > With this setting, I can see that the next job is being executed before > the previous one is finished. However, the processing of the 'hot' > partition eventually hogs all the concurrent jobs. If there

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread p pathiyil
With this setting, I can see that the next job is being executed before the previous one is finished. However, the processing of the 'hot' partition eventually hogs all the concurrent jobs. If there was a way to restrict jobs to be one per partition, then this setting would provide the per-partitio

RE: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Diwakar Dhanuskodi
Hi, did you try another implementation of DirectStream, where you give only the topic? It would read all topic partitions in parallel within a batch interval. You need not create the union explicitly. Original message From: p pathiyil Date:11

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Cody Koeninger
The real way to fix this is by changing partitioning, so you don't have a hot partition. It would be better to do this at the time you're producing messages, but you can also do it with a shuffle / repartition during consuming. There is a setting to allow another batch to start in parallel, but t
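To make the "change the partitioning at produce time" advice concrete, here is a small stand-alone Python simulation. It is only a sketch under assumptions: the key names, the 8-partition count, the 64 salt buckets, and the use of CRC32 as the hash are all hypothetical, and key salting is one possible way to spread a hot key, not something the thread prescribes.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # hypothetical partition count for the topic
SALT_BUCKETS = 64    # hypothetical number of salts used for the hot key


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 is a stand-in
    for whatever hash the real producer partitioner uses)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key, rng, buckets=SALT_BUCKETS):
    """Append a random salt so one logical key maps to many partition keys."""
    return "{}#{}".format(key, rng.randrange(buckets))


def partition_load(keys, num_partitions=NUM_PARTITIONS):
    """Count how many messages land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


rng = random.Random(42)

# 90% of the traffic carries a single hot key; the rest is spread out.
messages = ["hot-customer"] * 9000 + ["customer-{}".format(i) for i in range(1000)]

plain = partition_load(messages)
salted = partition_load(
    [salted_key(k, rng) if k == "hot-customer" else k for k in messages]
)

print("plain :", plain)
print("salted:", salted)
```

In the unsalted run one partition receives all 9000 hot messages, while the salted run spreads them across partitions. The trade-off is that messages for the hot key no longer share a partition, so per-key ordering is lost and any per-key aggregation has to merge the salted keys downstream.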