Re: Spark streaming from Kafka best fit

2016-03-07 Thread pratik khadloya
Would using mapPartitions instead of map help here?

~Pratik

On Tue, Mar 1, 2016 at 10:07 AM Cody Koeninger wrote:
> You don't need an equal number of executor cores to partitions. An
> executor can and will work on multiple partitions within a batch, one after
> the other. The real issue is w
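For readers weighing the map vs. mapPartitions question: the usual reason mapPartitions helps is that any per-call setup is paid once per partition instead of once per record. The thread does not say whether the job has such a setup cost, so the following is only a plain-Python sketch of the idea, not Spark code. `FakeConnection`, `per_record`, and `per_partition` are hypothetical names made up for illustration.

```python
from typing import Iterator, List


class FakeConnection:
    """Hypothetical stand-in for an expensive resource (e.g. a database client)."""

    opened = 0  # counts how many times the "expensive" setup ran

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def enrich(self, record: int) -> int:
        return record * 2


def per_record(record: int) -> int:
    # map-style: the function sees one record, so setup runs for every record.
    conn = FakeConnection()
    return conn.enrich(record)


def per_partition(records: Iterator[int]) -> Iterator[int]:
    # mapPartitions-style: the function sees an iterator over the whole
    # partition, so setup runs once and is reused for each record.
    conn = FakeConnection()
    for record in records:
        yield conn.enrich(record)


# Three hypothetical partitions holding nine records in total.
partitions: List[List[int]] = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

FakeConnection.opened = 0
map_result = [[per_record(r) for r in part] for part in partitions]
map_setups = FakeConnection.opened

FakeConnection.opened = 0
mp_result = [list(per_partition(iter(part))) for part in partitions]
mp_setups = FakeConnection.opened

print("setups with map-style:", map_setups)
print("setups with mapPartitions-style:", mp_setups)
```

Both variants produce the same output records; only the number of setups differs. If the per-record function has no setup cost to amortize, the two approaches should perform about the same.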

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Cody Koeninger
You don't need an equal number of executor cores to partitions. An executor can and will work on multiple partitions within a batch, one after the other. The real issue is whether you are able to keep your processing time under your batch time, so that delay doesn't increase. On Tue, Mar 1, 2016
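As a rough worked example of the point above: with fewer cores than partitions, tasks run in successive waves, and what matters is whether the total stays under the batch interval. The sketch below uses the 224 partitions mentioned elsewhere in this thread. The batch interval, the per-task time, and the function name `estimated_batch_seconds` are assumptions made up for illustration, and the model ignores scheduling overhead, skew, and later stages.

```python
import math


def estimated_batch_seconds(partitions: int, total_cores: int, task_seconds: float) -> float:
    """Crude estimate: tasks run in waves of `total_cores`, each wave taking `task_seconds`."""
    waves = math.ceil(partitions / total_cores)
    return waves * task_seconds


PARTITIONS = 224          # Kafka partition count from the thread
BATCH_INTERVAL_S = 10.0   # hypothetical batch interval
TASK_SECONDS = 1.5        # hypothetical time to fetch + process one partition

for cores in (224, 112, 56, 28):
    estimate = estimated_batch_seconds(PARTITIONS, cores, TASK_SECONDS)
    status = "keeps up" if estimate < BATCH_INTERVAL_S else "falls behind"
    print(f"{cores:>3} cores: ~{estimate:.1f}s per batch, {status}")
```

Under these made-up numbers, 56 cores would still finish a batch in time while 28 would not, which is why measuring actual processing time against the batch interval is more useful than matching cores to partitions.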

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Jatin Kumar
Thanks Cody! I understand what you said and if I am correct it will be using 224 executor cores just for fetching + stage-1 processing of 224 partitions. I will obviously need more cores for processing further stages and fetching the next batch. I will start with a higher number of executor cores and s

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Cody Koeninger
> "How do I keep a balance of executors which receive data from Kafka and which process data"

I think you're misunderstanding how the direct stream works. The executor which receives data is also the executor which processes data; there aren't separate receivers. If it's a single stage worth of
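To make the "no separate receivers" point concrete, here is a toy plain-Python model of it, not the Spark API. In the direct approach each batch is described by one offset range per Kafka partition, and the task for that range both fetches and processes its messages. The in-memory `LOG`, the `OffsetRange` dataclass, and `run_task` are hypothetical stand-ins made up for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical description of the slice of one Kafka partition in a batch."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int  # exclusive


# Hypothetical in-memory "Kafka": (topic, partition) -> ordered messages.
LOG: Dict[Tuple[str, int], List[str]] = {
    ("events", 0): ["a", "b", "c", "d"],
    ("events", 1): ["e", "f", "g"],
}


def run_task(r: OffsetRange) -> List[str]:
    # Fetch and process happen inside the same task, so whichever executor
    # runs the task does both. There is no separate receiver step.
    fetched = LOG[(r.topic, r.partition)][r.from_offset:r.until_offset]
    return [message.upper() for message in fetched]


# One batch: one offset range, and therefore one task, per Kafka partition.
batch = [
    OffsetRange("events", 0, 1, 4),
    OffsetRange("events", 1, 0, 2),
]

results = {r.partition: run_task(r) for r in batch}
print(results)
```

Because fetching is part of the task itself, there is nothing to balance between "receiving" and "processing" executors; the tasks are simply scheduled across whatever cores are available.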