Would using mapPartitions instead of map help here?
~Pratik
On Tue, Mar 1, 2016 at 10:07 AM Cody Koeninger wrote:
> You don't need an equal number of executor cores to partitions. An
> executor can and will work on multiple partitions within a batch, one after
> the other. The real issue is w
You don't need an equal number of executor cores to partitions. An
executor can and will work on multiple partitions within a batch, one after
the other. The real issue is whether you are able to keep your processing
time under your batch time, so that delay doesn't increase.
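
To make the two points above concrete, here is a rough back-of-the-envelope sketch in plain Python (it does not use Spark). The 224 partitions come from this thread; the core counts, batch interval, and processing times are hypothetical, and the function names are made up for illustration. It assumes every task takes about the same time and that processing time is constant from batch to batch.

```python
import math
from typing import List


def waves_per_batch(num_partitions: int, total_cores: int) -> int:
    """Sequential rounds of tasks needed to cover all partitions in one batch.

    Assumes one task per partition and one task per core at a time.
    """
    return math.ceil(num_partitions / total_cores)


def scheduling_delay(batch_interval_s: float,
                     processing_time_s: float,
                     num_batches: int) -> List[float]:
    """How long each batch waits before it can start.

    Assumes a constant processing time and that batches run one at a time.
    """
    delays = []
    delay = 0.0
    for _ in range(num_batches):
        delays.append(delay)
        # The next batch arrives one interval later; it waits only if the
        # current batch has not finished by then.
        delay = max(0.0, delay + processing_time_s - batch_interval_s)
    return delays


# 224 partitions on a hypothetical 56 cores: four waves, one after the other.
print(waves_per_batch(224, 56))

# Hypothetical 10 s batch interval: 8 s of processing keeps up,
# 12 s of processing falls further behind with every batch.
print(scheduling_delay(10, 8, 4))
print(scheduling_delay(10, 12, 4))
```

With fewer cores than partitions the job still runs; it just needs more waves, and what matters is whether all the waves together finish inside the batch interval.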
On Tue, Mar 1, 2016
Thanks Cody!
I understand what you said and, if I am correct, it will use 224
executor cores just for fetching and stage-1 processing of the 224
partitions. I will obviously need more cores for processing further stages
and fetching the next batch.
I will start with a higher number of executor cores and s
> "How do I keep a balance of executors which receive data from Kafka and
> which process data"
I think you're misunderstanding how the direct stream works. The executor
that receives data is also the executor that processes it; there aren't
separate receivers. If it's a single stage worth of