Thanks, Cody, for the very useful information.
It's much clearer to me now; I had a lot of wrong assumptions.
On Nov 23, 2015 10:19 PM, "Cody Koeninger" wrote:
> Partitioner is an optional field when defining an rdd. KafkaRDD doesn't
> define one, so you can't really assume anything about the way
Partitioner is an optional field when defining an RDD. KafkaRDD doesn't
define one, so you can't really assume anything about the way it's
partitioned, because Spark doesn't know anything about the way it's
partitioned. If you want to rely on some property of how things were
partitioned as they w
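Cody's point about KafkaRDD can be checked directly. A minimal sketch, assuming the Spark 1.x direct Kafka API is on the classpath (the broker address, topic name, and app name here are illustrative, not from the thread):

```scala
// Sketch only -- assumes Spark 1.x with spark-streaming-kafka (direct API).
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("partitioner-check") // illustrative name
val ssc = new StreamingContext(conf, Seconds(10))

// Substitute your own brokers and topic; these are placeholders.
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))

stream.foreachRDD { rdd =>
  // KafkaRDD does not define a partitioner, so this field is None:
  // Spark has no knowledge of how Kafka keyed the data.
  println(rdd.partitioner)
}
```

Because `partitioner` is `None`, any key-based operation such as `reduceByKey` on this RDD will trigger a shuffle, even if the keys were already grouped per Kafka partition.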
Thanks Cody,
I still have concerns about this.
What do you mean by saying Spark direct stream doesn't have a default
partitioner? Could you please explain more?
When I assign 20 cores to 20 Kafka partitions, I expect each core
to work on one partition. Is that correct?
I'm still
Spark direct stream doesn't have a default partitioner.
If you know that you want to do an operation on keys that are already
partitioned by Kafka, just use mapPartitions or foreachPartition to avoid a
shuffle.
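A minimal sketch of the suggestion above, assuming `stream` is a DStream obtained from `KafkaUtils.createDirectStream` in the Spark 1.x direct API (the processing body is a placeholder):

```scala
// Sketch only -- `stream` is assumed to be a direct Kafka DStream[(String, String)].
stream.foreachRDD { rdd =>
  // Each Spark partition of a KafkaRDD maps 1:1 to a Kafka partition,
  // so iterating per partition preserves whatever grouping Kafka
  // already did, without triggering a shuffle.
  rdd.foreachPartition { records =>
    records.foreach { case (key, value) =>
      // Process each record here; every record from one Kafka
      // partition arrives through this single iterator.
    }
  }
}
```

By contrast, calling something like `reduceByKey` here would repartition the data by key, because Spark cannot see that Kafka already grouped it.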
On Sat, Nov 21, 2015 at 11:46 AM, trung kien wrote:
> Hi all,
>
> I am having proble