No, there isn't a partitioner for KafkaRDD (KafkaRDD may not even be a pair rdd, for instance).
It sounds to me like if it's a self-join, you should be able to do it in a single mapPartitions operation.

On Wed, Sep 2, 2015 at 3:06 PM, Chen Song <chen.song...@gmail.com> wrote:

> I have a stream got from Kafka with direct approach, say, inputStream, I
> need to
>
> 1. Create another DStream derivedStream with map or mapPartitions (with
>    some data enrichment with reference table) on inputStream
> 2. Join derivedStream with inputStream
>
> In my use case, I don't need to shuffle data. Each partition in
> derivedStream only needs to be joined with the corresponding partition in
> the original parent inputStream it is generated from.
>
> My question is
>
> 1. Is there a Partitioner defined in KafkaRDD at all?
> 2. How would I preserve the partitioning scheme and avoid data shuffle?
>
> --
> Chen Song
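
For illustration, here is a rough sketch of what the single-pass idea from the reply could look like. It is written as a plain Python function over an iterator so it runs without Spark. The record shape, the `enrich` helper, the `reference_table` contents and the name `self_join_partition` are all hypothetical, and it assumes each derived record only needs to be paired with the record it was derived from. If that assumption holds, the original and enriched records are emitted together and no separate join (and so no shuffle) is needed.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (key, value) tuples. The thread does not say
# what the Kafka messages actually look like.
Record = Tuple[str, str]
Enriched = Tuple[str, str, str]


def enrich(record: Record, reference: Dict[str, str]) -> Enriched:
    """Stand-in for the enrichment step: look the key up in a reference table."""
    key, value = record
    return (key, value, reference.get(key, "unknown"))


def self_join_partition(
    records: Iterable[Record], reference: Dict[str, str]
) -> Iterator[Tuple[Record, Enriched]]:
    """Emit (original, derived) pairs for one partition in a single pass."""
    for record in records:
        yield (record, enrich(record, reference))


# Local check, with a plain list standing in for one partition.
reference_table = {"a": "alpha", "b": "beta"}
partition = [("a", "1"), ("b", "2"), ("c", "3")]
joined = list(self_join_partition(partition, reference_table))

# With a real stream the same function would be handed to mapPartitions,
# roughly (assumed wiring, not run here):
#   inputStream.mapPartitions(lambda it: self_join_partition(it, reference_table))
```

Because the function only ever sees the records of one partition, it never has to know how the data is partitioned, which is why the missing partitioner on KafkaRDD does not matter for this case.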