Re: DynamicPartitionKafkaRDD - 1:n mapping between kafka and RDD partition

2016-03-15 Cody Koeninger
No, I don't agree that someone explicitly calling repartition or shuffle is the same as a constructor that implicitly breaks guarantees. Realistically speaking, the changes you have made are also totally incompatible with the way Kafka's new consumer works. Pulling different out-of-order chunks of

Re: DynamicPartitionKafkaRDD - 1:n mapping between kafka and RDD partition

2016-03-14 Renyi Xiong
Right. However, I think it's the developer's choice to deliberately drop the guarantee, as when they use the existing DStream.repartition, where the original per-topicpartition in-order processing is also no longer observed. Do you agree? On Thu, Mar 10, 2016 at 12:12 PM, Cody Koeninger wrote: > The c
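To make the repartition comparison concrete, here is a simplified Python model of a shuffle-based repartition that deals records round-robin across output partitions. It is an illustration under assumptions, not Spark's code: `round_robin_repartition` is a hypothetical helper, and Spark's real implementation differs in detail (for example, in how it picks the starting output partition, which the `start` parameter stands in for). It shows six in-order records from one Kafka partition ending up spread over three output partitions, so no single task sees that Kafka partition in offset order.

```python
from typing import List, Sequence, Tuple

Record = Tuple[int, int]  # (kafka_partition, offset)


def round_robin_repartition(records: Sequence[Record], n: int, start: int = 0) -> List[List[Record]]:
    """Hypothetical, simplified model of a shuffle-based repartition:
    deal records into n output partitions round-robin, beginning at `start`."""
    out: List[List[Record]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        out[(start + i) % n].append(rec)
    return out


# Six records read from Kafka partition 0, in offset order.
source = [(0, offset) for offset in range(6)]
outputs = round_robin_repartition(source, 3)
for idx, part in enumerate(outputs):
    print(idx, [offset for _, offset in part])
# 0 [0, 3]
# 1 [1, 4]
# 2 [2, 5]
```

Because the developer calls repartition explicitly, this loss of per-partition ordering is a visible, opt-in tradeoff in the calling code.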

Re: DynamicPartitionKafkaRDD - 1:n mapping between kafka and RDD partition

2016-03-10 Cody Koeninger
The central problem with doing anything like this is that you break one of the basic guarantees of Kafka, which is in-order processing on a per-topicpartition basis. As far as PRs go, because of the new consumer interface for Kafka 0.9 and 0.10, there's a lot of potential change already underway.
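The ordering concern can be illustrated with a small Python model. It assumes that the order in which records finish processing is what downstream code observes; `in_order_per_partition` is a hypothetical checker, not part of Spark or Kafka. With one RDD partition per Kafka partition, a single task reads offsets sequentially; once one Kafka partition is split into chunks that run as independent tasks, a later chunk can finish before an earlier one.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, int]  # (topic, partition, offset)


def in_order_per_partition(processed: Iterable[Record]) -> bool:
    """Hypothetical checker: True if, for every (topic, partition),
    offsets appear in strictly increasing order."""
    last: Dict[Tuple[str, int], int] = {}
    for topic, partition, offset in processed:
        key = (topic, partition)
        if key in last and offset <= last[key]:
            return False
        last[key] = offset
    return True


# 1:1 mapping: a single task reads offsets 0..5 of partition 0 sequentially.
one_to_one = [("events", 0, o) for o in range(6)]

# 1:n mapping: chunk [3, 6) happens to finish before chunk [0, 3)
# when the two chunks run as separate, parallel tasks.
chunk_a = [("events", 0, o) for o in range(0, 3)]
chunk_b = [("events", 0, o) for o in range(3, 6)]
one_to_n = chunk_b + chunk_a

print(in_order_per_partition(one_to_one))  # True
print(in_order_per_partition(one_to_n))    # False
```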

DynamicPartitionKafkaRDD - 1:n mapping between kafka and RDD partition

2016-03-10 Renyi Xiong
Hi TD, Thanks a lot for offering to look at our PR (if we fire one) at the conference in NYC. As we briefly discussed, to address unbalanced and under-distributed Kafka partitions when developing Spark Streaming applications in Mobius (C# for Spark), we're trying the option of repartitioning within
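A 1:n mapping between a Kafka partition and RDD partitions can be pictured as splitting one topic-partition offset range into several contiguous sub-ranges, each read by its own RDD partition. The Python sketch below assumes that model; `OffsetRange` here is a hypothetical stand-in that loosely mirrors the fields of the `OffsetRange` class in Spark's Kafka integration (topic, partition, fromOffset, untilOffset), and `split_offset_range` is a hypothetical helper, not the proposed PR's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for a Kafka offset range: [from_offset, until_offset)."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def count(self) -> int:
        return self.until_offset - self.from_offset


def split_offset_range(r: OffsetRange, n: int) -> List[OffsetRange]:
    """Hypothetical helper: split one topic-partition range into up to n
    contiguous sub-ranges (one per RDD partition); sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be >= 1")
    total = r.count()
    n = min(n, total) if total > 0 else 1  # never create empty chunks
    base, extra = divmod(total, n)
    parts: List[OffsetRange] = []
    start = r.from_offset
    for i in range(n):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        parts.append(OffsetRange(r.topic, r.partition, start, start + size))
        start += size
    return parts


if __name__ == "__main__":
    skewed = OffsetRange("events", 0, 1000, 1010)
    for sub in split_offset_range(skewed, 3):
        print(sub)
```

For ten offsets split three ways, this yields the sub-ranges [1000, 1004), [1004, 1007) and [1007, 1010). Each sub-range can be read in parallel, which spreads a heavy partition across more tasks, but the chunks are no longer processed as one ordered sequence.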