Right. However, I think it's the developer's choice to purposely drop that guarantee, just as when they use the existing DStream.repartition, where the original per-topicpartition in-order processing is no longer observed either (quick sketch below).
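To illustrate what I mean, here is a rough sketch against the Spark 1.x / Kafka 0.8 direct-stream API (the broker address, topic name, and batch interval are placeholders, not from our code):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("repartition-ordering"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // placeholder
    val topics = Set("events")                                        // placeholder

    // Direct stream: one RDD partition per Kafka topicpartition,
    // records within each partition arrive in offset order.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // The developer's explicit opt-out: the shuffle spreads one
    // topicpartition's records across many tasks, so per-topicpartition
    // offset order is no longer observed.
    val rebalanced = stream.repartition(64)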
Do you agree?

On Thu, Mar 10, 2016 at 12:12 PM, Cody Koeninger <c...@koeninger.org> wrote:
> The central problem with doing anything like this is that you break
> one of the basic guarantees of kafka, which is in-order processing on
> a per-topicpartition basis.
>
> As far as PRs go, because of the new consumer interface for kafka 0.9
> and 0.10, there's a lot of potential change already underway.
>
> See
>
> https://issues.apache.org/jira/browse/SPARK-12177
>
> On Thu, Mar 10, 2016 at 1:59 PM, Renyi Xiong <renyixio...@gmail.com> wrote:
> > Hi TD,
> >
> > Thanks a lot for offering to look at our PR (if we file one) at the
> > conference in NYC.
> >
> > As we discussed briefly, regarding the issues of unbalanced and
> > under-distributed Kafka partitions when developing Spark Streaming
> > applications in Mobius (C# for Spark), we're trying the option of
> > repartitioning within DirectKafkaInputDStream instead of the
> > DStream.repartition API, which introduces extra network cost and
> > doesn't really address the root cause.
> >
> > However, instead of filing a JIRA with a PR directly, we decided to
> > create a customized Kafka RDD / DStream (to start with, and to
> > contribute back later if successful) - DynamicPartitionKafkaRDD and
> > DynamicPartitionKafkaInputDStream, using an inheritance model - and
> > to expose a new API, KafkaUtils.CreateDirectStreamWithRepartition,
> > with one more parameter: numPartitions (a hint for the number of RDD
> > partitions to create).
> >
> > It would be great if you could take a look at the code and share
> > your comments:
> >
> > https://github.com/Microsoft/Mobius/tree/master/scala/src/main/org/apache/spark/streaming/api/kafka
> >
> > The major relevant change is in DynamicPartitionKafkaRDD.getPartitions,
> > where an average RDD partition size is calculated first (the total
> > size of the topic divided by numPartitions) and then used to split
> > Kafka partitions (so the final number of RDD partitions is greater
> > than or equal to numPartitions).
> >
> > There's a concern that Kafka partition[i] no longer maps to task[i],
> > which might break existing applications. Here's our thinking:
> >
> > a. OffsetRanges in the original implementation may span multiple
> > topics, meaning 'partition i maps to task i' is generally a false
> > statement.
> >
> > b. Even if only one topic is involved, the partition sequence in
> > offsetRanges comes from the Kafka topic metadata response, which
> > doesn't necessarily guarantee any ordering; and even if it does,
> > applications should not take a dependency on it.
> >
> > c. Topic partition splitting happens only when configured.
> >
> > There are some other, more complicated changes related to fault
> > tolerance which are irrelevant here (but you're more than welcome to
> > comment on them too); they were introduced to unblock scenarios
> > we're hitting on a daily basis:
> >
> > 1. Temporarily redirect Kafka reads to the C# worker by passing it
> > metadata instead of actual Kafka messages; in the C# worker, a C#
> > version of the Kafka client is used, which makes debugging much
> > easier.
> >
> > 2. Bypass metadata request exceptions on the driver side and let the
> > next batch retry.
> >
> > 3. Bypass some read errors on the worker side.
> >
> > Note that all of the above is at a very early stage; your comments
> > will be much appreciated.
> >
> > Thanks a lot,
> >
> > Reny.
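For what it's worth, the split logic described above boils down to something like the sketch below. This is my own standalone reconstruction, not the actual Mobius code: the OffsetRange case class and splitOffsetRanges helper are simplified stand-ins (Spark's real kafka.OffsetRange is similar), and the exact rounding is an assumption.

    // Standalone sketch of the average-size split described for
    // DynamicPartitionKafkaRDD.getPartitions (reconstruction, not the real code).
    case class OffsetRange(topic: String, partition: Int,
                           fromOffset: Long, untilOffset: Long) {
      def size: Long = untilOffset - fromOffset
    }

    def splitOffsetRanges(ranges: Seq[OffsetRange],
                          numPartitions: Int): Seq[OffsetRange] = {
      // Average target size per RDD partition: total topic size / numPartitions.
      val totalSize = ranges.map(_.size).sum
      val avgSize = math.max(1L, totalSize / numPartitions)
      ranges.flatMap { r =>
        // Split each Kafka partition into ceil(size / avgSize) pieces, so the
        // final RDD partition count is greater than or equal to numPartitions.
        val pieces = math.max(1L, (r.size + avgSize - 1) / avgSize)
        (0L until pieces).map { i =>
          val from = r.fromOffset + i * avgSize
          OffsetRange(r.topic, r.partition, from,
            math.min(r.untilOffset, from + avgSize))
        }
      }
    }

So a call through KafkaUtils.CreateDirectStreamWithRepartition with the numPartitions hint should yield at least numPartitions RDD partitions per batch when splitting is configured.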