[ https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304347#comment-15304347 ]
Guozhang Wang commented on KAFKA-3561: -------------------------------------- Hi [~damianguy] a couple thoughts: 1. Topics referenced in {{through}} call are treated as "external topics" that owned by users themselves, like topics from {{builder.stream/table}} and {{to}} operators. One goal of this ticket about "auto creating repartition topics" is to make it as an internal topic just as what we did for {{KTable.aggregation}}. So we should not simply just add a "through" operator in between, and requiring users to provide the topic name. 2. One related JIRA ticket is https://issues.apache.org/jira/browse/KAFKA-3576: we originally want to differentiate KStream and KTable as much as possible for their semantic difference at the API layer, but there are suggestions that we may actually consider unifying them just for consistent user experience. Would like to hear your opinions on that, and if we are really going to do KAFKA-3576 as well, we'd better do it together with this ticket. > Auto create through topic for KStream aggregation and join > ---------------------------------------------------------- > > Key: KAFKA-3561 > URL: https://issues.apache.org/jira/browse/KAFKA-3561 > Project: Kafka > Issue Type: Bug > Components: streams > Reporter: Guozhang Wang > Assignee: Damian Guy > Labels: api > Fix For: 0.10.1.0 > > > For KStream.join / aggregateByKey operations that requires the streams to be > partitioned on the record key, today users should repartition themselves > through the "through" call: > {code} > stream1 = builder.stream("topic1"); > stream2 = builder.stream("topic2"); > stream3 = stream1.map(/* set the right key for join*/).through("topic3"); > stream4 = stream2.map(/* set the right key for join*/).through("topic4"); > stream3.join(stream4, ..) > {code} > This pattern can actually be done by the Streams DSL itself instead of > requiring users to specify themselves, i.e. users can just set the right key > like (see KAFKA-3430) and then call join, which will be translated by adding > the "internal topic for repartition". > Another thing is that today if user do not call "through" after setting a new > key, the aggregation result would not be correct as the aggregation is based > on key B while the source partitions is partitioned by key A and hence each > task will only get a partial aggregation for all keys. But this is not > validated in the DSL today. We should do both the auto-translation and > validation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)