[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

Guozhang Wang (JIRA) Fri, 27 May 2016 10:08:17 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304347#comment-15304347
 ]


Guozhang Wang commented on KAFKA-3561:
--------------------------------------

Hi [~damianguy] a couple thoughts:

1. Topics referenced in {{through}} call are treated as "external topics" that 
owned by users themselves, like topics from {{builder.stream/table}} and {{to}} 
operators. One goal of this ticket about "auto creating repartition topics" is 
to make it as an internal topic just as what we did for {{KTable.aggregation}}. 
So we should not simply just add a "through" operator in between, and requiring 
users to provide the topic name.

2. One related JIRA ticket is https://issues.apache.org/jira/browse/KAFKA-3576: 
we originally want to differentiate KStream and KTable as much as possible for 
their semantic difference at the API layer, but there are suggestions that we 
may actually consider unifying them just for consistent user experience. Would 
like to hear your opinions on that, and if we are really going to do KAFKA-3576 
as well, we'd better do it together with this ticket.

> Auto create through topic for KStream aggregation and join
> ----------------------------------------------------------
>
>                 Key: KAFKA-3561
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3561
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Damian Guy
>              Labels: api
>             Fix For: 0.10.1.0
>
>
> For KStream.join / aggregateByKey operations that requires the streams to be 
> partitioned on the record key, today users should repartition themselves 
> through the "through" call:
> {code}
> stream1 = builder.stream("topic1");
> stream2 = builder.stream("topic2");
> stream3 = stream1.map(/* set the right key for join*/).through("topic3");
> stream4 = stream2.map(/* set the right key for join*/).through("topic4");
> stream3.join(stream4, ..)
> {code}
> This pattern can actually be done by the Streams DSL itself instead of 
> requiring users to specify themselves, i.e. users can just set the right key 
> like (see KAFKA-3430) and then call join, which will be translated by adding 
> the "internal topic for repartition".
> Another thing is that today if user do not call "through" after setting a new 
> key, the aggregation result would not be correct as the aggregation is based 
> on key B while the source partitions is partitioned by key A and hence each 
> task will only get a partial aggregation for all keys. But this is not 
> validated in the DSL today. We should do both the auto-translation and 
> validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

Reply via email to