[ https://issues.apache.org/jira/browse/KAFKA-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236417#comment-15236417 ]
Greg Fodor commented on KAFKA-3542: ----------------------------------- great. feel free to close. > Add "repartition (+ join)" operations to streams > ------------------------------------------------ > > Key: KAFKA-3542 > URL: https://issues.apache.org/jira/browse/KAFKA-3542 > Project: Kafka > Issue Type: Improvement > Components: streams > Affects Versions: 0.10.0.0 > Reporter: Greg Fodor > Assignee: Guozhang Wang > Priority: Minor > > A common operation in Kafka Streams seems to be to repartition the stream > onto a different column, usually for joining. The current way I've been doing > this: > - Perform a map on the stream to the same value with a new key (the key we're > going to join on, usually a foreign key) > - Sink the stream into a new topic > - Create a new stream sourcing that topic > - Perform the join > Note that without explicitly sinking the intermediate topic, the topology > will fail to build because of the assertion that both sides of a join are > connected to source nodes. When you perform a map, the link between the > source nodes and the tail node of the topology is broken (by setting the > source nodes to null) so you are forced to sink to use that output in a join. > It seems that this pattern could possibly be rolled into much simpler > operation(s). For example, the map could be changed into a "repartition" > method where you just return the new key. And the join itself could be > simplified by letting you specify a re-partition function on either side of > the join and create the intermediate topic implicitly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)