[
https://issues.apache.org/jira/browse/KAFKA-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236343#comment-15236343
]
Greg Fodor commented on KAFKA-3542:
-----------------------------------
Ah, I may understand what you're getting at here -- to do the operation I have
in mind, you would first perform an aggregation to pivot the streams onto the
proper keys (via the selector), and then join those streams. Is that correct?
> Add "repartition (+ join)" operations to streams
> ------------------------------------------------
>
> Key: KAFKA-3542
> URL: https://issues.apache.org/jira/browse/KAFKA-3542
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.0.0
> Reporter: Greg Fodor
> Assignee: Guozhang Wang
> Priority: Minor
>
> A common operation in Kafka Streams seems to be repartitioning a stream
> onto a different column, usually for joining. The way I've currently been
> doing this:
> - Perform a map on the stream to the same value with a new key (the key we're
> going to join on, usually a foreign key)
> - Sink the stream into a new topic
> - Create a new stream sourcing that topic
> - Perform the join
> Note that without explicitly sinking the intermediate topic, the topology
> will fail to build because of the assertion that both sides of a join are
> connected to source nodes. When you perform a map, the link between the
> source nodes and the tail node of the topology is broken (by setting the
> source nodes to null) so you are forced to sink to use that output in a join.
> It seems that this pattern could be rolled into one or more much simpler
> operations. For example, the map could be changed into a "repartition"
> method where you just return the new key. The join itself could also be
> simplified by letting you specify a repartition function on either side of
> the join, with the intermediate topic created implicitly.
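
The four-step workaround in the description can be modeled without a broker.
The sketch below is plain Java and deliberately does not use the Kafka Streams
API: the class name RepartitionSketch, its methods, and the "id:field" string
records are all hypothetical, and the hash-modulo partitioner is an assumed
stand-in for whatever partitioner the intermediate topic would use. It only
illustrates why both sides must be re-keyed before a partition-local join can
find matching records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class RepartitionSketch {

    // Assumed partitioner: equal keys always map to the same partition index.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Steps 1-3 of the workaround: derive the new key from each value, then
    // route the value to the partition that the new key hashes to. The
    // returned list plays the role of the intermediate topic's partitions.
    static <V, K> List<Map<K, List<V>>> repartition(
            List<V> values, Function<V, K> keySelector, int numPartitions) {
        List<Map<K, List<V>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
        for (V value : values) {
            K key = keySelector.apply(value);
            partitions.get(partitionFor(key, numPartitions))
                      .computeIfAbsent(key, k -> new ArrayList<>())
                      .add(value);
        }
        return partitions;
    }

    // Step 4: join partition by partition. This only finds every match
    // because both inputs were repartitioned on the same key with the same
    // partition count.
    static <K, A, B> List<String> join(
            List<Map<K, List<A>>> left, List<Map<K, List<B>>> right) {
        List<String> joined = new ArrayList<>();
        for (int p = 0; p < left.size(); p++) {
            for (Map.Entry<K, List<A>> entry : left.get(p).entrySet()) {
                List<B> matches = right.get(p).get(entry.getKey());
                if (matches == null) {
                    continue; // no record with this key on the other side
                }
                for (A a : entry.getValue()) {
                    for (B b : matches) {
                        joined.add(a + "+" + b);
                    }
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical records: orders carry a user name as a foreign key.
        List<String> orders = Arrays.asList("o1:alice", "o2:bob", "o3:alice");
        List<String> users = Arrays.asList("alice:US", "bob:DE");

        List<Map<String, List<String>>> ordersByUser =
                repartition(orders, o -> o.split(":")[1], 4);
        List<Map<String, List<String>>> usersByName =
                repartition(users, u -> u.split(":")[0], 4);

        System.out.println(join(ordersByUser, usersByName));
    }
}
```

In this model the key selector passed to repartition corresponds to the
"repartition" method proposed above, where the caller only returns the new
key, and calling repartition on either input before join corresponds to
specifying a repartition function on either side of the join.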
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)