[ https://issues.apache.org/jira/browse/KAFKA-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236374#comment-15236374 ]
Greg Fodor commented on KAFKA-3542:
-----------------------------------
Right, this map approach is what I currently do before all of my joins,
though I didn't realize I could use through() to generate a joinable stream
without sourcing it explicitly from the new topic. I will see if some of my
joins can be satisfied with the aggregator-first approach. What bothers me
about the current map -> sink approach is that the map is not really DRY (I
should only need to specify the selector to repartition on) and that the
intermediate topic name should just be generated. I agree that an implicit
through() call could be useful in place of the assertion currently made to
determine whether two streams are joinable.
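A minimal sketch of the selector-only idea, assuming an in-memory model rather than the real Kafka Streams API: the caller passes only a key selector, each value is copied through unchanged, and the intermediate topic name is generated. The classes RepartitionSketch and Repartitioned, the method repartition(), and the "<applicationId>-repartition-<n>" naming scheme are all hypothetical, chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical in-memory sketch, not the Kafka Streams API.
// Result of a repartition: the generated topic name plus the re-keyed records.
final class Repartitioned<K, V> {
    final String topic;
    final List<Map.Entry<K, V>> records;

    Repartitioned(String topic, List<Map.Entry<K, V>> records) {
        this.topic = topic;
        this.records = records;
    }
}

final class RepartitionSketch {
    private final String applicationId;
    private int nextIndex = 0;

    RepartitionSketch(String applicationId) {
        this.applicationId = applicationId;
    }

    // The caller supplies only the key selector; each value is copied through
    // unchanged, so there is no hand-written key/value pair to get wrong.
    <K, V, K1> Repartitioned<K1, V> repartition(List<Map.Entry<K, V>> records,
                                                BiFunction<K, V, K1> keySelector) {
        List<Map.Entry<K1, V>> rekeyed = new ArrayList<>();
        for (Map.Entry<K, V> r : records) {
            K1 newKey = keySelector.apply(r.getKey(), r.getValue());
            rekeyed.add(new SimpleEntry<>(newKey, r.getValue()));
        }
        // Generated intermediate topic name; this naming scheme is an assumption.
        String topic = applicationId + "-repartition-" + nextIndex++;
        return new Repartitioned<>(topic, rekeyed);
    }
}
```

Because the selector returns only a key, it cannot accidentally alter the value, which is the DRY property a hand-written map lacks.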
> Add "repartition (+ join)" operations to streams
> ------------------------------------------------
>
> Key: KAFKA-3542
> URL: https://issues.apache.org/jira/browse/KAFKA-3542
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.0.0
> Reporter: Greg Fodor
> Assignee: Guozhang Wang
> Priority: Minor
>
> A common operation in Kafka Streams seems to be repartitioning the stream
> by a different column, usually for joining. The way I've currently been
> doing this:
> - Map the stream to the same value under a new key (the key we're going to
> join on, usually a foreign key)
> - Sink the stream into a new topic
> - Create a new stream sourcing that topic
> - Perform the join
> Note that without explicitly sinking to the intermediate topic, the topology
> will fail to build because of the assertion that both sides of a join are
> connected to source nodes. When you perform a map, the link between the
> source nodes and the tail node of the topology is broken (the source nodes
> are set to null), so you are forced to sink the output to a topic before
> you can use it in a join.
> This pattern could probably be rolled into much simpler operation(s). For
> example, the map could be changed into a "repartition" method where you just
> return the new key. The join itself could also be simplified by letting you
> specify a repartition function on either side of the join and creating the
> intermediate topic implicitly.
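The four manual steps from the issue description (map, sink, source, join) can be modeled in memory to show why the intermediate topic makes the join work. This is a sketch under stated assumptions, not Kafka Streams code: a topic is a list of partitions, the partitioner is the key's hash modulo the partition count, and the class ManualRepartitionJoin and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of map -> sink -> source -> join; not Kafka Streams code.
final class ManualRepartitionJoin {
    // Assumed partitioner: key hash modulo the partition count (never negative).
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // "Sink" step: write records into a topic modeled as a list of partitions.
    // Reading the returned structure back plays the role of the "source" step.
    static <K, V> List<List<Map.Entry<K, V>>> sink(List<Map.Entry<K, V>> records,
                                                   int numPartitions) {
        List<List<Map.Entry<K, V>>> topic = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            topic.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> r : records) {
            topic.get(partitionFor(r.getKey(), numPartitions)).add(r);
        }
        return topic;
    }

    // Join step: like a stream task, each partition sees only its own slice
    // of both inputs. Emits "leftValue|rightValue" for every key match.
    static List<String> joinPerPartition(List<List<Map.Entry<String, String>>> left,
                                         List<List<Map.Entry<String, String>>> right) {
        List<String> joined = new ArrayList<>();
        for (int p = 0; p < left.size(); p++) {
            Map<String, String> table = new HashMap<>();
            for (Map.Entry<String, String> r : right.get(p)) {
                table.put(r.getKey(), r.getValue());
            }
            for (Map.Entry<String, String> l : left.get(p)) {
                String match = table.get(l.getKey());
                if (match != null) {
                    joined.add(l.getValue() + "|" + match);
                }
            }
        }
        return joined;
    }

    // orders: orderId -> "customerId:amount"; customers: customerId -> name.
    static List<String> run(List<Map.Entry<String, String>> orders,
                            List<Map.Entry<String, String>> customers,
                            int numPartitions) {
        // 1. Map: same value, new key = the customer id (the foreign key).
        List<Map.Entry<String, String>> rekeyed = new ArrayList<>();
        for (Map.Entry<String, String> o : orders) {
            rekeyed.add(Map.entry(o.getValue().split(":")[0], o.getValue()));
        }
        // 2 + 3. Sink into an intermediate topic and source it back.
        List<List<Map.Entry<String, String>>> intermediate = sink(rekeyed, numPartitions);
        // 4. Join. Customers are assumed to be keyed by customer id already;
        // sink() here just models their layout with the same partitioner.
        return joinPerPartition(intermediate, sink(customers, numPartitions));
    }
}
```

Because both sides go through the same partitioner with the same partition count, every order lands in the partition that holds its customer.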
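The assertion that both join inputs connect to source nodes guards against a real problem, which a small in-memory model can show: a map gives a record a new key but leaves it in the partition chosen by its old key, so a partition-local join can miss matches. The class CoPartitioning, its hash-modulo partitioner, and the integer ids are illustrative assumptions, not Kafka Streams internals.

```java
import java.util.Map;

// In-memory model (not Kafka Streams code): a map gives a record a new key,
// but the record stays in the partition chosen by its OLD key.
final class CoPartitioning {
    // Assumed partitioner: key hash modulo the partition count.
    static int partition(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    // orders: orderId -> customerId; customers: customerId -> name.
    // Counts orders whose customer lives in the same partition as the order.
    // repartition == false: the re-keyed order stays where its order id put it.
    // repartition == true: the order is moved to its customer id's partition.
    static int matches(Map<Integer, Integer> orders, Map<Integer, String> customers,
                       int numPartitions, boolean repartition) {
        int found = 0;
        for (Map.Entry<Integer, Integer> o : orders.entrySet()) {
            int customerId = o.getValue();
            int orderPartition = partition(repartition ? customerId : o.getKey(), numPartitions);
            if (customers.containsKey(customerId)
                    && partition(customerId, numPartitions) == orderPartition) {
                found++;
            }
        }
        return found;
    }
}
```

With 2 partitions, orders {1 -> 10, 2 -> 11, 4 -> 10} and customers {10, 11}, only order 4 finds its customer without repartitioning, while all three match after it.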
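The proposed join with a repartition function on either side might look roughly like the following hypothetical, in-memory sketch. Each side supplies only a key selector, and bucketing both inputs by the selected key with the same partitioner and partition count stands in for the implicitly created intermediate topics. ImplicitRepartitionJoin.join() and its parameters are invented for illustration; they are not part of the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical API shape, modeled in memory (not Kafka Streams code): each side
// of the join supplies only a key selector; the repartition happens implicitly.
final class ImplicitRepartitionJoin {
    static <L, R, K, J> List<J> join(List<L> left, Function<L, K> leftKey,
                                     List<R> right, Function<R, K> rightKey,
                                     BiFunction<L, R, J> joiner, int numPartitions) {
        // Implicit repartition: bucket both sides by the selected key, using the
        // same (assumed) hash-modulo partitioner and partition count.
        List<List<L>> leftParts = new ArrayList<>();
        List<Map<K, List<R>>> rightParts = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            leftParts.add(new ArrayList<>());
            rightParts.add(new HashMap<>());
        }
        for (L l : left) {
            leftParts.get(Math.floorMod(leftKey.apply(l).hashCode(), numPartitions)).add(l);
        }
        for (R r : right) {
            K k = rightKey.apply(r);
            rightParts.get(Math.floorMod(k.hashCode(), numPartitions))
                      .computeIfAbsent(k, x -> new ArrayList<>())
                      .add(r);
        }
        // Per-partition join: each partition only sees its own buckets.
        List<J> out = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            for (L l : leftParts.get(p)) {
                for (R r : rightParts.get(p).getOrDefault(leftKey.apply(l), List.of())) {
                    out.add(joiner.apply(l, r));
                }
            }
        }
        return out;
    }
}
```

The caller never names an intermediate topic or rewrites values; the design choice is that co-partitioning is guaranteed by construction rather than asserted after the fact.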
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)