[jira] [Commented] (KAFKA-3542) Add "repartition (+ join)" operations to streams

Greg Fodor (JIRA) Mon, 11 Apr 2016 17:44:13 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236343#comment-15236343 ]


Greg Fodor commented on KAFKA-3542:
-----------------------------------

Ah, I may understand what you're getting at here -- to do the operation I have 
in mind, you would first perform an aggregation to pivot the streams onto the 
proper keys (via the selector), and then join those streams. Is that correct?

> Add "repartition (+ join)" operations to streams
> ------------------------------------------------
>
>                 Key: KAFKA-3542
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3542
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.0
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>            Priority: Minor
>
> A common operation in Kafka Streams seems to be to repartition the stream 
> onto a different column, usually for joining. The current way I've been doing 
> this:
> - Perform a map on the stream to the same value with a new key (the key we're 
> going to join on, usually a foreign key)
> - Sink the stream into a new topic
> - Create a new stream sourcing that topic
> - Perform the join
> Note that without explicitly sinking to an intermediate topic, the topology 
> will fail to build because of the assertion that both sides of a join are 
> connected to source nodes. When you perform a map, the link between the 
> source nodes and the tail node of the topology is broken (by setting the 
> source nodes to null) so you are forced to sink to use that output in a join.
> It seems that this pattern could be rolled into one or more much simpler 
> operations. For example, the map could be changed into a "repartition" 
> method where you just return the new key. And the join itself could be 
> simplified by letting you specify a re-partition function on either side of 
> the join and create the intermediate topic implicitly.
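
To make the four-step workaround in the description concrete, here is a minimal plain-Java sketch of why the re-keyed stream has to pass through an intermediate topic before the join. It is a model only and does not use the Kafka Streams API: the names RepartitionSketch, Rec, partitionFor, repartition and joinCoPartitioned are all hypothetical, and the hash-modulo partitioner is a simplification. The repartition helper takes just a key selector, which loosely mirrors the "repartition" method proposed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the workaround; not part of Kafka Streams.
final class RepartitionSketch {

    /** One keyed message; a stand-in for a record on a topic. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** Partition a record lands in, chosen by key hash (simplified assumption). */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /**
     * Steps 1-3: give each record a new key (value unchanged) and place it in
     * the partition owned by that new key, as sinking to an intermediate
     * topic and re-reading it would.
     */
    static List<List<Rec>> repartition(List<Rec> input,
                                       Function<Rec, String> keySelector,
                                       int numPartitions) {
        List<List<Rec>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Rec r : input) {
            String newKey = keySelector.apply(r);
            partitions.get(partitionFor(newKey, numPartitions))
                      .add(new Rec(newKey, r.value));
        }
        return partitions;
    }

    /** Step 4: join partition p of the left side against partition p of the right side only. */
    static List<String> joinCoPartitioned(List<List<Rec>> left, List<List<Rec>> right) {
        List<String> out = new ArrayList<>();
        for (int p = 0; p < left.size(); p++) {
            Map<String, String> table = new HashMap<>();
            for (Rec r : right.get(p)) {
                table.put(r.key, r.value);
            }
            for (Rec l : left.get(p)) {
                String match = table.get(l.key);
                if (match != null) {
                    out.add(l.key + ":" + l.value + "+" + match);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Orders are keyed by order id; the foreign key (customer id) is in the value.
        List<Rec> orders = List.of(new Rec("o1", "c1/book"), new Rec("o2", "c2/pen"));
        List<Rec> customers = List.of(new Rec("c1", "Ann"), new Rec("c2", "Bob"));
        List<List<Rec>> left = repartition(orders, r -> r.value.split("/")[0], 4);
        List<List<Rec>> right = repartition(customers, r -> r.key, 4);
        joinCoPartitioned(left, right).forEach(System.out::println);
    }
}
```

Because both sides are placed with the same partitioner and partition count, records sharing a join key always meet in the same partition. Skipping the repartition step on the orders side would leave them spread by order id, and a per-partition join could miss matches.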



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
