Greg Fodor created KAFKA-3542:
---------------------------------

             Summary: Add "repartition (+ join)" operations to streams
                 Key: KAFKA-3542
                 URL: https://issues.apache.org/jira/browse/KAFKA-3542
             Project: Kafka
          Issue Type: Improvement
          Components: streams
    Affects Versions: 0.10.0.0
            Reporter: Greg Fodor
            Assignee: Guozhang Wang
            Priority: Minor


A common operation in Kafka Streams seems to be repartitioning a stream by a 
different key, usually for joining. The way I have been doing this so far:

- Perform a map on the stream that keeps the same value but sets a new key (the 
key we're going to join on, usually a foreign key)
- Sink the stream into a new topic
- Create a new stream sourcing that topic
- Perform the join
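
The four steps above can be modeled in plain Java to show why the intermediate 
topic matters. This is only a conceptual sketch, not Kafka Streams code: the 
Rec class, the hash-based partitionFor helper, and the "userId:item" value 
format are all assumptions made up for illustration. The point is that after 
re-keying, records have to be redistributed so that equal keys land in the same 
partition as the other side of the join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class RepartitionSketch {
    /** Hypothetical stand-in for one keyed record in a topic. */
    public static final class Rec {
        public final String key;
        public final String value;
        public Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** One partitioner for both sides, so equal keys share a partition. */
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Step 1: keep each value, replace the key with the join key. */
    public static List<Rec> rekey(List<Rec> stream, Function<String, String> newKey) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : stream) {
            out.add(new Rec(newKey.apply(r.value), r.value));
        }
        return out;
    }

    /** Steps 2 and 3: sink to an intermediate "topic", then read it back by partition. */
    public static List<List<Rec>> sinkAndSource(List<Rec> stream, int partitions) {
        List<List<Rec>> topic = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            topic.add(new ArrayList<>());
        }
        for (Rec r : stream) {
            topic.get(partitionFor(r.key, partitions)).add(r);
        }
        return topic;
    }

    /** Step 4: join each partition only against the table entries in that partition. */
    public static List<String> run(List<Rec> stream, Function<String, String> newKey,
                                   Map<String, String> table, int partitions) {
        // Partition the table side with the same partitioner.
        List<Map<String, String>> tableParts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            tableParts.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : table.entrySet()) {
            tableParts.get(partitionFor(e.getKey(), partitions)).put(e.getKey(), e.getValue());
        }

        List<List<Rec>> left = sinkAndSource(rekey(stream, newKey), partitions);
        List<String> joined = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            for (Rec r : left.get(p)) {
                String match = tableParts.get(p).get(r.key);
                if (match != null) {
                    joined.add(r.value + "+" + match);
                }
            }
        }
        Collections.sort(joined); // sorted only to make the output order stable
        return joined;
    }

    public static void main(String[] args) {
        // Orders are keyed by order ID; the value carries the user ID to join on.
        List<Rec> orders = Arrays.asList(
                new Rec("o1", "u1:book"), new Rec("o2", "u2:pen"), new Rec("o3", "u1:lamp"));
        Map<String, String> users = new HashMap<>();
        users.put("u1", "Alice");
        users.put("u2", "Bob");
        System.out.println(run(orders, v -> v.split(":")[0], users, 3));
    }
}
```

If the sinkAndSource step were skipped, the re-keyed records would still sit in 
the partitions chosen by their old keys, and a partition-local lookup could 
miss matching entries on the other side.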

Note that without explicitly sinking to the intermediate topic, the topology 
will fail to build because of the assertion that both sides of a join are 
connected to source nodes. When you perform a map, the link between the source 
nodes and the tail node of the topology is broken (the source nodes are set to 
null), so you are forced to sink to a topic before you can use that output in a 
join.

It seems that this pattern could be rolled into one or more much simpler 
operations. For example, the map could be replaced by a "repartition" method 
where you just return the new key. The join itself could be simplified by 
letting you specify a repartition function on either side of the join and 
creating the intermediate topic implicitly.
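
To make the shape of such a combined operation concrete, here is a rough sketch 
in plain Java. It is not an existing or proposed Kafka Streams signature: the 
repartitionAndJoin name, its parameters, and the in-memory list standing in for 
the implicit intermediate topic are all hypothetical. The caller supplies only 
a function that returns the new key, and the re-keying hop is hidden inside the 
call.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class RepartitionJoinSketch {
    /**
     * Hypothetical combined operation: re-key the left side with repartitionBy,
     * then inner-join it against rightTable on the new key.
     */
    public static <V, K, T, R> List<R> repartitionAndJoin(
            List<V> leftValues,
            Function<V, K> repartitionBy,
            Map<K, T> rightTable,
            BiFunction<V, T, R> joiner) {
        // Stand-in for the implicit intermediate topic: records under their new key.
        List<Map.Entry<K, V>> rekeyed = new ArrayList<>();
        for (V v : leftValues) {
            rekeyed.add(new AbstractMap.SimpleEntry<>(repartitionBy.apply(v), v));
        }
        // Inner join: left records without a match on the right are dropped.
        List<R> out = new ArrayList<>();
        for (Map.Entry<K, V> e : rekeyed) {
            T other = rightTable.get(e.getKey());
            if (other != null) {
                out.add(joiner.apply(e.getValue(), other));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> orders = Arrays.asList("u1:book", "u2:pen", "u9:hat");
        Map<String, String> users = new HashMap<>();
        users.put("u1", "Alice");
        users.put("u2", "Bob");
        List<String> joined = repartitionAndJoin(
                orders, v -> v.split(":")[0], users, (order, user) -> order + "+" + user);
        System.out.println(joined);
    }
}
```

Compared with the manual map, sink, source, join sequence, the caller here never 
names the intermediate topic, which is the simplification the proposal is after.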



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)