[ 
https://issues.apache.org/jira/browse/KAFKA-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pascal Büttiker updated KAFKA-8917:
-----------------------------------
    Description: 
As described in KAFKA-7718, stateful operations such as joins propagate the 
headers of the "triggering" record. We found this very confusing and spent 
quite some time debugging it.

While ideally we would have full control over this behaviour, as KAFKA-7718 
proposes, I hope we can remove some of the randomness before then:
 * Inner-Join: Keep as is (use the headers of the triggering record)
 * Full-Join: Keep as is (use the headers of the triggering record)
 * Left-Join: *Always pick the headers of the left record.*
 * Right-Join: *Always pick the headers of the right record.*

This behaviour would solve the most pressing issues when dealing with headers 
in Kafka Streams.
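
As a rough illustration of the proposed rule, the selection could be sketched 
as below. This is plain Python rather than Kafka Streams code, and every name 
in it (JoinType, Side, pick_headers) is hypothetical and exists only for this 
example; headers are modelled as simple dicts.

```python
from enum import Enum
from typing import Dict, Optional

# Assumption for this sketch: headers are modelled as a plain dict.
Headers = Dict[str, bytes]


class JoinType(Enum):
    INNER = "inner"
    FULL = "full"
    LEFT = "left"
    RIGHT = "right"


class Side(Enum):
    LEFT = "left"
    RIGHT = "right"


def pick_headers(
    join_type: JoinType,
    triggering_side: Side,
    left_headers: Optional[Headers],
    right_headers: Optional[Headers],
) -> Headers:
    """Return the headers the join result would carry under the proposal."""
    if join_type is JoinType.LEFT:
        # Proposed: a left join always keeps the left record's headers.
        chosen = left_headers
    elif join_type is JoinType.RIGHT:
        # Proposed: a right join always keeps the right record's headers.
        chosen = right_headers
    else:
        # Inner/full join: unchanged, use the triggering record's headers.
        chosen = left_headers if triggering_side is Side.LEFT else right_headers
    # Copy so callers cannot mutate the input; a missing side yields {}.
    return dict(chosen or {})
```

With this rule, a left join whose result is triggered by an update on the 
right side would still carry the left record's headers, which is the case 
that matters for the enrichment scenario described under Motivation.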

*Motivation*:

In a CDC scenario, we usually have to resolve the relational database joins on 
our side, which typically means enriching one record with data from a couple 
of other topics. For a typical CDC use case, Left-Joins therefore enable the 
most basic de-normalisations of relational data models. If the header 
behaviour for left/right joins is fixed, we can actually use Kafka Streams in 
a CDC scenario with joins and headers.

We depend on headers, especially when dealing with tombstone records. There is 
no other way to attach additional information to a tombstone, since its value 
is null. If we do not use tombstone records, all the default Kafka features 
around compacted topics and KTables are no longer usable. We are able to use 
custom Transformers to generate the headers (basically patching in the missing 
header support in Kafka Streams), but as soon as we use Join/Aggregate we lose 
control over the headers.

  was:
As described in KAFKA-7718, headers are promoted by the "triggering" record in 
stateful operations such as Joins. This was very confusing and we spent quite 
some time debugging this.

While we ideally have full control over this behaviour as like the KAFKA-7718 
proposes, I hope we can solve some of the randomness before this:
 * Inner-Join: Keep as is (use the headers of the triggering record)
 * Full-Join: Keep as is (use the headers of the triggering record)
 * Left-Join: *Always pick the headers of the left record.*
 * Right-Join: *Always pick the headers of the right record.*

This behaviour would solve the most pressing issues when dealing with headers 
in Kafka Streams.

*Motivation*:

In a CDC scenario, we usually have to resolve the relational database joins on 
our side, which usually means we enrich one record from a couple of other 
topics. So for a typical CDC use-case, Left-Joins allow the most basic 
de-normalisations from relational data models. Therefore, when we can solve the 
header behaviour for left/right joins, we can actually use Kafka Streams in a 
CDC scenario with joins and headers.

We depend on headers, especially when dealing with tombstone records. There is 
no other way to store additional information. If we do not use tombstone 
records, all default Kafka Features around compacted topics and KTables are no 
longer usable. We are able to use custom Transformers to generate the headers 
(basically patch in the missing header support in Kafka Streams), but as soon 
as we use Join/Aggregate we lose control over the headers.


> When performing a Left/Right-Join, pick the headers of the same side
> --------------------------------------------------------------------
>
>                 Key: KAFKA-8917
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8917
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Pascal Büttiker
>            Priority: Major
>
> As described in KAFKA-7718, stateful operations such as joins propagate the 
> headers of the "triggering" record. We found this very confusing and spent 
> quite some time debugging it.
> While ideally we would have full control over this behaviour, as KAFKA-7718 
> proposes, I hope we can remove some of the randomness before then:
>  * Inner-Join: Keep as is (use the headers of the triggering record)
>  * Full-Join: Keep as is (use the headers of the triggering record)
>  * Left-Join: *Always pick the headers of the left record.*
>  * Right-Join: *Always pick the headers of the right record.*
> This behaviour would solve the most pressing issues when dealing with headers 
> in Kafka Streams.
> *Motivation*:
> In a CDC scenario, we usually have to resolve the relational database joins 
> on our side, which typically means enriching one record with data from a 
> couple of other topics. For a typical CDC use case, Left-Joins therefore 
> enable the most basic de-normalisations of relational data models. If the 
> header behaviour for left/right joins is fixed, we can actually use Kafka 
> Streams in a CDC scenario with joins and headers.
> We depend on headers, especially when dealing with tombstone records. There 
> is no other way to attach additional information to a tombstone, since its 
> value is null. If we do not use tombstone records, all the default Kafka 
> features around compacted topics and KTables are no longer usable. We are 
> able to use custom Transformers to generate the headers (basically patching 
> in the missing header support in Kafka Streams), but as soon as we use 
> Join/Aggregate we lose control over the headers.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
