[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew updated KAFKA-8315:
--------------------------
    Description: 
The problem we are experiencing is that we cannot reliably perform simple joins 
over pre-populated Kafka topics. This seems to be more of a problem when one 
topic has records at less frequent timestamp intervals than the other.
An example of the issue is provided in this repository:

[https://github.com/the4thamigo-uk/join-example]

 

1) This issue was initially thought to be due to the inability to set the 
retention period for a join window via {{Materialized}}, i.e.:

The documentation says to use `Materialized`, not `JoinWindows.until()` 
([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
but there is nowhere to pass a `Materialized` instance to the join operation; 
it seems that only the group operation accepts one.

This was considered to be a problem with the documentation rather than with the 
API, and is addressed in [https://github.com/apache/kafka/pull/6664]

2) We then found an apparent issue in the code which would affect the partition 
that is selected to deliver the next record to the join. This would only be a 
problem for data that is out-of-order, and join-example uses data that is in 
order of timestamp in both topics. So this fix is thought not to affect 
join-example.

This was considered to be an issue and is being addressed in 
https://github.com/apache/kafka/pull/6719

 3) Further investigation using a couple of crafted unit tests 
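
On point 2, a minimal sketch may help show why only out-of-order data would be affected. It assumes, as an illustration rather than anything taken from the pull request, that the next record is taken from whichever partition has the lowest head-record timestamp. The names `HeadTimestampMerge`, `Rec`, `nextQueue` and `drain` are hypothetical, and this is not Kafka Streams code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration only; not taken from Kafka Streams. */
public class HeadTimestampMerge {

    /** A record with a timestamp (ms) and a payload. */
    static final class Rec {
        final long timestamp;
        final String value;

        Rec(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Returns the index of the queue whose head record has the lowest
     * timestamp, or -1 if every queue is empty. Ties go to the lowest index.
     */
    static int nextQueue(List<Deque<Rec>> queues) {
        int best = -1;
        long bestTs = Long.MAX_VALUE;
        for (int i = 0; i < queues.size(); i++) {
            Rec head = queues.get(i).peekFirst();
            if (head != null && (best == -1 || head.timestamp < bestTs)) {
                best = i;
                bestTs = head.timestamp;
            }
        }
        return best;
    }

    /** Drains all queues, always taking the record chosen by nextQueue. */
    static List<String> drain(List<Deque<Rec>> queues) {
        List<String> out = new ArrayList<>();
        int i;
        while ((i = nextQueue(queues)) >= 0) {
            out.add(queues.get(i).pollFirst().value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Both "partitions" are in timestamp order, as in join-example.
        Deque<Rec> left = new ArrayDeque<>(
                List.of(new Rec(0, "L0"), new Rec(10, "L10"), new Rec(20, "L20")));
        Deque<Rec> right = new ArrayDeque<>(
                List.of(new Rec(5, "R5"), new Rec(25, "R25")));
        System.out.println(drain(List.of(left, right)));
    }
}
```

With both inputs in timestamp order, the merged sequence is also in timestamp order. If one input is out of order (for example a record with timestamp 10 queued ahead of one with timestamp 0), the head of that queue no longer carries its lowest timestamp, so the selection can deliver records from the other input first.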

 

Slack conversation here : 
[https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]

[Additional]

From what I understand, the retention period should be independent of the 
grace period, so I think this is more than a documentation fix (see comments 
below)

  was:
The problem we are experiencing is that we cannot reliably perform simple joins 
over pre-populated Kafka topics. This seems to be more of a problem when one 
topic has records at less frequent timestamp intervals than the other.
An example of the issue is provided in this repository:

https://github.com/the4thamigo-uk/join-example

 

1) This issue was initially thought to be due to the inability to set the 
retention period for a join window via {{Materialized}}, i.e.:

The documentation says to use `Materialized`, not `JoinWindows.until()` 
([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
but there is nowhere to pass a `Materialized` instance to the join operation; 
it seems that only the group operation accepts one.

This was considered to be a problem with the documentation rather than with the 
API, and is addressed in [https://github.com/apache/kafka/pull/6664]

2) We then found an apparent issue in the code which would affect the partition 
that is selected to deliver the next record to the join. 

This was considered to be an issue and is being addressed in 

 

 

Slack conversation here : 
[https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]

[Additional]

From what I understand, the retention period should be independent of the 
grace period, so I think this is more than a documentation fix (see comments 
below)


> Historical join issues
> ----------------------
>
>                 Key: KAFKA-8315
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8315
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Andrew
>            Assignee: John Roesler
>            Priority: Major
>         Attachments: code.java
>
>
> The problem we are experiencing is that we cannot reliably perform simple 
> joins over pre-populated Kafka topics. This seems to be more of a problem 
> when one topic has records at less frequent timestamp intervals than the 
> other.
> An example of the issue is provided in this repository:
> [https://github.com/the4thamigo-uk/join-example]
>  
> 1) This issue was initially thought to be due to the inability to set the 
> retention period for a join window via {{Materialized}}, i.e.:
> The documentation says to use `Materialized`, not `JoinWindows.until()` 
> ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
> but there is nowhere to pass a `Materialized` instance to the join 
> operation; it seems that only the group operation accepts one.
> This was considered to be a problem with the documentation rather than with 
> the API, and is addressed in [https://github.com/apache/kafka/pull/6664]
> 2) We then found an apparent issue in the code which would affect the 
> partition that is selected to deliver the next record to the join. This would 
> only be a problem for data that is out-of-order, and join-example uses data 
> that is in order of timestamp in both topics. So this fix is thought not to 
> affect join-example.
> This was considered to be an issue and is being addressed in 
> https://github.com/apache/kafka/pull/6719
>  3) Further investigation using a couple of crafted unit tests 
>  
> Slack conversation here : 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [Additional]
> From what I understand, the retention period should be independent of the 
> grace period, so I think this is more than a documentation fix (see comments 
> below)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
