[
https://issues.apache.org/jira/browse/KAFKA-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521985#comment-17521985
]
Matthias J. Sax commented on KAFKA-12909:
-----------------------------------------
This ticket is about left/outer joins in particular, and the "emit at window
close" strategy is only applied to non-matching records. I.e., even if you have
a left/outer join, all _inner_ join results of the operation are emitted right
away. However, it's not "safe" to emit a left/outer join result eagerly, as
the record might still find a join partner later. Thus, we need to delay
emitting "un-joined" records until the grace period has passed to ensure we
compute the right result.
In the old implementation, we basically did not compute the correct left/outer
join result, but a superset of it. Your DB argument does not really apply,
because the result is a KStream and thus we should only emit _final_ results. If
we emit a <k,<v1,null>> eagerly and a second <k,<v1,v2>> later, the second one
is _not_ an update to the first one (a KStream has no update semantics).
Otherwise we would need to treat all results with the same key as _updates_, but
if a record joins twice, the second join result is also not an update to the
first one.
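To make the difference concrete, here is a minimal sketch in plain Python (not
Kafka Streams code) that simulates a windowed left join in both modes. All
names (`Record`, `left_join`, the `eager` flag) are hypothetical and exist only
for this illustration. The sketch assumes records arrive in the given order,
that stream time is the largest timestamp seen so far, and it does not model
dropping of late records.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    side: str   # "L" for the left stream, "R" for the right stream
    key: str
    value: str
    ts: int     # event timestamp in milliseconds


Result = Tuple[str, Tuple[str, Optional[str]]]


def left_join(records: List[Record], window_ms: int, grace_ms: int,
              eager: bool) -> List[Result]:
    """Simulate a stream-stream left join (illustrative model only).

    eager=True  : emit <k,<v,None>> as soon as a left record finds no partner.
    eager=False : hold unmatched left records until window end + grace passed.
    """
    lefts: List[Record] = []
    rights: List[Record] = []
    pending: List[Record] = []   # unmatched left records (delayed mode only)
    out: List[Result] = []
    stream_time = 0

    for rec in records:
        stream_time = max(stream_time, rec.ts)
        if rec.side == "L":
            lefts.append(rec)
            matches = [r for r in rights
                       if r.key == rec.key and abs(r.ts - rec.ts) <= window_ms]
            # Inner join results are always emitted right away.
            for r in matches:
                out.append((rec.key, (rec.value, r.value)))
            if not matches:
                if eager:
                    out.append((rec.key, (rec.value, None)))
                else:
                    pending.append(rec)
        else:
            rights.append(rec)
            for l in lefts:
                if l.key == rec.key and abs(l.ts - rec.ts) <= window_ms:
                    out.append((l.key, (l.value, rec.value)))
                    if l in pending:
                        pending.remove(l)   # it found a partner after all
        if not eager:
            # Emit left records whose window plus grace period has closed.
            closed = [l for l in pending
                      if l.ts + window_ms + grace_ms < stream_time]
            for l in closed:
                out.append((l.key, (l.value, None)))
                pending.remove(l)
    return out


demo = [
    Record("L", "k", "v1", 0),
    Record("R", "k", "v2", 5),      # joins with v1 within the window
    Record("L", "k2", "a", 10),     # never finds a partner
    Record("L", "k3", "z", 1000),   # advances stream time
]
print(left_join(demo, window_ms=10, grace_ms=20, eager=True))
print(left_join(demo, window_ms=10, grace_ms=20, eager=False))
```

In eager mode the output contains the spurious `("k", ("v1", None))` in
addition to `("k", ("v1", "v2"))`. In delayed mode only the inner result is
emitted for `k`, and `("k2", ("a", None))` appears once stream time has moved
past the window end plus grace. The record for `k3` stays pending in delayed
mode because stream time never advances far enough in this short input.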
Does this make sense?
> Allow users to opt-into spurious left/outer stream-stream join improvement
> --------------------------------------------------------------------------
>
> Key: KAFKA-12909
> URL: https://issues.apache.org/jira/browse/KAFKA-12909
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Matthias J. Sax
> Priority: Blocker
> Fix For: 3.1.0
>
>
> https://issues.apache.org/jira/browse/KAFKA-10847 improves left/outer
> stream-stream join, by not emitting left/outer results eagerly, but only
> after the grace period passed.
> While this change is desired, there is an issue with regard to upgrades: if
> users don't specify a grace period, we fall back to a 24h default. Thus,
> left/outer join results would only be emitted 24h after the join window end.
> This change in behavior could break existing applications when upgrading to
> 3.0.0 release. – And even if users do set a grace period explicitly, it's
> still unclear if the new delayed output behavior would work for them.
> Thus, we propose to disable the fix of KAFKA-10847 by default, and let users
> opt into the fix explicitly instead.
> To allow users to enable the fix, we want to piggy-back on KIP-633
> (https://issues.apache.org/jira/browse/KAFKA-8613) that deprecated the
> existing `JoinWindows.of()` and `JoinWindows#grace()` methods in favor of
> `JoinWindows.ofSizeAndGrace()` – if users don't update their code, we would
> keep the fix disabled, and thus, if users upgrade their app nothing changes.
> Only if users switch to the new `ofSizeAndGrace()` API do we enable the fix and
> thus give users the opportunity to opt in explicitly and pick an appropriate
> grace period for their application.
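
The proposed opt-in can be modelled roughly as follows. This is a plain Python
sketch, not the Kafka Streams API: `JoinWindowsModel` and its methods are
hypothetical stand-ins that mirror the method names in the description above,
and the only facts taken from the ticket are the 24h default grace period and
the rule that only the new factory enables the fix.

```python
from dataclasses import dataclass

DEFAULT_GRACE_MS = 24 * 60 * 60 * 1000  # 24h fallback named in the ticket


@dataclass(frozen=True)
class JoinWindowsModel:
    """Hypothetical model of the opt-in rule, for illustration only."""
    size_ms: int
    grace_ms: int
    fix_enabled: bool  # True: unmatched left/outer results wait for window close

    @staticmethod
    def of(size_ms: int) -> "JoinWindowsModel":
        # Models the deprecated factory: default grace, fix stays disabled.
        return JoinWindowsModel(size_ms, DEFAULT_GRACE_MS, False)

    def grace(self, grace_ms: int) -> "JoinWindowsModel":
        # Models the deprecated setter: explicit grace, opt-in flag unchanged.
        return JoinWindowsModel(self.size_ms, grace_ms, self.fix_enabled)

    @staticmethod
    def of_size_and_grace(size_ms: int, grace_ms: int) -> "JoinWindowsModel":
        # Models the new factory: the user picks a grace period and opts in.
        return JoinWindowsModel(size_ms, grace_ms, True)

    def delay_after_window_end_ms(self) -> int:
        # How long after the join window end an unmatched result is held back.
        return self.grace_ms if self.fix_enabled else 0


legacy = JoinWindowsModel.of(10_000)
opted_in = JoinWindowsModel.of_size_and_grace(10_000, 5_000)
print(legacy.fix_enabled, legacy.delay_after_window_end_ms())
print(opted_in.fix_enabled, opted_in.delay_after_window_end_ms())
```

Under this model, existing code that keeps using the deprecated methods sees
no change in behavior after an upgrade, while code that switches to the new
factory gets delayed left/outer results with a grace period it chose itself.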