Hello Vicky,

Thanks for the KIP! I made a quick pass and here are some initial thoughts:

1. Store Implementation: this may not be directly related to the KIP itself since it's all internal, but the stream-stream join state store implementation was changed in https://issues.apache.org/jira/browse/KAFKA-10847, in which we added a separate store to hold all the records that have not found a match yet, so that we can emit them for left/outer joins once time has passed. In this optimization, I think we can still go with a single store, but we need to make sure we do not regress on KAFKA-10847, i.e. records that do not find a match should still be emitted once time has passed. This would likely rely on the ability to range over the "expired" records of that single store. A good reference would be the recent work to allow emitting final results for windowed aggregations (cc @Hao Li <lihaos...@gmail.com> who can provide some more references).

2. Join Semantics and Outer-Joins: I think we need to clarify, for any single stream record, whether the record itself counts as a "match" for itself, OR whether only a different record with the same key and within the join window counts as a "match". If it's the former, then I agree that outer-joins (even left-joins, right?) would not make sense, since we would always find at least one match for any record; if it's the latter, then outer/left joins still make sense and we would need to consider the store implementation as stated in 1) above. Personally, I think the latter is better. I know it's a bit away from the RDBMS self-join semantics, but RDBMS self-joins are usually not on PKs but on FKs, so I think their semantics are less relevant to what we are considering here: windowed stream-stream joins, which are still on PKs.

3. Compatibility: first of all, I think we should introduce new values for the TOPOLOGY_OPTIMIZATION_CONFIG for this specific optimization, in addition to `all` and `none`. This is also what we discussed before in order to keep compatibility.
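
To make the config point concrete, here is a minimal sketch of what I have in mind from the user's side. It only uses java.util.Properties so that it runs on its own. "topology.optimization" is the actual key behind TOPOLOGY_OPTIMIZATION_CONFIG, and `all` / `none` are the existing values; the value "self-join" is purely a placeholder I made up for illustration, as are the class name, the application id and the bootstrap address.

```java
import java.util.Properties;

public class SelfJoinConfigSketch {
    // Key behind StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG.
    public static final String TOPOLOGY_OPTIMIZATION = "topology.optimization";

    // Existing values of the config.
    public static final String OPTIMIZE_ALL = "all";
    public static final String OPTIMIZE_NONE = "none";

    // HYPOTHETICAL placeholder: the name of the new value is not decided in this thread.
    public static final String SELF_JOIN_ONLY = "self-join";

    // Builds the properties an application would hand to Kafka Streams.
    public static Properties streamsProps(String optimization) {
        Properties props = new Properties();
        props.put("application.id", "self-join-demo");    // assumed app id
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put(TOPOLOGY_OPTIMIZATION, optimization);
        return props;
    }

    public static void main(String[] args) {
        // Before the rolling bounce: optimization disabled.
        Properties before = streamsProps(OPTIMIZE_NONE);
        // After the rolling bounce: only the self-join optimization enabled.
        Properties after = streamsProps(SELF_JOIN_ONLY);

        System.out.println("before: " + before.getProperty(TOPOLOGY_OPTIMIZATION));
        System.out.println("after:  " + after.getProperty(TOPOLOGY_OPTIMIZATION));
    }
}
```

In a real application the same properties would also need to be passed to StreamsBuilder#build(Properties), otherwise the optimization setting is not applied to the topology.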
But for applications that are already running, we'd also need to make sure that a rolling bounce with this config value changed does not break the app. That involves:

a) the store names (and hence the changelog names) should not change: when we use suffixes, we should make sure they do not change by burning some suffixes as well,
b) the processor names, similar to store names,
c) store formats: if we ever change the store formats, we need to consider a live upgrade path as well.

Please let me know your thoughts.

Guozhang

On Tue, Aug 2, 2022 at 11:31 AM Vasiliki Papavasileiou <vpapavasile...@confluent.io.invalid> wrote:

> Hello everyone,
>
> I would like to start the discussion for KIP-862: Implement self-join
> optimization
>
> The KIP can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-862%3A+Self-join
>
> Any suggestions are more than welcome.
>
> Many thanks,
> Vicky
>

--
-- Guozhang