[ https://issues.apache.org/jira/browse/KAFKA-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax updated KAFKA-12317:
------------------------------------
Description:

Currently, for stream-stream and stream-table/globalTable joins, Kafka Streams drops all stream records with a `null` key (a `null` join-key for stream-globalTable joins), because the join is undefined for a `null` (join) key: i.e., there is no attribute to do the table lookup with, so we consider the stream record malformed. Note that we define the semantics of a _left/outer_ join as: keep the stream record if no matching join record was found.

We could relax the definition of the _left_ stream-table/globalTable join and the _left/outer_ stream-stream join, though: instead of dropping `null`-(join)key stream records, we would call the ValueJoiner with a `null` "other-side" value. That is, if the stream record key (or join-key) is `null`, we could treat it as a "failed lookup" instead of treating the stream record as corrupted.

If we make this change, users who want to keep the current behavior can add a `filter()` before the join to drop `null`-(join)key records from the stream explicitly.

Note that this change also requires changing the behavior when we insert a repartition topic before the join: currently, we drop `null`-key records before writing into the repartition topic (as we know they would be dropped later anyway). We need to relax this behavior for left stream-table and left/outer stream-stream joins. Users need to be aware (i.e., we might need to call this out in the docs and JavaDocs) that records with a `null` key would be partitioned randomly.

was:

Currently, for stream-stream and stream-table/globalTable joins, Kafka Streams drops all stream records with a null key, because the join is undefined for a null key: i.e., there is no attribute to do the table lookup with, so we consider the stream record malformed. Note that we define the semantics of a _left_ join as: keep the stream record if no KTable record was found.
We could relax the definition of a _left_ join, though: instead of dropping `null`-key stream records, we would call the ValueJoiner with a `null` table record. That is, if the stream record key is `null`, we could treat it as a "failed table lookup" instead of treating the stream record as corrupted. If we make this change, users who want to keep the current behavior can add a `filter()` before the join to drop `null`-key records from the stream explicitly. Note that this change also requires changing the behavior when we insert a repartition topic before the join: currently, we drop `null`-key records before writing into the repartition topic (as we know they would be dropped later anyway). We need to relax this behavior for a left/outer stream-table (and maybe left/outer

> Relax non-null key requirement for left/outer KStream joins
> -----------------------------------------------------------
>
>                 Key: KAFKA-12317
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12317
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
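To make the difference between the current and the proposed left-join semantics concrete, here is a small in-memory model of a left stream-table join. It is a sketch under stated assumptions, not Kafka Streams code: the class `LeftJoinModel`, its `Rec` type, and the methods `leftJoinCurrent`, `leftJoinProposed`, and `dropNullKeys` are hypothetical, a `Map` stands in for the KTable's current state, and a `BiFunction` plays the role of the ValueJoiner. Only the stream-table case is modeled; `dropNullKeys` stands in for the opt-out `filter()` described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Toy model of left stream-table join semantics; NOT Kafka Streams code.
final class LeftJoinModel {

    // Hypothetical stand-in for a stream record; the key may be null.
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    // Current semantics: a null-key record has no attribute for the lookup,
    // so it is treated as malformed and dropped.
    static List<String> leftJoinCurrent(List<Rec> stream, Map<String, String> table,
                                        BiFunction<String, String, String> joiner) {
        List<String> out = new ArrayList<>();
        for (Rec r : stream) {
            if (r.key == null) {
                continue; // dropped before the lookup
            }
            // Left join: table.get returns null when there is no match,
            // and the joiner is still called, so the stream record is kept.
            out.add(joiner.apply(r.value, table.get(r.key)));
        }
        return out;
    }

    // Proposed semantics: a null key is treated like a failed lookup, so the
    // joiner is called with a null "other-side" value. The null key is never
    // passed to table.get (maps from Map.of throw on null lookups).
    static List<String> leftJoinProposed(List<Rec> stream, Map<String, String> table,
                                         BiFunction<String, String, String> joiner) {
        List<String> out = new ArrayList<>();
        for (Rec r : stream) {
            String other = (r.key == null) ? null : table.get(r.key);
            out.add(joiner.apply(r.value, other));
        }
        return out;
    }

    // Models the opt-out filter; in the DSL this would be something like
    // stream.filter((k, v) -> k != null) placed before the join.
    static List<Rec> dropNullKeys(List<Rec> stream) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : stream) {
            if (r.key != null) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> stream = List.of(new Rec("a", "1"), new Rec(null, "2"), new Rec("b", "3"));
        Map<String, String> table = Map.of("a", "A");
        BiFunction<String, String, String> joiner = (l, r) -> l + "," + r;

        System.out.println(leftJoinCurrent(stream, table, joiner));                // [1,A, 3,null]
        System.out.println(leftJoinProposed(stream, table, joiner));               // [1,A, 2,null, 3,null]
        System.out.println(leftJoinProposed(dropNullKeys(stream), table, joiner)); // [1,A, 3,null]
    }
}
```

Filtering out `null` keys before the relaxed join reproduces the output of today's join, which is why a `filter()` is sufficient for users who want to keep the current behavior.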
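The repartitioning caveat (`null`-key records would be partitioned randomly) can be illustrated with a toy partitioner. This is an assumption-laden sketch, not the Kafka producer's partitioner, which hashes the serialized key bytes with murmur2: the hypothetical `RepartitionModel.partitionFor` uses `String.hashCode()` for non-`null` keys, so equal keys always map to the same partition, and picks a random partition for a `null` key because there is nothing to hash.

```java
import java.util.Random;

// Toy model of repartitioning; NOT the Kafka producer's partitioner.
final class RepartitionModel {

    // Non-null key: deterministic, so every record with the same key lands in
    // the same partition (the property a key-based join relies on).
    // Null key: there is nothing to hash, so one partition is as good as any
    // other and we pick one at random.
    static int partitionFor(String key, int numPartitions, Random rnd) {
        if (key == null) {
            return rnd.nextInt(numPartitions);
        }
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int partitions = 4;

        // Same key, same partition, every time.
        int p1 = partitionFor("user-42", partitions, rnd);
        int p2 = partitionFor("user-42", partitions, rnd);
        System.out.println("user-42 -> " + p1 + ", " + p2);

        // Null key: the partition can differ from one record to the next.
        StringBuilder sb = new StringBuilder("null ->");
        for (int i = 0; i < 8; i++) {
            sb.append(' ').append(partitionFor(null, partitions, rnd));
        }
        System.out.println(sb);
    }
}
```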