[
https://issues.apache.org/jira/browse/KAFKA-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax updated KAFKA-12317:
------------------------------------
Description:
Currently, for stream-stream and stream-table/globalTable joins, Kafka Streams
drops all stream records with a `null`-key (`null`-join-key for
stream-globalTable), because for a `null`-(join)key the join is undefined: i.e.,
we don't have an attribute to do the table lookup (we consider the
stream-record as malformed). Note that we define the semantics of _left/outer_
join as: keep the stream record if no matching join record was found.
We could relax the definition of the _left_ stream-table/globalTable and
_left/outer_ stream-stream joins, though, and not drop `null`-(join)key stream
records, and call the ValueJoiner with a `null` "other-side" value instead: if
the stream record key (or join-key) is `null`, we could treat it as a "failed
lookup" instead of treating the stream record as corrupted.
If we make this change, users who want to keep the current behavior can add a
`filter()` before the join to drop `null`-(join)key records from the stream
explicitly.
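The difference between the current and the proposed behavior, and the effect of
the `filter()` workaround, can be sketched in plain Java. This is only a model
of the semantics for a left stream-table join, not Kafka Streams code:
`LeftJoinSketch`, `Rec`, `leftJoin`, `dropNullKeys`, and the `relaxed` flag are
hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

// Plain-Java model of the semantics discussed above; NOT Kafka Streams code.
// All names (LeftJoinSketch, Rec, leftJoin, dropNullKeys, relaxed) are hypothetical.
final class LeftJoinSketch {

    /** A stream record whose key may be null. */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Left stream-table join over an in-memory "stream" and "table".
     * relaxed == false: null-key records are dropped (current behavior).
     * relaxed == true:  null-key records are treated as a failed lookup and
     *                   the joiner is called with a null table value (proposal).
     */
    static List<String> leftJoin(List<Rec> stream,
                                 Map<String, String> table,
                                 BiFunction<String, String, String> joiner,
                                 boolean relaxed) {
        List<String> out = new ArrayList<>();
        for (Rec r : stream) {
            if (r.key == null) {
                if (relaxed) {
                    out.add(joiner.apply(r.value, null));
                }
                continue;
            }
            // A missing table entry yields null: the ordinary left-join case.
            out.add(joiner.apply(r.value, table.get(r.key)));
        }
        return out;
    }

    /** Models an explicit filter() placed before the join. */
    static List<Rec> dropNullKeys(List<Rec> stream) {
        return stream.stream()
                .filter(r -> r.key != null)
                .collect(Collectors.toList());
    }
}
```

In this model, `leftJoin(dropNullKeys(stream), table, joiner, true)` produces
the same output as `leftJoin(stream, table, joiner, false)`, which mirrors the
`filter()` workaround described above.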
Note that this change also requires changing the behavior if we insert a
repartition topic before the join: currently, we drop `null`-key records before
writing into the repartition topic (as we know they would be dropped later
anyway). We need to relax this behavior for a left stream-table and left/outer
stream-stream join. Users need to be aware (i.e., we might need to put this into
the docs and JavaDocs) that records with a `null`-key would be partitioned
randomly.
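What "partitioned randomly" means for `null`-key records can be illustrated
with a simplified partitioning model. This is a sketch under stated
assumptions, not Kafka's actual partitioner: `PartitionSketch` and
`partitionFor` are hypothetical names, and `String.hashCode()` stands in for
whatever hash function the real partitioner uses.

```java
import java.util.concurrent.ThreadLocalRandom;

// Simplified model of key-based partitioning; NOT Kafka's actual partitioner.
// PartitionSketch and partitionFor are hypothetical names.
final class PartitionSketch {

    /**
     * Non-null keys map deterministically to a partition, so equal keys
     * always land together. Null keys have nothing to hash, so this model
     * picks a partition at random.
     */
    static int partitionFor(String key, int numPartitions) {
        if (key == null) {
            return ThreadLocalRandom.current().nextInt(numPartitions);
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Under this model, two records with the same non-null key always reach the same
partition, while two `null`-key records may end up on different partitions.
Presumably this does not affect the join result itself, since a `null`-key
record would be joined with a `null` "other-side" value regardless of where it
lands.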
was:
Currently, for stream-stream and stream-table/globalTable joins, Kafka Streams
drops all stream records with a null-key, because for a null-key the join is
undefined: i.e., we don't have an attribute to do the table lookup (we consider
the stream-record as malformed). Note that we define the semantics of _left_
join as: keep the stream record if no KTable record was found.
We could relax the definition of _left_ join, though, and not drop non-key
stream records, and call the ValueJoiner with a `null` table record instead: if
the stream record key is `null`, we could treat it as a "failed table lookup"
instead of treating the stream record as corrupted.
If we make this change, users who want to keep the current behavior can add a
`filter()` before the join to drop `null`-key records from the stream
explicitly.
Note that this change also requires changing the behavior if we insert a
repartition topic before the join: currently, we drop `null`-key records before
writing into the repartition topic (as we know they would be dropped later
anyway). We need to relax this behavior for a left/outer stream-table (and
maybe left/outer
> Relax non-null key requirement for left/outer KStream joins
> -----------------------------------------------------------
>
> Key: KAFKA-12317
> URL: https://issues.apache.org/jira/browse/KAFKA-12317
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Priority: Major
>
> Currently, for stream-stream and stream-table/globalTable joins,
> Kafka Streams drops all stream records with a `null`-key (`null`-join-key for
> stream-globalTable), because for a `null`-(join)key the join is undefined:
> i.e., we don't have an attribute to do the table lookup (we consider the
> stream-record as malformed). Note that we define the semantics of
> _left/outer_ join as: keep the stream record if no matching join record was
> found.
> We could relax the definition of the _left_ stream-table/globalTable and
> _left/outer_ stream-stream joins, though, and not drop `null`-(join)key stream
> records, and call the ValueJoiner with a `null` "other-side" value instead:
> if the stream record key (or join-key) is `null`, we could treat it as a
> "failed lookup" instead of treating the stream record as corrupted.
> If we make this change, users who want to keep the current behavior can add
> a `filter()` before the join to drop `null`-(join)key records from the stream
> explicitly.
> Note that this change also requires changing the behavior if we insert a
> repartition topic before the join: currently, we drop `null`-key records
> before writing into the repartition topic (as we know they would be dropped
> later anyway). We need to relax this behavior for a left stream-table and
> left/outer stream-stream join. Users need to be aware (i.e., we might need to
> put this into the docs and JavaDocs) that records with a `null`-key would be
> partitioned randomly.