Hello Kafka community,

The KTable-KTable join section of the cwiki, written for an older version,
says:

"Pay attention, that the KTable lookup is done on the current KTable state,
and thus, out-of-order records can yield non-deterministic result.
Furthermore, in practice Kafka Streams does not guarantee that all records
will be processed in timestamp order (even if processing records in
timestamp order is the goal, it is only best effort)."

Is this still valid? What does it mean in practice? Does it only describe a
temporary glitch in the emitted data, or does it also affect the eventual
result (assuming the output is written to a compacted topic)? Can we expect
the output to become eventually consistent, with the correct result present
in the final state of the compacted output topic?

What kinds of inconsistency or data loss in the join output can we expect,
if any? Will all joined records eventually be output? Or, with the default
KTable join, could a race condition in concurrent processing cause a pair
never to be emitted? For instance, two messages arrive in topics t1 and t2;
they are processed concurrently, and each is joined against the local state
of the other table, which does not yet contain its pair. At this stage the
loss looks possible and permanent. Does Kafka Streams account for this, for
instance when replicating the changelog, by detecting that the pair has
been added on another node and then emitting it?

Is this join commutative (a sort of semigroup, like CRDTs), so that any
interleaving of concurrent processing results in the same eventual state?
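
To make the scenario concrete, here is a toy Python model of how I
understand the join. It is only a sketch under my own assumptions, not
Kafka Streams code: I assume each side updates its own state and then looks
up the current state of the other side, and that records for one key are
handled one at a time. The name run_join and the event tuples are made up
for illustration.

```python
def run_join(events):
    """Replay (side, key, value) events one at a time and join on key."""
    left, right = {}, {}   # local state of the t1 table and the t2 table
    emitted = []           # every join result emitted along the way
    output = {}            # compacted view: latest join result per key
    for side, key, value in events:
        own, other = (left, right) if side == "t1" else (right, left)
        own[key] = value               # update this side's state first
        if key in other:               # then look up the other side's current state
            pair = (left[key], right[key])
            emitted.append((key, pair))
            output[key] = pair
    return emitted, output


# The pair from the question, replayed in both possible orders.
t1_first = run_join([("t1", "k", "a"), ("t2", "k", "b")])
t2_first = run_join([("t2", "k", "b"), ("t1", "k", "a")])

# An older t1 value processed before or after a newer one.
in_order = run_join([("t1", "k", "a1"), ("t1", "k", "a2"), ("t2", "k", "b")])
out_of_order = run_join([("t1", "k", "a2"), ("t1", "k", "a1"), ("t2", "k", "b")])

print(t1_first[1], t2_first[1])
print(in_order[1], out_of_order[1])
```

In this model the pair is emitted whichever side is processed first, but if
two updates to the same key are processed out of timestamp order, the final
joined value differs. What I cannot tell is whether the real implementation
behaves like this model when the two inputs are processed concurrently.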
Thank you


On Tue, 14 Jul 2020 at 13:08, Dumitru-Nicolae Marasoui <
nicolae.maras...@kaluza.com> wrote:

> In
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics#KafkaStreamsJoinSemantics-KTable-KTableJoin.1
> it is mentioned that "Pay attention, that the KTable lookup is done on the
> *current* KTable state, and thus, out-of-order records can yield
> non-deterministic result. Furthermore, in practice Kafka Streams does not
> guarantee that all records will be processed in timestamp order (even if
> processing records in timestamp order is the goal, it is only best effort)."
>
> Is this still a valid concern?
> Can you give a few examples of how this may happen and what the end result
> would look like? I guess non-determinism in this case means that the end
> result (the eventual result) can be one of many possible combinations?
>
> Which version(s) of Kafka Streams have this concern? All of them, right?
> (Is there any difference between the open source and Confluent versions?)
>
> Thank you
>
> --
>
> Dumitru-Nicolae Marasoui
>
> Software Engineer
>
>
>
> w kaluza.com <https://www.kaluza.com/>
>
> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter
> <https://twitter.com/Kaluza_tech>
>
> Kaluza Ltd. registered in England and Wales No. 08785057
>
> VAT No. 100119879
>
> Help save paper - do you need to print this email?
>

