Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Theodor Wübker
Hey Hector, thanks for your reply. Your assumption is entirely correct: I already have a few million records on the topic to test a streaming use case. I am planning on testing it with a variety of settings, but the problems occur with any cluster configuration. For example, parallelism 1 with

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Hector Rios
Hi Theo, in your initial email you mentioned that you have "a bit of Data on it" when referring to your topic with ten partitions. Correct me if I'm wrong, but that sounds like the data in your topic is bounded and you are trying to test a streaming use case. What kind of parallelism do you have configure

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey, so one more thing, the query looks like this:

    SELECT window_start, window_end, a, b, c, count(*) as x
    FROM TABLE(TUMBLE(TABLE data.v1, DESCRIPTOR(timeStampData), INTERVAL '1' HOUR))
    GROUP BY window_start, window_end, a, b, c

When the non-determinism occurs, the topic is not keyed at all.
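For comparison, the aggregation in the query above can be reproduced offline as an order-independent reference. The following is only a minimal Python sketch, not Flink code: the helper names `tumble_start` and `windowed_counts` are hypothetical, rows are assumed to be `(timeStampData, a, b, c)` tuples, and windows are assumed to be aligned to the Unix epoch.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumble_start(ts: datetime, size: timedelta) -> datetime:
    """Start of the epoch-aligned tumbling window that contains ts."""
    return EPOCH + ((ts - EPOCH) // size) * size


def windowed_counts(rows, size=timedelta(hours=1)):
    """Count rows per (window_start, window_end, a, b, c).

    The result does not depend on the order of `rows`, so it can serve
    as a baseline when checking the output of a streaming job.
    """
    counts = Counter()
    for ts, a, b, c in rows:
        start = tumble_start(ts, size)
        counts[(start, start + size, a, b, c)] += 1
    return counts


# Hypothetical sample rows, used only for illustration.
sample_rows = [
    (datetime(2023, 2, 12, 10, 5), "x", "y", "z"),
    (datetime(2023, 2, 12, 10, 55), "x", "y", "z"),
    (datetime(2023, 2, 12, 11, 0), "x", "y", "z"),
]
reference = windowed_counts(sample_rows)
```

If a streaming run produces counts that differ from such a baseline between executions, the difference points at record ordering or lateness handling rather than at the aggregation itself.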

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey Yuxia, thanks for your response. I also figured that the events arrive in a (somewhat) random order and thus cause non-determinism. I used a watermark like this: "timeStampData - INTERVAL '10' SECOND". Increasing the watermark interval does not solve the problem though, the results are stil

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread yuxia
Hi Theo, I'm wondering what the event-time windowed query you are using looks like. For example, how do you define the watermark? Considering that you read records from the 10 partitions, it may well be that the records arrive at the window operator out of order. Is it possible that the re
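To illustrate the hypothesis discussed in this thread (records from several partitions reaching the window operator out of order), here is a small Python simulation. It is a deliberately simplified sketch and not a model of Flink internals: it assumes a single global watermark equal to the largest event time seen minus a fixed delay, and it assumes a record is dropped once the watermark has reached the end of its window. The function `run` and the sample partitions are hypothetical, and the real Kafka source may generate watermarks differently.

```python
from collections import Counter

WINDOW = 3600  # tumbling window size in seconds (1 hour)
DELAY = 10     # watermark delay in seconds


def run(arrivals, window=WINDOW, delay=DELAY):
    """Process (event_time_seconds, key) pairs in arrival order.

    Simplifying assumption: a record is dropped as late when the end of
    its window is already at or below the current watermark.
    """
    watermark = float("-inf")
    counts = Counter()
    dropped = 0
    for ts, key in arrivals:
        start = ts - ts % window
        if start + window <= watermark:
            dropped += 1
        else:
            counts[(start, key)] += 1
        watermark = max(watermark, ts - delay)
    return counts, dropped


# Two hypothetical partitions, each ordered by event time on its own.
p0 = [(0, "a"), (1800, "a"), (3600, "a"), (7200, "a")]
p1 = [(10, "a"), (1900, "a"), (3700, "a")]

# Case 1: records arrive interleaved in event-time order.
in_order_counts, in_order_dropped = run(sorted(p0 + p1))

# Case 2: one partition is drained completely before the other,
# as can happen when reading a backlog of existing data.
skewed_counts, skewed_dropped = run(p0 + p1)
```

With this sample data, the interleaved order drops nothing, while draining `p0` first drops two records from `p1`, so the count for the first window differs between the two runs. Under these assumptions, the result depends on how reads from the partitions happen to interleave, which would vary from run to run.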