Hey, so one more thing, the query looks like this:
SELECT window_start, window_end, a, b, c, count(*) as x FROM TABLE(TUMBLE(TABLE data.v1, DESCRIPTOR(timeStampData), INTERVAL '1' HOUR)) GROUP BY window_start, window_end, a, b, c When the non-determinism occurs, the topic is not keyed at all. When I key it by the attribute “a”, I get the incorrect, but deterministic results. Maybe in the second case, only 1 partition out of the 10 is consumed at once? Best, Theo > On 13. Feb 2023, at 08:15, Theodor Wübker <theo.wueb...@inside-m2m.de> wrote > > Hey Yuxia, > > thanks for your response. I figured too, that the events arrive in a > (somewhat) random order and thus cause non-determinism. I used a Watermark > like this:"timeStampData - INTERVAL '10' SECOND” . Increasing the Watermark > Interval does not solve the problem though, the results are still not > deterministic. Instead I keyed the 10 partition topic. Now results are > deterministic, but they are incorrect (way too few). Am I doing something > fundamentally wrong? I just need the messages to be in somewhat in order > (just so they don’t violate the watermark). > > Best, > Theo > > (sent again, sorry, I previously only responded to you, not the Mailing list > by accident) > >> On 13. Feb 2023, at 08:14, Theodor Wübker <theo.wueb...@inside-m2m.de >> <mailto:theo.wueb...@inside-m2m.de>> wrote: >> >> Hey Yuxia, >> >> thanks for your response. I figured too, that the events arrive in a >> (somewhat) random order and thus cause non-determinism. I used a Watermark >> like this: "timeStampData - INTERVAL '10' SECOND” . Increasing the Watermark >> Interval does not solve the problem though, the results are still not >> deterministic. Instead I keyed the 10 partition topic. Now results are >> deterministic, but they are incorrect (way too few). Am I doing something >> fundamentally wrong? I just need the messages to be in somewhat in order >> (just so they don’t violate the watermark). >> >> Best, >> Theo >> >>> On 13. Feb 2023, at 04:23, yuxia <luoyu...@alumni.sjtu.edu.cn >>> <mailto:luoyu...@alumni.sjtu.edu.cn>> wrote: >>> >>> HI, Theo. >>> I'm wondering what the Event-Time-Windowed Query you are using looks like. >>> For example, how do you define the watermark? >>> Considering you read records from the 10 partitions, and it may well that >>> the records will arrive the window process operator out of order. >>> Is it possible that the records exceed the watermark, but there're still >>> some records will arrive? >>> >>> If that's the case, every time, the records used to calculate result may >>> well different and then result in non-determinism result. >>> >>> Best regards, >>> Yuxia >>> >>> ----- 原始邮件 ----- >>> 发件人: "Theodor Wübker" <theo.wueb...@inside-m2m.de >>> <mailto:theo.wueb...@inside-m2m.de>> >>> 收件人: "User" <user@flink.apache.org <mailto:user@flink.apache.org>> >>> 发送时间: 星期日, 2023年 2 月 12日 下午 4:25:45 >>> 主题: Non-Determinism in Table-API with Kafka and Event Time >>> >>> Hey everyone, >>> >>> I experience non-determinism in my Table API Program at the moment and (as >>> a relatively unexperienced Flink and Kafka user) I can’t really explain to >>> myself why it happens. So, I have a topic with 10 Partitions and a bit of >>> Data on it. Now I run a simple SELECT * query on this, that moves some >>> attributes around and writes everything on another topic with 10 >>> partitions. Then, on this topic I run a Event-Time-Windowed Query. Now I >>> experience Non-Determinism: The results of the windowed query differ with >>> every execution. >>> I thought this might be, because the SELECT query wrote the data to the >>> partitioned topic without keys. So I tried it again with the same key I >>> used for the original topic. It resulted in the exact same topic structure. >>> Now when I run the Event-Time-Windowed query, I get incorrect results (too >>> few result-entries). >>> >>> I have already read a lot of the Docs on this and can’t seem to figure it >>> out. I would much appreciate, if someone could shed a bit of light on this. >>> Is there anything in particular I should be aware of, when reading >>> partitioned topics and running an event time query on that? Thanks :) >>> >>> >>> Best, >>> Theo >> >
smime.p7s
Description: S/MIME cryptographic signature