Hey,

so one more thing, the query looks like this:

SELECT window_start, window_end, a, b, c, count(*) as x FROM TABLE(TUMBLE(TABLE 
data.v1, DESCRIPTOR(timeStampData), INTERVAL '1' HOUR)) GROUP BY window_start, 
window_end, a, b, c

When the non-determinism occurs, the topic is not keyed at all. When I key it 
by the attribute “a”, I get the incorrect, but deterministic results. Maybe in 
the second case, only 1 partition out of the 10 is consumed at once?

Best,
Theo

> On 13. Feb 2023, at 08:15, Theodor Wübker <theo.wueb...@inside-m2m.de> wrote
> 
> Hey Yuxia,
> 
> thanks for your response. I figured too, that the events arrive in a 
> (somewhat) random order and thus cause non-determinism. I used a Watermark 
> like this:"timeStampData - INTERVAL '10' SECOND” . Increasing the Watermark 
> Interval does not solve the problem though, the results are still not 
> deterministic. Instead I keyed the 10 partition topic. Now results are 
> deterministic, but they are incorrect (way too few). Am I doing something 
> fundamentally wrong? I just need the messages to be in somewhat in order 
> (just so they don’t violate the watermark). 
> 
> Best,
> Theo
> 
> (sent again, sorry, I previously only responded to you, not the Mailing list 
> by accident)
> 
>> On 13. Feb 2023, at 08:14, Theodor Wübker <theo.wueb...@inside-m2m.de 
>> <mailto:theo.wueb...@inside-m2m.de>> wrote:
>> 
>> Hey Yuxia,
>> 
>> thanks for your response. I figured too, that the events arrive in a 
>> (somewhat) random order and thus cause non-determinism. I used a Watermark 
>> like this: "timeStampData - INTERVAL '10' SECOND” . Increasing the Watermark 
>> Interval does not solve the problem though, the results are still not 
>> deterministic. Instead I keyed the 10 partition topic. Now results are 
>> deterministic, but they are incorrect (way too few). Am I doing something 
>> fundamentally wrong? I just need the messages to be in somewhat in order 
>> (just so they don’t violate the watermark). 
>> 
>> Best,
>> Theo
>> 
>>> On 13. Feb 2023, at 04:23, yuxia <luoyu...@alumni.sjtu.edu.cn 
>>> <mailto:luoyu...@alumni.sjtu.edu.cn>> wrote:
>>> 
>>> HI, Theo.
>>> I'm wondering what the Event-Time-Windowed Query you are using looks like.
>>> For example, how do you define the watermark?
>>> Considering you read records from the 10 partitions, and it may well that 
>>> the records will arrive the window process operator out of order. 
>>> Is it possible that the records exceed the watermark, but there're still 
>>> some records will arrive?
>>> 
>>> If that's the case, every time, the records used to calculate result may 
>>> well different and then result in non-determinism result.
>>> 
>>> Best regards,
>>> Yuxia
>>> 
>>> ----- 原始邮件 -----
>>> 发件人: "Theodor Wübker" <theo.wueb...@inside-m2m.de 
>>> <mailto:theo.wueb...@inside-m2m.de>>
>>> 收件人: "User" <user@flink.apache.org <mailto:user@flink.apache.org>>
>>> 发送时间: 星期日, 2023年 2 月 12日 下午 4:25:45
>>> 主题: Non-Determinism in Table-API with Kafka and Event Time
>>> 
>>> Hey everyone,
>>> 
>>> I experience non-determinism in my Table API Program at the moment and (as 
>>> a relatively unexperienced Flink and Kafka user) I can’t really explain to 
>>> myself why it happens. So, I have a topic with 10 Partitions and a bit of 
>>> Data on it. Now I run a simple SELECT * query on this, that moves some 
>>> attributes around and writes everything on another topic with 10 
>>> partitions. Then, on this topic I run a Event-Time-Windowed Query. Now I 
>>> experience Non-Determinism: The results of the windowed query differ with 
>>> every execution. 
>>> I thought this might be, because the SELECT query wrote the data to the 
>>> partitioned topic without keys. So I tried it again with the same key I 
>>> used for the original topic. It resulted in the exact same topic structure. 
>>> Now when I run the Event-Time-Windowed query, I get incorrect results (too 
>>> few result-entries). 
>>> 
>>> I have already read a lot of the Docs on this and can’t seem to figure it 
>>> out. I would much appreciate, if someone could shed a bit of light on this. 
>>> Is there anything in particular I should be aware of, when reading 
>>> partitioned topics and running an event time query on that? Thanks :)
>>> 
>>> 
>>> Best,
>>> Theo
>> 
> 

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to