Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey, so one more thing, the query looks like this: SELECT window_start, window_end, a, b, c, count(*) as x FROM TABLE(TUMBLE(TABLE data.v1, DESCRIPTOR(timeStampData), INTERVAL '1' HOUR)) GROUP BY window_start, window_end, a, b, c When the non-determinism occurs, the topic is not keyed at all.

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey Yuxia, thanks for your response. I figured too, that the events arrive in a (somewhat) random order and thus cause non-determinism. I used a Watermark like this:"timeStampData - INTERVAL '10' SECOND” . Increasing the Watermark Interval does not solve the problem though, the results are stil

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread yuxia
HI, Theo. I'm wondering what the Event-Time-Windowed Query you are using looks like. For example, how do you define the watermark? Considering you read records from the 10 partitions, and it may well that the records will arrive the window process operator out of order. Is it possible that the re

Re: Flink Hudi HMS Catalog problem

2023-02-12 Thread yuxia
HI, Flink provides HiveCatalog which can store native Hive table and other type Flink table(more exactly, a DDL mapping), with which, Flink can access Hive table and other Flink tables. Does it meet you requirement? Best regards, Yuxia 发件人: "melin li" 收件人: "User" 发送时间: 星期三, 2023年 2 月

Re: Seeking suggestions for ingesting large amount of data from S3

2023-02-12 Thread yuxia
Hi, Eric. Thanks for reaching out. I'm wondering how do you use the Table API to ingest the data. Since the OOM is too general, do you have any clue for OOM? May be you can use jmap to what occupy the most of memory. If find, you can try to figure out what's the reason, is it cause by memory l

Non-Determinism in Table-API with Kafka and Event Time

2023-02-12 Thread Theodor Wübker
Hey everyone, I experience non-determinism in my Table API Program at the moment and (as a relatively unexperienced Flink and Kafka user) I can’t really explain to myself why it happens. So, I have a topic with 10 Partitions and a bit of Data on it. Now I run a simple SELECT * query on this, t