Thanks for your explanation. The execution plan for the SQL `INSERT INTO
print_table SELECT * FROM ( SELECT RandomUdf(`id`) AS `id_in_bytes`, `id` FROM
datagenTable ) AS ET WHERE ET.`id_in_bytes` IS NOT NULL` is:
`
StreamPhysicalSink(table=[default_catalog.default_database.print_table],
fields=
OK. The datagen source with the sequence option reproduces this issue
easily, and it also produces an incorrect result. I have a sequence
generated by datagen that runs from 1 to 5, and the UDF randomly returns
either null or bytes. Surprisingly, not only has the UDF been executed twice
but also the w
The datagen source may produce rows with the same values.
>From my side, in Flink, the UDF shouldn't process one row twice;
>otherwise, it would be a critical bug.
Best regards,
Yuxia
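
For illustration only, here is a plain-Python sketch (not Flink API) of how
a non-deterministic UDF can yield both symptoms discussed above. It assumes
the UDF call ends up being evaluated once for the WHERE filter and once for
the projection; whether that is what happened in this job is an assumption,
since the plan quoted above is truncated. The names make_alternating_udf,
run_inlined and run_once are hypothetical, and the "random" UDF is replaced
by one that alternates between bytes and None so the effect is repeatable.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[Optional[bytes], int]


def make_alternating_udf() -> Tuple[Callable[[int], Optional[bytes]], List[int]]:
    """Stand-in for a non-deterministic UDF.

    Odd-numbered calls return bytes, even-numbered calls return None.
    The second element of the result records every input the UDF saw.
    """
    calls: List[int] = []

    def udf(value: int) -> Optional[bytes]:
        calls.append(value)
        return str(value).encode() if len(calls) % 2 == 1 else None

    return udf, calls


def run_inlined(rows: Iterable[int], udf: Callable[[int], Optional[bytes]]) -> List[Row]:
    """Assumed behaviour: the filter and the projection each call the UDF."""
    return [(udf(r), r) for r in rows if udf(r) is not None]


def run_once(rows: Iterable[int], udf: Callable[[int], Optional[bytes]]) -> List[Row]:
    """Expected behaviour: one call per row, result reused by the filter."""
    out: List[Row] = []
    for r in rows:
        v = udf(r)  # single evaluation
        if v is not None:
            out.append((v, r))
    return out


if __name__ == "__main__":
    udf, calls = make_alternating_udf()
    print(run_inlined(range(1, 6), udf), len(calls))
    udf, calls = make_alternating_udf()
    print(run_once(range(1, 6), udf), len(calls))
```

In the inlined variant every row passes the IS NOT NULL check on the first
call and then emits the value from a second call, so rows with a null
id_in_bytes can reach the sink even though the filter is present.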
From: "Xinyi Yan"
To: "User"
Sent: Thursday, November 3, 2022, 6:59:20 AM
Subject: Question about UDF randomly processed
Hi Flink User Group,
I am getting an error when using the TCP startup probe on port 6123.
Error Msg: "Startup probe failed: dial tcp xx.x.x.x:6123: connect: connection
refused"
Please let us know whether we can use port 6123 for the JobManager and 6122
for the TaskManager in the startup probe,
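
A "connection refused" error generally means that nothing was accepting
connections on that address and port at the moment of the probe. As a rough
way to run the same kind of check by hand, here is a minimal Python sketch.
The helper names tcp_probe and demo are hypothetical, and the demo uses a
throwaway listener on 127.0.0.1 rather than a real Flink process.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


def demo() -> tuple:
    """Probe a local port while a listener is up, then after it is closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    up = tcp_probe("127.0.0.1", port)
    listener.close()
    down = tcp_probe("127.0.0.1", port)
    return up, down


if __name__ == "__main__":
    print(demo())
```

Running tcp_probe against the pod IP and port 6123 from inside the cluster
would show whether the process is listening yet and on which interface; if
it only starts listening after the probe's initial delay, the probe timing
rather than the port choice may be the issue.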
Martin, as I said, the problem is with GC; the network issue is just a
symptom.
I just wanted to say that after a lot of troubleshooting that yielded no
insight, we decided to use the YARN Node Labels feature to run the job only
on Google Dataproc's secondary workers. The problem went away
com
Hi Yanfei
Thanks for the explanation.
If I use reduce on a keyed stream with a window, am I right to think that
there will be one reduce function per key, and that they will never overlap?
Each reduce function instance will only receive elements from the same key,
in order.
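
One way to picture the question is the plain-Python simulation below (not
Flink API). It assumes tumbling windows and keeps one accumulator per
(key, window start), while a single shared reduce function is applied to
all of them; as far as I understand, this matches Flink's model, where the
function object is shared within an operator subtask and the reduced value
is what is scoped per key and window. The name windowed_keyed_reduce and
the event layout (key, timestamp, value) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Event = Tuple[str, int, int]  # (key, timestamp, value)


def windowed_keyed_reduce(
    events: Iterable[Event],
    window_size: int,
    reduce_fn: Callable[[int, int], int],
) -> Dict[Tuple[str, int], int]:
    """Fold events into one accumulator per (key, tumbling-window start)."""
    acc: Dict[Tuple[str, int], int] = {}
    for key, ts, value in events:
        window_start = ts - ts % window_size
        slot = (key, window_start)
        # The first element of a slot becomes the accumulator; later ones
        # are combined in arrival order. Other keys never touch this slot.
        acc[slot] = value if slot not in acc else reduce_fn(acc[slot], value)
    return acc


if __name__ == "__main__":
    events = [("a", 1, 1), ("b", 2, 10), ("a", 3, 2), ("a", 12, 5), ("b", 4, 20)]
    print(windowed_keyed_reduce(events, 10, lambda x, y: x + y))
```

In this sketch the reduce function itself holds no state, so elements of
different keys can pass through the same function without ever being mixed.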
From: Yanfei Lei