Re: Getting HEX from VARBINARY column in Flink SQL

2025-03-11 Thread Shengkai Fang
You can write a UDF to cast the binary data to a string as you wish. Best, Shengkai Graham Johnson wrote on Fri, Mar 7, 2025 at 22:21: > Hi, > > When I use the sql-client to query data on a topic which has one column > with binary data in, the results show that column in hexadecimal format. Is > there a way fo
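As a rough illustration of the suggestion above, the conversion logic inside such a UDF might look like the plain-Python sketch below. The function names `binary_to_string` and `binary_to_hex` and the default `utf-8` charset are assumptions made for illustration, not something stated in the thread. Registering the function with Flink (for example as a PyFlink UDF or a Java `ScalarFunction`) is deliberately left out.

```python
from typing import Optional


def binary_to_string(data: Optional[bytes], charset: str = "utf-8") -> Optional[str]:
    """Decode a VARBINARY-like value into text (hypothetical helper).

    Bytes that are invalid in the chosen charset are replaced rather than
    raising, so one bad record does not fail the whole query.
    """
    if data is None:
        return None  # keep SQL NULL semantics: NULL in, NULL out
    return bytes(data).decode(charset, errors="replace")


def binary_to_hex(data: Optional[bytes]) -> Optional[str]:
    """Render the same value as a lowercase hex string (hypothetical helper)."""
    if data is None:
        return None
    return bytes(data).hex()


if __name__ == "__main__":
    payload = b"hello"
    print(binary_to_string(payload))
    print(binary_to_hex(payload))
```

Whether to decode as text or keep hex depends on what the column actually holds; for payloads that are not text, decoding with a charset will only produce replacement characters.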

Re: Flink CDC to Paimon

2025-03-11 Thread Andrew Otto
In case this helps, here are the versions I was working with when I was experimenting with this last year: https://github.com/ottomata/flink-cdc-spike/blob/main/dockerfiles/build_env_1.17.env https://github.com/ottomata/flink-cdc-spike/blob/main/dockerfiles/download_dependencies.sh I was trying m

Re: Flink CDC to Paimon

2025-03-11 Thread Taher Koitawala
Hi All, I am facing issues finding the class org.apache.flink.cdc.connectors.shaded.org.apache.kafka.connect.json.JsonConverter I have added 1. flink-connector-base 1.18.1 flink-connector-debezium 3.1.0 flink-cdc-pipeline-connectors-values 3.1.0 flink-cdc-base 3.1.0 flink-cdc-pipeline-conn

Re: Does Flink serialize events between all operators?

2025-03-11 Thread Alexey Novakov via user
Hi Vadim, Yes, it does serialize objects between operators even if they run within the same Task Manager unless object-reuse configuration is on: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#pipeline-object-reuse Using immutable data (which is one of the principal o

Data skew after keyBy (even with a good number of key groups)

2025-03-11 Thread Vararu, Vadim via user
Hello, I’ve got two tasks: * one reading from the source (parallelism 1) * a second one, a keyed function (parallelism 50) With the max parallelism set to 1500 and a parallelism of 50, I expect the second task to have incoming data equally spread when distributing the keys to the key gro
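To make the setup in this question concrete, here is a minimal plain-Python sketch of how keys are mapped to key groups and key groups to subtasks, based on my understanding of Flink's key-group assignment. The `mix` function is only a stand-in for Flink's internal hash and will not reproduce the exact key groups a real job computes; the integer key hashes are hypothetical inputs. Only the values 1500 and 50 come from the post.

```python
from collections import Counter

MAX_PARALLELISM = 1500  # number of key groups (from the post)
PARALLELISM = 50        # subtasks of the keyed function (from the post)


def mix(h: int) -> int:
    """Stand-in 32-bit mixer; an assumption, not Flink's exact hash."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def key_group(key_hash: int, max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key's hash code to one of `max_parallelism` key groups."""
    return mix(key_hash) % max_parallelism


def subtask_for_key_group(kg: int,
                          parallelism: int = PARALLELISM,
                          max_parallelism: int = MAX_PARALLELISM) -> int:
    """Map a key group to the subtask that owns it (contiguous ranges)."""
    return kg * parallelism // max_parallelism


def keys_per_subtask(key_hashes) -> Counter:
    """Count how many distinct keys land on each subtask."""
    return Counter(subtask_for_key_group(key_group(h)) for h in key_hashes)


if __name__ == "__main__":
    # Every subtask owns the same number of key groups (1500 / 50).
    owned = Counter(subtask_for_key_group(kg) for kg in range(MAX_PARALLELISM))
    print(sorted(set(owned.values())))

    # The spread of keys depends on how many distinct keys exist.
    for n_keys in (20, 200, 20000):
        counts = keys_per_subtask(range(n_keys))
        print(n_keys, len(counts), min(counts.values()), max(counts.values()))
```

The point of the sketch is that an even split of key groups across subtasks does not by itself guarantee an even split of records: with few distinct keys, or a few very hot keys, some subtasks can still receive far more data than others.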

Does Flink serialize events between all operators?

2025-03-11 Thread Vararu, Vadim via user
Hello, Does Flink serialize all the data when moving from one operator to another (even when there is no shuffling/hashing between them)? If yes, is it worth having fewer operators that each do more work instead of more granular operators? For instance, one flat map + one filter could be sub