question regarding metadatahandlerProvider

2025-01-09, Antonio Si
Hi, I am using apache-flink 1.19 and Python 3.11. I have a very simple batch job which registers a source table using CREATE TABLE … and outputs to a sink table defined by another CREATE TABLE …. Before I output to the sink table, I run a dedup query SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY…
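The snippet is cut off right after PARTITION BY, so the actual partition and ordering columns are unknown. The plain-Python sketch below only models what this kind of ROW_NUMBER() dedup query usually computes when the outer query keeps rn = 1: one row per partition key, chosen by the ordering column. It is not how Flink executes the query. The column names `id`, `ts` and `v` and the function `row_number_dedup` are hypothetical. The outer `WHERE rn = 1` filter is also an assumption, because it is not visible in the truncated text.

```python
from collections import defaultdict


def row_number_dedup(rows, partition_by, order_by, descending=True):
    """Model of:
    SELECT * FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY <partition_by>
                                   ORDER BY <order_by> [DESC]) AS rn
      FROM src) WHERE rn = 1
    Column names are hypothetical; the original query is truncated.
    """
    # Group rows by the PARTITION BY column (dicts keep first-seen key order).
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    result = []
    for part in partitions.values():
        # ORDER BY within the partition; Python's sort is stable (also with
        # reverse=True), so ties keep their input order here. In SQL, the
        # winner among ties is not guaranteed.
        part.sort(key=lambda r: r[order_by], reverse=descending)
        # Assign ROW_NUMBER() = 1, 2, 3, ... and keep only rn = 1.
        numbered = [{**row, "rn": rn} for rn, row in enumerate(part, start=1)]
        result.extend(r for r in numbered if r["rn"] == 1)
    return result


rows = [
    {"id": "a", "ts": 1, "v": "old"},
    {"id": "a", "ts": 3, "v": "new"},
    {"id": "b", "ts": 2, "v": "only"},
]
print(row_number_dedup(rows, "id", "ts"))
# [{'id': 'a', 'ts': 3, 'v': 'new', 'rn': 1}, {'id': 'b', 'ts': 2, 'v': 'only', 'rn': 1}]
```

With ORDER BY ... DESC the latest row per key survives. Passing `descending=False` keeps the earliest row instead, which corresponds to keeping the first row rather than the last one.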

Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10, Antonio Si
…information about S3 dataset sink > > Regards, > > > On Thu, 10 Feb 2022 at 20:52, Antonio Si wrote: > >> Thanks Bastien. Can you point me to an example of using a sink, as we are >> planning to write to S3? >> >> Thanks again for your help. >> >> Antoni…

Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10, Antonio Si
…if it's too big, into something splittable > > Regards, > Bastien > > -- > > Bastien DINE > Data Architect / Software Engineer / Sysadmin > bastiendine.io > > > On Thu, 10 Feb 2022 at 20:32, Antonio Si wrote: > >> Hi, >> >> I…

question on dataSource.collect() on reading states from a savepoint file

2022-02-10, Antonio Si
Hi, I am using the State Processor API to read the states from a savepoint file. It works fine when the state size is small, but when the state is larger, around 11 GB, I get an OOM error. I think it happens when dataSource.collect() is called to obtain the states. The stack trace is c…
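The replies listed above suggest writing the state out through a sink instead of calling dataSource.collect(), and splitting large output "into something splittable". collect() brings every record back to the client at once. The plain-Python sketch below only models that difference in general terms; it is not the Flink State Processor API.

- `collect_all` materializes everything in one list.
- `write_in_parts` consumes an iterator and writes numbered part files of bounded size, so only one part is buffered at a time.

The function names, the `part-NNNNN.txt` naming and the local directory standing in for S3 are all hypothetical.

```python
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator, List


def collect_all(records: Iterable[str]) -> List[str]:
    # Analogous to collect(): every record is held in memory at the same time.
    return list(records)


def write_in_parts(records: Iterable[str], out_dir: str, part_size: int) -> List[str]:
    """Write records to numbered part files with at most part_size lines each.

    Memory use is bounded by part_size rather than by the total state size,
    and the output is split into several files that can be read in parallel.
    Hypothetical helper; a local directory stands in for an S3 bucket.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    os.makedirs(out_dir, exist_ok=True)
    it = iter(records)
    paths = []
    part = 0
    while True:
        chunk = list(islice(it, part_size))  # buffer at most one part
        if not chunk:
            break
        path = os.path.join(out_dir, f"part-{part:05d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in chunk)
        paths.append(path)
        part += 1
    return paths


def fake_state_records(n: int) -> Iterator[str]:
    # Hypothetical stand-in for keyed state entries read from a savepoint.
    for i in range(n):
        yield f"key-{i},value-{i}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = write_in_parts(fake_state_records(10), d, part_size=4)
        print(len(paths))  # 3 part files: 4 + 4 + 2 records
```

The same idea applies whatever the real sink is: records flow through in bounded batches instead of being gathered on the client first.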