Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10 Thread Antonio Si
Thanks Bastien. I will check it out. Antonio. On Thu, Feb 10, 2022 at 11:59 AM bastien dine wrote: > I haven't used S3 with Flink, but according to this doc: > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/s3/ > you can set up S3 pretty easily and use it

Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10 Thread bastien dine
I haven't used S3 with Flink, but according to this doc: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/filesystems/s3/ you can set up S3 pretty easily and use it with s3://path/to/your/file in a write sink. The page talks about DataStream, but it should work with DataSet

Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10 Thread Antonio Si
Thanks Bastien. Can you point to an example of using a sink, as we are planning to write to S3? Thanks again for your help. Antonio. On Thu, Feb 10, 2022 at 11:49 AM bastien dine wrote: > Hello Antonio, > > the .collect() method should be used with caution, as it collects the > DataSet (multiple

Re: question on dataSource.collect() on reading states from a savepoint file

2022-02-10 Thread bastien dine
Hello Antonio, the .collect() method should be used with caution, as it collects the DataSet (multiple partitions on multiple TMs) into a single List on the JM (so in memory). Unless you have a lot of RAM, you cannot use it this way, and you probably should not. I recommend using a sink to print
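
The memory difference described in this reply can be illustrated with a small plain-Java analogy. This is only a sketch and does not use any Flink API: the class CollectVsSink and the methods collectAll and writeToSink are hypothetical names made up for this illustration, and a local temporary file stands in for the real sink target.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

// Plain-Java analogy only; none of these names come from Flink.
public class CollectVsSink {

    // Analogue of collect(): every record ends up in one in-memory list,
    // so memory use grows with the total number of records.
    public static List<String> collectAll(Iterator<String> records) {
        List<String> all = new ArrayList<>();
        while (records.hasNext()) {
            all.add(records.next());
        }
        return all;
    }

    // Analogue of a sink: each record is written out as it arrives and is
    // not retained, so memory use is bounded by the writer's buffer.
    public static long writeToSink(Iterator<String> records, Path target) throws IOException {
        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(target)) {
            while (records.hasNext()) {
                out.write(records.next());
                out.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the states read from a savepoint.
        Iterator<String> records =
                IntStream.range(0, 1000).mapToObj(i -> "state-" + i).iterator();
        Path target = Files.createTempFile("states", ".txt");
        long n = writeToSink(records, target);
        System.out.println(n + " records written to " + target);
    }
}
```

In an actual Flink job the equivalent of writeToSink would be a sink operator attached to the DataSet, so that each TaskManager writes its own partitions rather than shipping them to the JobManager.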

question on dataSource.collect() on reading states from a savepoint file

2022-02-10 Thread Antonio Si
Hi, I am using the State Processor API to read the states from a savepoint file. It works fine when the state size is small, but when the state size is larger, around 11GB, I am getting an OOM. I think it happens when it is doing a dataSource.collect() to obtain the states. The stackTrace is c