Re: Flink High Availability Data Cleanup

2025-02-05 Thread Chen Yang via user
Hi Zhanghao, Thanks for the quick response! My current restart strategy type is fixed-delay with a 10-second delay, as follows. I used the default restart strategy, exponential-delay, before, but saw high pressure on the Kafka cluster during incidents. Do you know how long Flink will retain the HA meta

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Jean-Marc Paulin
I am still hoping that I am good. I just read the savepoint to extract information (parallelism 1, and only 1 task manager). I also know it was created by a job using a HashMap backend. And I do not care about duplicates. I should still be good, right? From what I saw I never read any

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Gabor Somogyi
Hi Guys, We've just had an in-depth analysis and we think that removing that particular line causes correctness issues under some circumstances. Namely, key duplicates can happen when multiple column families are processed at the same time. Needless to say, it would cause multiple `readKey

KinesisStreamsSource polling interval

2025-02-05 Thread Jeroen Verellen
Hi, I recently switched from FlinkKinesisConsumer to KinesisStreamsSource as advised in the docs: https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/connectors/datastream/kinesis/#kinesis-consumer Since this migration, we have seen many more ReadProvisionedThroughputExceeded warnings in C

Re: Restore rocksDB from savepoint exception

2025-02-05 Thread Bjarke Tornager
Hi Gabor, Returning to your answer from a while ago. Since writing to you about handling tens of GB of state with Apache Flink, I am now deploying jobs with multiple terabytes of state. The problems that I outlined in my initial email have been handled by doing some of the tuning that you suggested -

Potential Contributing Guides Improvements

2025-02-05 Thread Salva Alcántara
I was importing the Flink project into IntelliJ 2024.3 following this guide: https://nightlies.apache.org/flink/flink-docs-master/docs/flinkdev/ide_setup/#importing-flink but when I click on "Generate Sources and Update Folders" I get the following errors: ``` [INFO] Validation error: [ERROR] Fa

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Salva Alcántara
Thanks both for your work on this! On a related note, since Queryable State (QS) is going away soon, streamlining the State Processor API as much as possible makes a lot of sense. Are there any plans for a migration guide or something similar to help users adapt their QS observers (beyond the current docs)

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Gabor Somogyi
Hi Jean-Marc, Thanks for investing the time and sharing the numbers, it's super helpful. Ping me any time you have further info to share. About the numbers: 48 minutes for 6 GB is not good but not terrible. I've seen petabyte-scale states so I'm pretty sure we need to go beyond... Since w

Re: How to read a savepoint fast without exploding the memory

2025-02-05 Thread Jean-Marc Paulin
Hi Gabor, I finally got to run that change through. I have a 6 GB savepoint that I read and parse for reference. - HashMap reads it in 14 minutes (but requires 10 GB of RAM) - RocksDB with the patch reads it in 48 minutes (and requires less than 2 GB) - RocksDB without the patch wasn't even halfway through