Hi Alex,

Please see my comment here [1].
[1] https://lists.apache.org/thread/h5mv6ld4l2g4hsjszfdos9f365nh7ctf

BR,
G

On Mon, Sep 2, 2024 at 11:02 AM Alex K. <flink.user...@gmail.com> wrote:

> We have an issue where a savepoint file containing Kafka topic partition
> offsets is requested millions of times from AWS S3. This causes the job to
> crash, restart, and crash again. We have tracked the high number of reads
> (~3 million) to the number of Kafka topic partitions (~40k) multiplied by
> the job parallelism (70 slots). We are using Flink 1.19.0 with KafkaSource,
> and savepoints/checkpoints are stored in AWS S3.
>
> We increased state.storage.fs.memory-threshold to 700kb, which results in
> the Kafka topic partition offsets being written into the _metadata
> savepoint file and effectively eliminates the problem above. Our topics
> and partitions grow weekly, so we will soon reach the
> state.storage.fs.memory-threshold maximum value of 1 MB.
>
> Is this behaviour expected, and if so, could it be optimised by reducing
> the high number of reads, by caching the file, or by some other
> configuration we are not aware of?
>
> Thank you
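
For reference, a minimal sketch of the workaround described above, i.e. raising state.storage.fs.memory-threshold so that small state handles (such as the Kafka partition offsets) are inlined into the _metadata file rather than read as separate objects from S3. This assumes the Flink 1.19 Java API and simply mirrors the values mentioned in the thread; in production the same key would normally go into flink-conf.yaml instead:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryThresholdSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // State smaller than this threshold is stored inline in the _metadata
        // file; the configured maximum for this option is 1 MB.
        conf.setString("state.storage.fs.memory-threshold", "700kb");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        // ... build the job (KafkaSource, sinks, etc.) against env as usual ...

        env.execute("memory-threshold-sketch");
    }
}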