gaborgsomogyi commented on PR #25509: URL: https://github.com/apache/flink/pull/25509#issuecomment-2419716967

> Could you reproduce this issue with state compression enabled?

I've experimented a bit with compressed state and yes, I've seen slowness and/or a huge number of re-opens. The number of re-opens is controlled by the following settings (which I've tuned to try to reduce re-opens):
* `fs.s3a.readahead.range`
* `fs.s3a.input.async.drain.threshold`

I also tried enabling pre-fetching via `fs.s3a.prefetch.enabled` to keep everything in memory and gain some speed. None of them helped. My finding is that `skip` in the S3 case reads the data into a buffer and then simply drops it (a minimal repro sketch is at the end of this comment). I haven't gone through the whole Snappy chain, but I assume the same happens with decompression on top. I'm not sure where the mentioned `4096 bytes` buffer size comes from, but having as many re-opens as there are elements in the list is also not optimal 🙂

Are you saying that the exact same state data with default S3 Hadoop configs is slow uncompressed and fast compressed? That would be the better case.
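For reference, a minimal sketch of the kind of experiment I mean, comparing `skip()` against `seek()` on an S3A stream with the settings above. This is not code from the PR; the bucket, path, hop size, and config values are made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ASkipVsSeek {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The settings mentioned above; values are examples, not recommendations.
        conf.set("fs.s3a.readahead.range", "1M");
        conf.set("fs.s3a.input.async.drain.threshold", "64K");
        conf.setBoolean("fs.s3a.prefetch.enabled", false);

        // Hypothetical checkpoint file on S3.
        Path path = new Path("s3a://my-bucket/checkpoints/some-state-file");
        long hop = 4L * 1024 * 1024; // jump forward in 4 MiB hops

        try (FileSystem fs = path.getFileSystem(conf);
             FSDataInputStream in = fs.open(path)) {

            long t0 = System.nanoTime();
            for (int i = 0; i < 16; i++) {
                // skip() may read the bytes into a buffer and discard them;
                // return value (bytes actually skipped) ignored for brevity.
                in.skip(hop);
            }
            System.out.printf("skip: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

            in.seek(0);
            long t1 = System.nanoTime();
            long pos = 0;
            for (int i = 0; i < 16; i++) {
                pos += hop;
                // seek() may drain the open HTTP stream or abort and re-open it,
                // depending on fs.s3a.input.async.drain.threshold.
                in.seek(pos);
            }
            System.out.printf("seek: %d ms%n", (System.nanoTime() - t1) / 1_000_000);
        }
    }
}
```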

> Could you reproduce this issue with state compression enabled? I've slightly touched compressed state and yeah, seen either slowness and/or huge amount of re-opens. The number of re-opens are controlled by (which I've played with to reduce re-opens): * `fs.s3a.readahead.range` * `fs.s3a.input.async.drain.threshold` Plus tried to enable pre-fetching via `fs.s3a.prefetch.enabled` to do everything in memory to gain some speed. None of them helped. My finding is that `skip` in case of S3 is reading the data into a buffer and then just drop it. I've not gone through the whole snappy chain but assumed the same happens with decompression at top. Not sure from where the mentioned `4096 bytes` buffer size is coming from but having as many re-opens as many elements are the list is also something which is not optimal🙂 Are you saying that the exact same state data with default S3 Hadoop configs is slow uncompressed and fast compressed? That would be better case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org