mateczagany commented on PR #25509: URL: https://github.com/apache/flink/pull/25509#issuecomment-2420140089

> Are you saying that the exact same state data with default S3 Hadoop configs is slow uncompressed and fast compressed? That would be the better case.

Yes, that's correct, I did not tune any S3 settings. The job was the same; the compressed state size was 290 MB, the uncompressed one was 11 MB. Recovering the uncompressed state resulted in one S3 `GET` for each `read()` that followed a `skip()`, while recovering the compressed state did not. The state was a `ListState` of Strings, and for the compressed data I even induced an artificial `skip()` for each element by reading one byte less in `StringValue#read()`; still, none of the `skip()` or `read()` calls resulted in any new S3 `GET` requests.
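
For reference, the access pattern I induced looks roughly like the sketch below. This is a hypothetical illustration only, not the actual `StringValue#read()` or Flink deserialization code; the class and method names are made up, and it assumes a simple 4-byte length prefix per element. The point is just that every element produces a `read()`-then-`skip()` pair against whatever stream is underneath (the raw S3 stream for uncompressed state, the block-decompressing stream for compressed state).

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the read()/skip() pattern described above.
// Each length-prefixed String is read with one byte deliberately left
// unread, and that byte is then skipped, so every element issues a
// read()-then-skip() pair against the underlying stream.
public final class ReadSkipPatternSketch {

    static List<String> readWithArtificialSkips(DataInputStream in) throws IOException {
        List<String> elements = new ArrayList<>();
        while (true) {
            final int length;
            try {
                length = in.readInt(); // assumed 4-byte length prefix per element
            } catch (EOFException end) {
                return elements;       // no more elements in the stream
            }
            byte[] buf = new byte[Math.max(0, length - 1)];
            in.readFully(buf);         // read one byte less than the element holds
            if (length > 0) {
                // Skip the leftover byte. On a raw seekable object-store stream
                // this may turn into a seek (and potentially a new ranged GET);
                // on a buffered or block-decompressing stream it is served from
                // the in-memory buffer.
                if (in.skipBytes(1) != 1) {
                    throw new IOException("Could not skip the trailing byte");
                }
            }
            // The decoded value is truncated by one byte; this sketch only
            // exercises the I/O pattern, not string correctness.
            elements.add(new String(buf, StandardCharsets.UTF_8));
        }
    }

    private ReadSkipPatternSketch() {}
}
```

Running this kind of loop over the compressed snapshot stream produced no additional `GET` requests despite the per-element skips, which is what the measurement above refers to.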
> Are you saying that the exact same state data with default S3 Hadoop configs is slow uncompressed and fast compressed? That would be better case. Yes, that's correct, I did not tune any S3 settings. The job was the same, compressed state size was 290 MB, uncompressed one was 11 MB. The recovery of uncompressed state resulted in one S3 `GET` after each `read()` after `skip()` was called, while the compressed data did not. It was a ListState of Strings, and for the compressed data I even induced an artificial `skip()` for each element by reading 1 less byte in `StringValue#read()`, and none of the `skip()` or `read()` calls resulted in any new S3 `GET` queries. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org