dmvk commented on code in PR #24079: URL: https://github.com/apache/flink/pull/24079#discussion_r1450722556
##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/CompressibleFSDataOutputStream.java:
##########
@@ -41,6 +41,10 @@ public CompressibleFSDataOutputStream(
     @Override
     public long getPos() throws IOException {
+        // Underlying compression involves buffering, so the only way to report correct position is
+        // to flush the underlying stream. This lowers the effectivity of compression, but there is
+        // no other way, since the position is often used as a split point.
+        flush();

Review Comment:
   You're partially right. One missing point is that Snappy buffers at most until the next flush, so as long as we always read full records, it should work as expected. For partial reads, it behaves as you describe. I've tried to fix the partial-read case in https://github.com/apache/flink/pull/24079/commits/4a42f16a13f3d22c188a5020a8e8718394d11e0c, PTAL
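
   To illustrate why the position is only meaningful after a flush, here is a minimal standalone sketch. It is not the Flink code: it assumes the JDK's `DeflaterOutputStream` (with `syncFlush = true`) as a stand-in for the Snappy stream, and `CountingOutputStream` and `positionsAroundFlush` are hypothetical names used only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class FlushBeforeGetPosSketch {

    /** Hypothetical helper: counts the bytes that actually reached the underlying stream. */
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream delegate;
        private long pos;

        CountingOutputStream(OutputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(int b) throws IOException {
            delegate.write(b);
            pos++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            delegate.write(b, off, len);
            pos += len;
        }

        @Override
        public void flush() throws IOException {
            delegate.flush();
        }

        long getPos() {
            return pos;
        }
    }

    /** Returns {position before flush, position after flush} for one small record. */
    static long[] positionsAroundFlush() throws IOException {
        CountingOutputStream raw = new CountingOutputStream(new ByteArrayOutputStream());
        // Assumption: Deflater stands in for Snappy. With syncFlush = true,
        // flush() pushes all pending compressed bytes to the underlying stream.
        DeflaterOutputStream compressed = new DeflaterOutputStream(raw, true);

        compressed.write("one small record".getBytes(StandardCharsets.UTF_8));
        long before = raw.getPos(); // the record is still buffered in the compressor
        compressed.flush();
        long after = raw.getPos(); // now the record is fully written downstream

        return new long[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        long[] p = positionsAroundFlush();
        System.out.println("before flush: " + p[0] + ", after flush: " + p[1]);
    }
}
```

   In this sketch the position read before `flush()` is smaller than the one read after it, so using the earlier value as a split point would point in front of data that logically belongs to the previous record.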