isburmistrov commented on code in PR #24079: URL: https://github.com/apache/flink/pull/24079#discussion_r1450346151
########## flink-runtime/src/main/java/org/apache/flink/runtime/state/CompressibleFSDataOutputStream.java: ########## @@ -41,6 +41,10 @@ public CompressibleFSDataOutputStream( @Override public long getPos() throws IOException { + // Underlying compression involves buffering, so the only way to report correct position is + // to flush the underlying stream. This lowers the effectivity of compression, but there is + // no other way, since the position is often used as a split point. + flush(); Review Comment: Also, FYI, read operation in `SnappyFramedInputStream` also does the buffering: ``` @Override public int read() throws IOException { if (closed) { return -1; } if (!ensureBuffer()) { return -1; } return buffer[position++] & 0xFF; } ``` So when we do `seek` in the `CompressibleFSDataInputStream` (which gets delegated to `delegate`) and then do `read` (which gets delegated to `compressingDelegate`) - I guess something weird can happen. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org