isburmistrov commented on code in PR #24079: URL: https://github.com/apache/flink/pull/24079#discussion_r1450546553
##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/CompressibleFSDataOutputStream.java:
##########
@@ -41,6 +41,10 @@ public CompressibleFSDataOutputStream(
 
     @Override
     public long getPos() throws IOException {
+        // Underlying compression involves buffering, so the only way to report correct position is
+        // to flush the underlying stream. This lowers the effectivity of compression, but there is
+        // no other way, since the position is often used as a split point.
+        flush();

Review Comment:
   This issue is less obvious, but since we want to keep the correct semantics of the stream interface, it should be addressed. I believe the following scenario would reveal it:
   - Create a `CompressibleFSDataInputStream` with Snappy
   - Seek to the 1st item
   - Read it (it will be read correctly)
   - Seek to, say, the 5th item
   - Read it (I believe it will be read incorrectly)

   The key point is that the second seek must target anything other than the 2nd item. The 2nd item would still be read correctly, but any other item would not, because the Snappy stream keeps reading data from its buffer.
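The suspected failure mode can be approximated without Flink or Snappy. The sketch below is only an analogy: `SeekableBytes` is a hypothetical stand-in for the wrapped file stream, and the JDK's `BufferedInputStream` plays the role of the buffering decompressor. None of these names come from the PR, and the real `CompressibleFSDataInputStream` may behave differently.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class StaleBufferDemo {

    /** Hypothetical seekable source; stands in for the wrapped file stream. */
    static final class SeekableBytes extends InputStream {
        private final byte[] data;
        private int pos;

        SeekableBytes(byte[] data) {
            this.data = data;
        }

        /** Moves the read position of this source only. */
        void seek(int newPos) {
            pos = newPos;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    /** Returns {value read after seek(0), value read after seek(4)}. */
    static int[] readAfterSeeks() {
        byte[] items = {10, 11, 12, 13, 14, 15, 16, 17}; // item i has value 10 + i
        SeekableBytes raw = new SeekableBytes(items);
        // The buffering layer plays the role of the decompressing stream.
        InputStream buffered = new BufferedInputStream(raw, items.length);
        try {
            raw.seek(0);
            int first = buffered.read();  // fills the buffer starting at offset 0
            raw.seek(4);                  // moves only the underlying stream
            int second = buffered.read(); // still served from the old buffer
            return new int[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = readAfterSeeks();
        System.out.println("after seek(0): " + r[0] + ", after seek(4): " + r[1]);
    }
}
```

In this stand-in, the read after `seek(4)` returns the item at index 1 rather than the one at index 4, because the wrapper never learns that the underlying position changed. A seek to index 1 would have looked correct by coincidence, which matches the observation that only the 2nd item is read correctly.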