isburmistrov commented on code in PR #24079:
URL: https://github.com/apache/flink/pull/24079#discussion_r1450546553


##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/CompressibleFSDataOutputStream.java:
##########
@@ -41,6 +41,10 @@ public CompressibleFSDataOutputStream(
 
     @Override
     public long getPos() throws IOException {
+        // Underlying compression involves buffering, so the only way to report correct position is
+        // to flush the underlying stream. This lowers the effectivity of compression, but there is
+        // no other way, since the position is often used as a split point.
+        flush();

Review Comment:
   This issue is less obvious, but since we want to keep the semantics of the
stream interface correct, it should be addressed. I believe the following
scenario should reveal it:
   
   - Create `CompressibleFSDataInputStream` with snappy
   - Seek to 1st item
   - Read it (will be read correctly)
   - Seek to say 5th item
   - Read it (I believe it will be read incorrectly)
   
   The main point is that the second seek must target anything other than the
2nd element. The 2nd element will be read correctly, but any other element
will not, because the Snappy stream keeps serving data from its internal
buffer instead of reading from the new position.
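
   To make the suspected failure mode concrete, here is a minimal sketch of the scenario above. It does not use Flink or Snappy: `SeekableBytes` is a hypothetical stand-in for the seekable delegate stream, and `java.io.BufferedInputStream` stands in for the decompressing stream only in the sense that both buffer input read ahead from the delegate. The fixed 4-byte records are likewise an assumption for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BufferedSeekDemo {

    /** Hypothetical seekable stream over a byte array (stand-in for the delegate stream). */
    static final class SeekableBytes extends InputStream {
        private final byte[] data;
        private int pos;

        SeekableBytes(byte[] data) {
            this.data = data;
        }

        /** Moves the read position of the raw stream only; wrappers are not notified. */
        void seek(int newPos) {
            pos = newPos;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    static final int RECORD_SIZE = 4;

    /** Reads exactly one fixed-size record from the given stream. */
    static String readRecord(InputStream in) throws IOException {
        byte[] buf = new byte[RECORD_SIZE];
        int n = 0;
        while (n < RECORD_SIZE) {
            int r = in.read(buf, n, RECORD_SIZE - n);
            if (r < 0) {
                throw new IOException("unexpected end of stream");
            }
            n += r;
        }
        return new String(buf, StandardCharsets.US_ASCII);
    }

    /** Returns what is read after seeking to record 1 and then to record 5. */
    static String[] run() throws IOException {
        byte[] data = "rec0rec1rec2rec3rec4rec5rec6".getBytes(StandardCharsets.US_ASCII);
        SeekableBytes raw = new SeekableBytes(data);
        // Buffering wrapper: stand-in for a decompressor that reads ahead from the delegate.
        InputStream buffered = new BufferedInputStream(raw);

        raw.seek(1 * RECORD_SIZE);
        // The wrapper's buffer is empty, so it is filled starting at the seek target.
        String first = readRecord(buffered);

        raw.seek(5 * RECORD_SIZE);
        // The wrapper still holds read-ahead bytes, so the seek on the delegate is not observed.
        String second = readRecord(buffered);

        return new String[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        String[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "rec1 rec2"
    }
}
```

   In this sketch the second read returns `rec2` rather than `rec5`, which matches the behavior I would expect from a buffering decompressor. Whether the Snappy stream behaves exactly the same way would need to be confirmed with a test against the real `CompressibleFSDataInputStream`.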



