isburmistrov commented on code in PR #24079:
URL: https://github.com/apache/flink/pull/24079#discussion_r1450346151


##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/CompressibleFSDataOutputStream.java:
##########
@@ -41,6 +41,10 @@ public CompressibleFSDataOutputStream(
 
     @Override
     public long getPos() throws IOException {
+        // Underlying compression involves buffering, so the only way to 
report correct position is
+        // to flush the underlying stream. This lowers the effectivity of 
compression, but there is
+        // no other way, since the position is often used as a split point.
+        flush();

Review Comment:
   Also, FYI, read operation in `SnappyFramedInputStream` also does the 
buffering:
   
   ```
   @Override
   public int read()
           throws IOException
   {
       if (closed) {
           return -1;
       }
       if (!ensureBuffer()) {
           return -1;
       }
       return buffer[position++] & 0xFF;
   }
   ```
   
   So when we do `seek` in the `CompressibleFSDataInputStream` (which gets 
delegated to `delegate`) and then do `read` (which gets delegated to 
`compressingDelegate`) - I guess something weird can happen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to