dmvk commented on code in PR #24079:
URL: https://github.com/apache/flink/pull/24079#discussion_r1450722556


##########
flink-runtime/src/main/java/org/apache/flink/runtime/state/CompressibleFSDataOutputStream.java:
##########
@@ -41,6 +41,10 @@ public CompressibleFSDataOutputStream(
 
     @Override
     public long getPos() throws IOException {
+        // Underlying compression involves buffering, so the only way to report correct position is
+        // to flush the underlying stream. This lowers the effectivity of compression, but there is
+        // no other way, since the position is often used as a split point.
+        flush();

Review Comment:
   You're partially right. The missing point is that Snappy buffers at most
until the next flush, so as long as we always read full records, it should
work as expected. For partial reads, it behaves as you describe.
   
   I've tried to fix the partial read case in
https://github.com/apache/flink/pull/24079/commits/4a42f16a13f3d22c188a5020a8e8718394d11e0c,
PTAL
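
   To illustrate why `getPos()` flushes first, here is a minimal, self-contained sketch. It is not the Flink implementation: `FlushBeforeGetPos`, `CountingOutputStream`, `CompressingStream`, and `getPosWithoutFlush` are hypothetical names, and it uses the JDK's `DeflaterOutputStream` as a stand-in for Snappy, on the assumption that both hold written bytes in a buffer until a flush.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class FlushBeforeGetPos {

    /** Counts the bytes that have actually reached the underlying sink. */
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream sink;
        private long count;

        CountingOutputStream(OutputStream sink) {
            this.sink = sink;
        }

        @Override
        public void write(int b) throws IOException {
            sink.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            sink.write(b, off, len);
            count += len;
        }

        @Override
        public void flush() throws IOException {
            sink.flush();
        }
    }

    /** Hypothetical stand-in for a compressing stream that reports its position. */
    static final class CompressingStream {
        private final CountingOutputStream counter;
        private final DeflaterOutputStream compressed;

        CompressingStream(OutputStream sink) {
            this.counter = new CountingOutputStream(sink);
            // syncFlush = true: flush() pushes all pending compressed bytes downstream.
            this.compressed = new DeflaterOutputStream(counter, true);
        }

        void write(byte[] data) throws IOException {
            compressed.write(data);
        }

        /** Position without flushing: bytes still buffered by the compressor are not counted. */
        long getPosWithoutFlush() {
            return counter.count;
        }

        /** Position after flushing: every byte written so far is accounted for. */
        long getPos() throws IOException {
            compressed.flush();
            return counter.count;
        }
    }

    public static void main(String[] args) throws IOException {
        CompressingStream out = new CompressingStream(new ByteArrayOutputStream());
        out.write("record-1".getBytes(StandardCharsets.UTF_8));

        long stale = out.getPosWithoutFlush();
        long accurate = out.getPos();

        System.out.println("position without flush: " + stale);
        System.out.println("position after flush:   " + accurate);
    }
}
```

   In this sketch the unflushed position lags behind the flushed one, so only the flushed value is safe to use as a split point. The trade-off is the one the code comment mentions: each flush ends the current compressed block early, which tends to reduce the compression ratio.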



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
