purplefox commented on a change in pull request #8812:
URL: https://github.com/apache/kafka/pull/8812#discussion_r583496197



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -826,8 +832,16 @@ class Log(@volatile private var _dir: File,
         preallocate = config.preallocate))
     }
 
-    recoveryPoint = activeSegment.readNextOffset

Review comment:
   Is it possible that a consumer could still see "phantom" messages after
   recovery, even with this change?

   1. The Kafka process dies with log data in the page cache that has not been
      fsync'd. The OS stays up, so the data survives in the page cache.
   2. The recovery process sees the un-fsync'd log data, but it looks valid, so
      recovery succeeds with nothing to do.
   3. A consumer fetches this data.
   4. The OS crashes hard, losing the page cache.
   5. The consumer tries to repeat the fetch from the same offset, but the data
      is gone.

   It seems to me that once recovery has run, we should be sure that all log
   segments are persistently stored. I'm not sure we currently provide that
   guarantee.
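
   To make the suggestion concrete, here is a rough sketch of the kind of
   post-recovery flush I have in mind. It is not taken from Log.scala:
   `RecoveryFlushSketch` and `flushAllSegments` are hypothetical names, and it
   assumes segments are the `.log` files directly under the log directory. It
   only uses `FileChannel.force` from the JDK.

```scala
import java.io.File
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption

// Hypothetical helper, not part of Kafka: illustrates forcing every segment
// file to stable storage once recovery has finished.
object RecoveryFlushSketch {

  // Fsyncs every ".log" file directly under `dir` and returns the files that
  // were flushed, ordered by file name.
  def flushAllSegments(dir: File): Seq[File] = {
    // listFiles() returns null if `dir` is not a readable directory.
    val segments = Option(dir.listFiles())
      .getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".log"))
      .sortBy(_.getName)
      .toSeq

    segments.foreach { segment =>
      val channel = FileChannel.open(segment.toPath, StandardOpenOption.WRITE)
      try channel.force(true) // flush file data and metadata to the device
      finally channel.close()
    }
    segments
  }
}
```

   With something like this run at the end of recovery, data that recovery
   accepted in step 2 would be on disk before a consumer could fetch it in
   step 3, so the crash in step 4 could no longer remove it. Whether the extra
   startup cost is acceptable with many segments is a separate question.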





