ankitsol commented on PR #7617:
URL: https://github.com/apache/hbase/pull/7617#issuecomment-3996883509

   @Apache9 @anmolnar 
   
   I have now added a periodic shipper monitoring task in `ReplicationSource` 
(`shipperMonitorExecutor`) that checks for shipper threads that have died 
without reaching the FINISHED state and recreates them. When the new shipper 
starts, it reads the replication offset from the persisted offset storage and 
resumes WAL reading from that point, so any WAL entries whose offset was not 
successfully persisted are replicated again.
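
   A rough sketch of the monitoring idea, for illustration only. This is not the code in the PR: `Worker`, `checkAndRestart()`, and the in-memory `persistedOffset` are hypothetical stand-ins for the shipper thread, the monitor task, and the persisted offset storage.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ShipperMonitorSketch {
  /** Stand-in for the persisted offset storage (in-memory, hypothetical). */
  static final AtomicLong persistedOffset = new AtomicLong(0);

  /** Hypothetical worker; a simplification, not HBase's ReplicationSourceShipper. */
  static class Worker extends Thread {
    // The resume point is read from the persisted offset when the worker is created.
    final long startOffset = persistedOffset.get();
    final boolean crash;
    volatile boolean finished = false;

    Worker(boolean crash) {
      this.crash = crash;
    }

    @Override
    public void run() {
      if (crash) {
        return; // simulates dying without reaching the finished state
      }
      persistedOffset.set(startOffset + 10); // pretend 10 entries were shipped and persisted
      finished = true;
    }
  }

  static volatile Worker current;

  /** One monitor pass: replace a worker that is dead but never finished. */
  static boolean checkAndRestart() {
    Worker w = current;
    if (w != null && !w.isAlive() && !w.finished) {
      current = new Worker(false); // the new worker resumes from persistedOffset
      current.start();
      return true;
    }
    return false;
  }

  public static void main(String[] args) throws Exception {
    persistedOffset.set(40);
    current = new Worker(true);
    current.start();
    current.join(); // the first worker is now dead and unfinished

    // Periodic monitor, analogous in spirit to shipperMonitorExecutor.
    ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
    monitor.scheduleWithFixedDelay(ShipperMonitorSketch::checkAndRestart,
        0, 50, TimeUnit.MILLISECONDS);
    Thread.sleep(300);
    monitor.shutdownNow();
    current.join();
    System.out.println("resumed from " + current.startOffset
        + ", persisted " + persistedOffset.get());
  }
}
```

   The check is deliberately narrow: a worker that is still alive, or that finished normally, is left alone, so only abnormal exits trigger a restart.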
   
   So now, if the endpoint (for example an S3-based implementation) fails during 
`beforePersistingReplicationOffset()` while committing/flushing data, 
`persistLogPosition()` throws an `IOException`. In `shipEdits()` this 
`IOException` is caught and rethrown as a `ReplicationRuntimeException`, which 
causes the `ReplicationSourceShipper` worker thread to exit (its `run()` method 
interrupts and terminates the thread). The monitor then detects the dead worker 
and recreates the shipper, which resumes replication from the last persisted 
offset.
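
   The failure path can be sketched roughly as follows. This is a simplified illustration, not the PR code: the `Endpoint` interface, `beforePersist()`, and `runWorker()` are hypothetical, and `UncheckedIOException` stands in for `ReplicationRuntimeException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class ShipFailureSketch {
  /** Hypothetical endpoint hook; throws when the commit/flush fails. */
  interface Endpoint {
    void beforePersist() throws IOException;
  }

  /** Stand-in for the persisted replication offset. */
  static long persistedOffset = 0;

  static void persistLogPosition(Endpoint ep, long newOffset) throws IOException {
    ep.beforePersist();          // if this throws, the offset below is NOT advanced
    persistedOffset = newOffset;
  }

  static void shipEdits(Endpoint ep, long newOffset) {
    try {
      persistLogPosition(ep, newOffset);
    } catch (IOException e) {
      // Rethrow unchecked so the failure propagates out of the worker loop.
      throw new UncheckedIOException(e);
    }
  }

  /** Runs one worker thread to completion; returns true if it exited abnormally. */
  static boolean runWorker(Endpoint ep, long newOffset) throws InterruptedException {
    boolean[] failed = {false};
    Thread worker = new Thread(() -> {
      try {
        shipEdits(ep, newOffset);
      } catch (RuntimeException e) {
        failed[0] = true; // the worker stops here; a monitor would recreate it later
      }
    });
    worker.start();
    worker.join();
    return failed[0];
  }

  public static void main(String[] args) throws Exception {
    boolean failed = runWorker(() -> { throw new IOException("flush failed"); }, 100);
    System.out.println("first attempt failed=" + failed + ", offset=" + persistedOffset);

    // The recreated worker starts again from the unchanged offset and re-ships the entries.
    failed = runWorker(() -> { }, 100);
    System.out.println("retry failed=" + failed + ", offset=" + persistedOffset);
  }
}
```

   Because the offset is only written after the endpoint hook succeeds, a failed flush leaves the persisted offset untouched, which is what makes the replay after restart safe.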


