ankitsol commented on PR #7617: URL: https://github.com/apache/hbase/pull/7617#issuecomment-3996883509
@Apache9 @anmolnar I have added a periodic shipper monitoring task to `ReplicationSource` (`shipperMonitorExecutor`). It checks for shipper threads that have died without reaching the FINISHED state and recreates them. When the new shipper starts, it reads the replication offset from the persisted offset storage and resumes WAL reading from that point, so any WAL entries whose offset was not successfully persisted are replicated again.

The failure path now works as follows. If the endpoint (for example, an S3-based implementation) fails in `beforePersistingReplicationOffset()` while committing or flushing data, `persistLogPosition()` throws an `IOException`. `shipEdits()` catches this `IOException` and rethrows it as a `ReplicationRuntimeException`, which causes the `ReplicationSourceShipper` worker thread to exit (its `run()` method interrupts and terminates the thread). The monitor then detects the dead worker and recreates the shipper, which resumes replication from the last persisted offset.
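
To make the recovery flow easier to reason about, here is a minimal standalone sketch of the same pattern. It is not the code in this PR and uses no HBase classes. Every name in it (`ShipperMonitorSketch`, `persist`, `shipLoop`, `startShipper`, `runDemo`) is hypothetical, the "WAL" is just the offsets 0 to 3, and the second persist attempt is made to fail to stand in for an endpoint flush failure.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: all names are hypothetical and none of this is HBase API.
public class ShipperMonitorSketch {
  static final int TOTAL_ENTRIES = 4;                      // pretend WAL holds entries 0..3

  final AtomicLong persistedOffset = new AtomicLong(0);    // stands in for offset storage
  final AtomicInteger persistAttempts = new AtomicInteger(0);
  final List<Long> shipped = Collections.synchronizedList(new ArrayList<>());
  final CountDownLatch finished = new CountDownLatch(1);   // stands in for the FINISHED state
  volatile Thread shipper;

  // Simulated persist step: the second attempt fails, like an endpoint flush error.
  void persist(long newOffset) throws IOException {
    if (persistAttempts.incrementAndGet() == 2) {
      throw new IOException("simulated flush failure");
    }
    persistedOffset.set(newOffset);
  }

  // Worker body: resume from the persisted offset, ship an entry, then persist.
  void shipLoop() {
    long offset = persistedOffset.get();
    while (offset < TOTAL_ENTRIES) {
      shipped.add(offset);
      try {
        persist(offset + 1);
      } catch (IOException e) {
        // Rethrow unchecked so the worker thread dies, as described in the comment above.
        throw new IllegalStateException(e);
      }
      offset++;
    }
    finished.countDown();
  }

  Thread startShipper() {
    Thread t = new Thread(this::shipLoop, "shipper");
    t.setUncaughtExceptionHandler((thread, error) -> { }); // keep the demo output quiet
    t.start();
    return t;
  }

  List<Long> runDemo() throws InterruptedException {
    shipper = startShipper();
    ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
    // Periodic check: a worker that is dead but not finished gets recreated.
    monitor.scheduleWithFixedDelay(() -> {
      if (finished.getCount() > 0 && !shipper.isAlive()) {
        shipper = startShipper();
      }
    }, 10, 10, TimeUnit.MILLISECONDS);
    try {
      finished.await(5, TimeUnit.SECONDS);
    } finally {
      monitor.shutdownNow();
    }
    synchronized (shipped) {
      return new ArrayList<>(shipped);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(new ShipperMonitorSketch().runDemo()); // prints [0, 1, 1, 2, 3]
  }
}
```

Entry 1 appears twice in the output because it was shipped before its offset could be persisted, so the recreated worker ships it again. That is the at-least-once behavior described above, which means the receiving side has to tolerate duplicates.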
