[ https://issues.apache.org/jira/browse/KUDU-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3453:
--------------------------------
    Description:
Tablet copying is a provision to implement the process of automatic tablet re-replication in Kudu. When the system catalog (Kudu master) detects that a tablet replica is no longer available, it automatically re-replicates a tablet to a destination tablet server using another healthy tablet replica in the cluster as the source.

When copying a tablet from one tablet server to another, the source tablet copying session "anchors" WAL segments to be transferred to the destination server, so they are not GC-ed by the tablet maintenance operation when they are no longer needed locally while the tablet copy session is still in progress.

The anchored WAL segments are released all at once when the tablet copying session completes with success or failure. However, there might be long-running tablet copying sessions, and with a high data ingest rate, the source tablet replica might accumulate a huge amount of WAL data that is no longer relevant to either the source or the destination server.

To prevent accumulation of WAL data for long-running tablet copying sessions, it's necessary to update the WAL anchors in a more granular manner, e.g. un-anchor a segment once it has been successfully copied and persisted by the client tablet copying session.

  was:
Tablet copying is a provision to implement the process of automatic tablet re-replication in Kudu. When the system catalog (Kudu master) detects that a tablet replica is no longer available, it automatically re-replicates a tablet to a destination tablet server using another healthy tablet replica in the cluster as the source.

When copying a tablet from one tablet server to another, the source tablet copying session "anchors" WAL segments to be transfered to the destination server, so they are not GC-ed by the tablet maintenance operation when they are no longer needed locally, but the tablet copy session is still in progress.

The anchored WAL segments are releases all at once when the tablet copying session completes with success of failure. However, there might be long running tablet copying sessions, and with high data ingest rate, the source tablet replica might accumulate huge amount of WAL data which isn't relevant at both the source and the destination server.

To prevent accumulation of WAL data for long-running tablet copying sessions, it's necessary to update the WAL anchors in a more granular manner, e.g. un-anchor a segment once it has been successfully copied and persisted by the client tablet copying session.

> Fine-grained anchoring for WAL segments for tablet copy
> -------------------------------------------------------
>
>                 Key: KUDU-3453
>                 URL: https://issues.apache.org/jira/browse/KUDU-3453
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tablet, tserver
>            Reporter: Alexey Serbin
>            Priority: Major
>
> Tablet copying is a provision to implement the process of automatic tablet
> re-replication in Kudu. When the system catalog (Kudu master) detects that a
> tablet replica is no longer available, it automatically re-replicates a
> tablet to a destination tablet server using another healthy tablet replica in
> the cluster as the source.
> When copying a tablet from one tablet server to another, the source tablet
> copying session "anchors" WAL segments to be transferred to the destination
> server, so they are not GC-ed by the tablet maintenance operation when they
> are no longer needed locally while the tablet copy session is still in
> progress.
> The anchored WAL segments are released all at once when the tablet copying
> session completes with success or failure. However, there might be
> long-running tablet copying sessions, and with a high data ingest rate, the
> source tablet replica might accumulate a huge amount of WAL data that is no
> longer relevant to either the source or the destination server.
> To prevent accumulation of WAL data for long-running tablet copying sessions,
> it's necessary to update the WAL anchors in a more granular manner, e.g.
> un-anchor a segment once it has been successfully copied and persisted by the
> client tablet copying session.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
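
To illustrate the difference between releasing all anchors at the end of a session and un-anchoring segment by segment, here is a minimal Python sketch. It is not Kudu code and does not reflect Kudu's actual anchoring API: the names WalAnchorSet and gc_candidates, the use of plain integer segment sequence numbers, and the "minimum sequence number still needed locally" threshold are all assumptions made for illustration only.

```python
class WalAnchorSet:
    """Hypothetical tracker of WAL segments a copy session still needs."""

    def __init__(self, segment_seqnos):
        # Anchor every segment that is scheduled for transfer.
        self._anchored = set(segment_seqnos)

    def release(self, seqno):
        """Fine-grained: un-anchor one segment once the destination
        has confirmed that it is copied and persisted."""
        self._anchored.discard(seqno)

    def release_all(self):
        """Coarse-grained: drop every anchor when the session ends."""
        self._anchored.clear()

    def is_anchored(self, seqno):
        return seqno in self._anchored


def gc_candidates(local_segments, min_needed_locally, anchors):
    """Return segments that are neither needed locally nor anchored."""
    return [
        s for s in local_segments
        if s < min_needed_locally and not anchors.is_anchored(s)
    ]


# Assumed scenario: five segments on disk, the local replica only
# needs segment 4 and later, and a copy session anchors all five.
segments = [1, 2, 3, 4, 5]
anchors = WalAnchorSet(segments)
before = gc_candidates(segments, 4, anchors)

# The destination confirms segments 1 and 2; release them right away.
anchors.release(1)
anchors.release(2)
after = gc_candidates(segments, 4, anchors)

print(before, after)
```

With all-at-once release, nothing below the local threshold can be collected until the session ends. With per-segment release, already-persisted segments become eligible while the copy is still running, which is the behavior this issue asks for.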