Ekanth Sethuramalingam created HDFS-14317: ---------------------------------------------
Summary: Standby does not trigger edit log rolling when in-progress edit log tailing is enabled
Key: HDFS-14317
URL: https://issues.apache.org/jira/browse/HDFS-14317
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.9.0
Reporter: Ekanth Sethuramalingam
Assignee: Ekanth Sethuramalingam

The standby uses the following method to decide whether it is time to trigger an edit log roll on the active NameNode:

{code:java}
/**
 * @return true if the configured log roll period has elapsed.
 */
private boolean tooLongSinceLastLoad() {
  return logRollPeriodMs >= 0 &&
      (monotonicNow() - lastLoadTimeMs) > logRollPeriodMs;
}
{code}

In doTailEdits(), lastLoadTimeMs is updated whenever the standby successfully tails any edits:

{code:java}
if (editsLoaded > 0) {
  lastLoadTimeMs = monotonicNow();
}
{code}

The default value of {{dfs.ha.log-roll.period}} is 120 seconds, and the default value of {{dfs.ha.tail-edits.period}} is 60 seconds. Without in-progress tailing, the standby only reads finalized segments, so editsLoaded is positive only after a roll and lastLoadTimeMs eventually goes stale. With in-progress edit log tailing enabled, however, the standby can load new edits from the active's in-progress segment on every tailing pass. On a cluster with steady write traffic, lastLoadTimeMs is therefore refreshed roughly every 60 seconds, and the time since the last load stays well below 120 seconds. As a result, tooLongSinceLastLoad() almost never returns true, and the edit log is not rolled for a long time, until the active's own automatic roll, governed by {{dfs.namenode.edit.log.autoroll.multiplier.threshold}}, takes effect.

In our deployment, this resulted in in-progress edit logs being deleted. The standby checkpointed twice while the in-progress edit log on the active was still growing. When NNStorageRetentionManager then cleaned up old checkpoints and edit logs, it deleted the in-progress edit log from the active and from the QJM, because the txid of the in-progress edit log was older than the two most recent checkpoints. As a result, a few minutes' worth of metadata was irrecoverably lost.
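To make the timing argument concrete, here is a simplified, hypothetical Java simulation; it is not Hadoop code. It assumes a simulated millisecond clock instead of monotonicNow(), one tailing pass per {{dfs.ha.tail-edits.period}}, a busy active that always has new in-progress edits to tail, and it ignores any other conditions the real tailer loop may check. The class name, the method names and the lastRollTimeMs variant are all hypothetical; the variant only illustrates measuring time since the last roll instead of since the last load, and is not necessarily how this issue will be fixed.

```java
/**
 * Hypothetical, simplified simulation of the standby's roll-trigger check.
 * Assumptions (not from the Hadoop source): simulated clock in ms, one tail
 * pass per tail period, new edits available on every pass, no other guards.
 */
public class RollCheckSimulation {
    // Defaults quoted in the issue: dfs.ha.log-roll.period = 120 s,
    // dfs.ha.tail-edits.period = 60 s.
    static final long LOG_ROLL_PERIOD_MS = 120_000L;
    static final long TAIL_EDITS_PERIOD_MS = 60_000L;

    // Same condition as tooLongSinceLastLoad(), with the clock passed in.
    static boolean tooLongSince(long nowMs, long sinceMs) {
        return LOG_ROLL_PERIOD_MS >= 0 && (nowMs - sinceMs) > LOG_ROLL_PERIOD_MS;
    }

    // Current behaviour: loading any edits resets the timer.
    static int rollsTriggeredByLastLoadCheck(long durationMs) {
        long lastLoadTimeMs = 0;
        int rolls = 0;
        for (long now = 0; now <= durationMs; now += TAIL_EDITS_PERIOD_MS) {
            if (tooLongSince(now, lastLoadTimeMs)) {
                rolls++; // the standby would trigger a roll on the active here
            }
            // Assumed busy cluster: editsLoaded > 0 on every pass.
            lastLoadTimeMs = now;
        }
        return rolls;
    }

    // Hypothetical alternative: time is measured since the last triggered roll.
    static int rollsTriggeredByLastRollCheck(long durationMs) {
        long lastRollTimeMs = 0; // hypothetical field, not in the Hadoop source
        int rolls = 0;
        for (long now = 0; now <= durationMs; now += TAIL_EDITS_PERIOD_MS) {
            if (tooLongSince(now, lastRollTimeMs)) {
                rolls++;
                lastRollTimeMs = now;
            }
            // Loading edits does not reset lastRollTimeMs.
        }
        return rolls;
    }

    public static void main(String[] args) {
        long oneHourMs = 3_600_000L;
        System.out.println("last-load check: " + rollsTriggeredByLastLoadCheck(oneHourMs));
        System.out.println("last-roll check: " + rollsTriggeredByLastRollCheck(oneHourMs));
    }
}
```

Over one simulated hour this prints 0 for the existing check and 20 for the hypothetical one. Because edits are loaded every 60 seconds, the gap since the last load never exceeds 60 seconds, so the strict "> 120 seconds" condition is never met. A timer that loading edits does not reset fires on the first pass after 120 seconds have elapsed, which is every 180 seconds with a 60-second tail period.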