Ekanth Sethuramalingam created HDFS-14317:
---------------------------------------------
Summary: Standby does not trigger edit log rolling when
in-progress edit log tailing is enabled
Key: HDFS-14317
URL: https://issues.apache.org/jira/browse/HDFS-14317
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.9.0
Reporter: Ekanth Sethuramalingam
Assignee: Ekanth Sethuramalingam
The standby uses the following method to check whether it is time to trigger
edit log rolling on the active:
{{/**}}
{{ * @return true if the configured log roll period has elapsed.}}
{{ */}}
{{private boolean tooLongSinceLastLoad() {}}
{{  return logRollPeriodMs >= 0 &&}}
{{      (monotonicNow() - lastLoadTimeMs) > logRollPeriodMs;}}
{{}}}
In doTailEdits(), lastLoadTimeMs is updated whenever the standby successfully
tails any edits:
{{if (editsLoaded > 0) {}}
{{ lastLoadTimeMs = monotonicNow();}}
{{}}}
The default for {{dfs.ha.log-roll.period}} is 120 seconds and the default for
{{dfs.ha.tail-edits.period}} is 60 seconds. With in-progress edit log tailing
enabled, the standby loads edits on nearly every tail while the active is
writing, so lastLoadTimeMs is refreshed about every 60 seconds and
tooLongSinceLastLoad() almost never returns true. As a result, edit logs are
not rolled for a long time, until
{{dfs.namenode.edit.log.autoroll.multiplier.threshold}} takes effect.
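
The timing argument above can be checked with a small simulation. This is only a sketch on a fake clock, not the actual EditLogTailer code: the class name RollTimerSketch, the method countRollTriggers, and the assumption that a triggered roll makes edits loadable on the same cycle are all hypothetical. Only the predicate and the two default periods come from the description.

```java
public class RollTimerSketch {
    // Defaults quoted in the report, converted to milliseconds.
    static final long LOG_ROLL_PERIOD_MS = 120_000;
    static final long TAIL_PERIOD_MS = 60_000;

    /** Same predicate as in the report, with the clock value passed in explicitly. */
    static boolean tooLongSinceLastLoad(long nowMs, long lastLoadTimeMs, long logRollPeriodMs) {
        return logRollPeriodMs >= 0 && (nowMs - lastLoadTimeMs) > logRollPeriodMs;
    }

    /**
     * Runs the given number of tail cycles and counts how many would trigger a roll.
     * inProgressTailing = true models a standby that loads edits on every cycle.
     * inProgressTailing = false models a standby that can only load edits after a
     * roll has finalized a segment (an assumption made for this sketch).
     */
    static int countRollTriggers(int cycles, boolean inProgressTailing) {
        long now = 0;
        long lastLoadTimeMs = 0;
        int triggers = 0;
        for (int i = 0; i < cycles; i++) {
            now += TAIL_PERIOD_MS;
            boolean finalizedSegmentAvailable = false;
            if (tooLongSinceLastLoad(now, lastLoadTimeMs, LOG_ROLL_PERIOD_MS)) {
                triggers++;
                finalizedSegmentAvailable = true; // the roll finalizes the current segment
            }
            // Mirrors "if (editsLoaded > 0) lastLoadTimeMs = monotonicNow();"
            if (inProgressTailing || finalizedSegmentAvailable) {
                lastLoadTimeMs = now;
            }
        }
        return triggers;
    }

    public static void main(String[] args) {
        System.out.println("in-progress tailing: " + countRollTriggers(10, true));
        System.out.println("finalized-only tailing: " + countRollTriggers(10, false));
    }
}
```

In this model, ten cycles with in-progress tailing produce no roll triggers, because the gap since the last load never exceeds 60 seconds, while the finalized-only case triggers a roll every third cycle.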
In our deployment, this resulted in in-progress edit logs getting deleted. The
standby was able to checkpoint twice while the in-progress edit log kept
growing on the active. When the NNStorageRetentionManager decided to clean up
old checkpoints and edit logs, it removed the in-progress edit log from the
active and QJM (because the txnid of the in-progress edit log was older than
the 2 most recent checkpoints), irrecoverably losing a few minutes' worth of
metadata.
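
The sequence of events can be illustrated with a toy model. This is a deliberately simplified sketch and not the NNStorageRetentionManager implementation, which consults additional settings. The class RetentionSketch, the Segment type, and the rule "purge any segment whose first txid is older than the oldest retained checkpoint" are assumptions made only to mirror the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RetentionSketch {
    /** Hypothetical stand-in for an edit log segment, identified by its first txid. */
    static final class Segment {
        final long firstTxId;
        final boolean inProgress;

        Segment(long firstTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.inProgress = inProgress;
        }
    }

    /**
     * Toy purge rule: keep the N most recent checkpoints and purge every segment
     * that starts before the oldest retained checkpoint. The segment's current
     * length is ignored, which is what makes a long-lived in-progress segment
     * vulnerable in this model.
     */
    static List<Segment> segmentsToPurge(List<Segment> segments,
                                         long[] checkpointTxIds,
                                         int checkpointsToRetain) {
        List<Segment> purge = new ArrayList<>();
        if (checkpointsToRetain <= 0 || checkpointTxIds.length < checkpointsToRetain) {
            return purge; // not enough checkpoints yet, purge nothing
        }
        long[] sorted = checkpointTxIds.clone();
        Arrays.sort(sorted);
        long oldestRetained = sorted[sorted.length - checkpointsToRetain];
        for (Segment s : segments) {
            if (s.firstTxId < oldestRetained) {
                purge.add(s);
            }
        }
        return purge;
    }

    public static void main(String[] args) {
        // One in-progress segment that started at txid 100 and was never rolled,
        // with checkpoints taken at txids 500 and 900 (illustrative numbers).
        List<Segment> segments = Arrays.asList(new Segment(100, true));
        List<Segment> purged = segmentsToPurge(segments, new long[] {500, 900}, 2);
        System.out.println("segments purged: " + purged.size());
    }
}
```

With these illustrative numbers the in-progress segment is selected for purging even though it still holds the only copy of the transactions written after the latest checkpoint. Had the log been rolled so that the open segment started after txid 900, the same rule would have left it alone.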