Kihwal Lee created HDFS-9678:
--------------------------------

Summary: Standby NN sometimes does not clear needRollbackFsImage
Key: HDFS-9678
URL: https://issues.apache.org/jira/browse/HDFS-9678
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
When the edit log loader sees {{OP_ROLLING_UPGRADE_START}}, it calls {{setNeedRollbackFsImage(true)}}. On a standby NN, this flag is cleared only by the checkpointer thread, and only when that thread actually creates a rollback image. On {{OP_ROLLING_UPGRADE_FINALIZE}}, the rolling upgrade is finalized, but {{needRollbackFsImage}} is not cleared if a rollback image was never created.

This results in perpetual checkpointing by the standby NN. The standby thinks it must checkpoint because it still needs to create a rollback image. But because it is no longer in rolling upgrade mode, it creates a regular checkpoint, not a rollback image. As a result, the flag is not cleared even after the checkpoint completes. The standby keeps checkpointing back-to-back, and each new image is uploaded to the active. We noticed this because of increased sync time on the active.
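The flag lifecycle described above can be illustrated with a small, hypothetical Java model. `StandbyState`, `checkpointerTick`, and `clearOnFinalize` are invented for illustration. They are not Hadoop's actual classes, and this is not the patch for this issue. The model also assumes that clearing the flag on finalize is one plausible fix. It ignores the checkpointer's other triggers, such as transaction count and checkpoint period, and keeps only the rollback-image path.

```java
// Hypothetical model of the standby NN's needRollbackFsImage flag.
// Not Hadoop code: names and structure are assumptions for illustration only.
class RollbackFlagSketch {

    static class StandbyState {
        boolean needRollbackFsImage = false;
        boolean rollingUpgradeInProgress = false;
        final boolean clearOnFinalize; // assumed fix: also clear the flag on finalize
        int regularCheckpoints = 0;
        int rollbackImages = 0;

        StandbyState(boolean clearOnFinalize) {
            this.clearOnFinalize = clearOnFinalize;
        }

        // Edit log loader sees OP_ROLLING_UPGRADE_START.
        void onRollingUpgradeStart() {
            rollingUpgradeInProgress = true;
            needRollbackFsImage = true;
        }

        // Edit log loader sees OP_ROLLING_UPGRADE_FINALIZE.
        void onRollingUpgradeFinalize() {
            rollingUpgradeInProgress = false;
            if (clearOnFinalize) {
                needRollbackFsImage = false;
            }
        }

        // One checkpointer iteration, driven only by the rollback-image flag here.
        void checkpointerTick() {
            if (!needRollbackFsImage) {
                return; // nothing forces a checkpoint in this simplified model
            }
            if (rollingUpgradeInProgress) {
                rollbackImages++;
                needRollbackFsImage = false; // the only place the flag is cleared
            } else {
                regularCheckpoints++; // regular checkpoint; flag stays set
            }
        }
    }

    // Finalize before any rollback image exists, then run the checkpointer.
    static int simulate(boolean clearOnFinalize, int ticks) {
        StandbyState s = new StandbyState(clearOnFinalize);
        s.onRollingUpgradeStart();
        s.onRollingUpgradeFinalize();
        for (int i = 0; i < ticks; i++) {
            s.checkpointerTick();
        }
        return s.regularCheckpoints;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + simulate(false, 10));
        System.out.println("with fix:    " + simulate(true, 10));
    }
}
```

In this model, every tick without the fix produces another regular checkpoint, because the only code path that clears the flag requires an in-progress rolling upgrade. With the flag also cleared on finalize, the checkpointer has nothing left to do.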