Takanobu Asanuma created HDFS-11179: ---------------------------------------
             Summary: LightWeightHashSet can't correctly remove blocks with large blockIds
                 Key: HDFS-11179
                 URL: https://issues.apache.org/jira/browse/HDFS-11179
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.0.0-alpha1
            Reporter: Takanobu Asanuma
            Assignee: Takanobu Asanuma
            Priority: Blocker


Our test cluster has run into a problem where {{postponedMisreplicatedBlocksCount}} keeps going below zero. The cluster runs a recent 3.0 build, and we haven't created any EC files yet. This is the NN's log:

{noformat}
Rescan of postponedMisreplicatedBlocks completed in 13 msecs. 448 blocks are left. 176 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 13 msecs. 272 blocks are left. 176 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 14 msecs. 96 blocks are left. 176 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 327 msecs. -77 blocks are left. 177 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 15 msecs. -253 blocks are left. 179 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 14 msecs. -432 blocks are left. 179 blocks are removed.
{noformat}

I looked into this issue and found that it is caused by {{LightWeightHashSet}}, which was recently adopted for {{postponedMisreplicatedBlocks}}. When {{LightWeightHashSet}} removes blocks that have a large blockId, an integer overflow occurs and the blocks can't be removed correctly (let alone EC blocks, whose blockIds start from the minimum long value).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
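The class of overflow the description refers to can be sketched as follows. This is an illustrative pattern only, not the actual {{LightWeightHashSet}} source: the method names and the capacity value are hypothetical. It shows why a bucket index derived with {{Math.abs}} goes wrong for hash codes like {{Integer.MIN_VALUE}}: a blockId of {{Long.MIN_VALUE}} (where EC blockIds begin) hashes to exactly {{Integer.MIN_VALUE}} under the standard long-hash formula, and {{Math.abs(Integer.MIN_VALUE)}} is still {{Integer.MIN_VALUE}}, so the "absolute" index can come out negative.

```java
public class OverflowSketch {
    // Same formula as Long.hashCode: fold the high 32 bits into the low 32.
    static int hashOf(long blockId) {
        return (int) (blockId ^ (blockId >>> 32));
    }

    // Buggy pattern: Math.abs(Integer.MIN_VALUE) overflows back to
    // Integer.MIN_VALUE, so the result can be a negative bucket index.
    static int buggyIndex(int hashCode, int capacity) {
        return Math.abs(hashCode) % capacity;
    }

    // Safe pattern: Math.floorMod always returns a value in [0, capacity).
    static int safeIndex(int hashCode, int capacity) {
        return Math.floorMod(hashCode, capacity);
    }

    public static void main(String[] args) {
        int capacity = 1000; // hypothetical bucket-array size
        int h = hashOf(Long.MIN_VALUE); // == Integer.MIN_VALUE
        System.out.println(buggyIndex(h, capacity)); // -648: invalid index
        System.out.println(safeIndex(h, capacity));  // 352: valid index
    }
}
```

With a negative index, a lookup or removal probes the wrong place (or throws), so an element that was inserted can never be found again, which matches the symptom of blocks that cannot be removed.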