Sathish Kumar created HDFS-17600:
------------------------------------

             Summary: HDFS Balancer not honouring upgrade domain policy
                 Key: HDFS-17600
                 URL: https://issues.apache.org/jira/browse/HDFS-17600
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.1.1
            Reporter: Sathish Kumar

There are 3 upgrade domains: up1, up2 and up3. Two of them (up1 and up2) have 5 DataNodes each, and one (up3) has 4 DataNodes.

The Balancer balances within the upgrade domains that have 5 DataNodes, but it does not honour the upgrade domain policy for the domain with 4 DataNodes: when the Balancer runs, it copies blocks from up3 to up2 or up1.

Example output from a run:

INFO balancer.Dispatcher: Successfully moved blk_2628472659_1554764988 with size=3305207 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2830371192_1756682484 with size=107537592 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 1.5.34.68:9866
INFO balancer.Dispatcher: Successfully moved blk_2712919527_1639220270 with size=1358289 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 1.5.34.69:9866
INFO balancer.Dispatcher: Successfully moved blk_3018060755_1944407960 with size=22866627 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 1.5.34.68:9866
INFO balancer.Dispatcher: Successfully moved blk_2528373000_1454657120 with size=5898128 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2628472715_1554765044 with size=5254384 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2876876269_1803191123 with size=15647542 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_1144306578_70566613 with size=104746420 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2628470767_1554763096 with size=4183391 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2628470533_1554762862 with size=3461325 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2612635299_1538926325 with size=22033489 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866

Here the node with IP 10.x.x.9 belongs to the up3 upgrade domain and the node with IP 10.x.x.8 belongs to the up2 upgrade domain. As a result, the copied block is treated as an excess replica and is deleted from the up2 domain, so the Balancer does not balance the cluster properly.

The only workaround is to put the nodes of the other upgrade domains (up1 and up2) in the exclude list and then run the Balancer, which then balances within the up3 upgrade domain.

On further searching, the Jira below looks related to this issue, although it is marked as fixed.

https://issues.apache.org/jira/browse/HDFS-9007
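
The workaround and the log check described in this report could be sketched roughly as below. This is only an illustrative sketch, not part of the Balancer: the host-to-upgrade-domain mapping, the example IP addresses and the helper names `cross_domain_moves` and `balancer_command` are all hypothetical. The only existing interface it relies on is the Balancer's `-exclude` option, which accepts a comma-separated list of hosts.

```python
import re

# Hypothetical host -> upgrade domain mapping (made-up addresses).
# In a real cluster this information comes from the NameNode's host
# configuration, not from this script.
UPGRADE_DOMAINS = {
    "10.0.0.7": "up1",
    "10.0.0.8": "up2",
    "10.0.0.9": "up3",
}

# Extracts block id, source host and target host from a
# "Successfully moved" line in the format shown in this report.
MOVE_RE = re.compile(
    r"Successfully moved (?P<block>blk_\d+_\d+) with size=(?P<size>\d+) "
    r"from (?P<src>[^:\s]+):\d+:\w+ to (?P<dst>[^:\s]+):\d+:\w+"
)


def cross_domain_moves(log_lines, domains):
    """Return (block, src_domain, dst_domain) for moves between domains."""
    moves = []
    for line in log_lines:
        match = MOVE_RE.search(line)
        if match is None:
            continue  # not a block-move line
        src = domains.get(match["src"])
        dst = domains.get(match["dst"])
        # Hosts missing from the mapping are skipped rather than guessed.
        if src is not None and dst is not None and src != dst:
            moves.append((match["block"], src, dst))
    return moves


def balancer_command(domains, keep_domain):
    """Build a balancer command that excludes every host outside keep_domain."""
    excluded = sorted(h for h, d in domains.items() if d != keep_domain)
    return ["hdfs", "balancer", "-exclude", ",".join(excluded)]
```

With the mapping above, `balancer_command(UPGRADE_DOMAINS, "up3")` only builds the argument list; it does not start the Balancer. Feeding the Balancer output to `cross_domain_moves` gives a quick count of how many moves left their upgrade domain.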