Sathish Kumar created HDFS-17600:
------------------------------------

             Summary: HDFS Balancer not honouring upgrade domain policy
                 Key: HDFS-17600
                 URL: https://issues.apache.org/jira/browse/HDFS-17600
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.1.1
            Reporter: Sathish Kumar


There are three upgrade domains: up1, up2 and up3. Two of them (up1 and up2)
have 5 DataNodes each, and the third (up3) has 4 DataNodes.

Blocks in the upgrade domains with 5 DataNodes are balanced within their own
domain, but for the upgrade domain with 4 DataNodes the balancer does not
honour the upgrade domain policy.

When the balancer runs, it moves blocks from nodes in the up3 upgrade domain
to nodes in up2 or up1. Example output from a run:

INFO balancer.Dispatcher: Successfully moved blk_2628472659_1554764988 with 
size=3305207 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866

INFO balancer.Dispatcher: Successfully moved blk_2830371192_1756682484 with 
size=107537592 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 
1.5.34.68:9866

INFO balancer.Dispatcher: Successfully moved blk_2712919527_1639220270 with 
size=1358289 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 
1.5.34.69:9866

INFO balancer.Dispatcher: Successfully moved blk_3018060755_1944407960 with 
size=22866627 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 
1.5.34.68:9866

INFO balancer.Dispatcher: Successfully moved blk_2528373000_1454657120 with 
size=5898128 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866

INFO balancer.Dispatcher: Successfully moved blk_2628472715_1554765044 with 
size=5254384 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866

INFO balancer.Dispatcher: Successfully moved blk_2876876269_1803191123 with 
size=15647542 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 
10.x.x.9:9866

INFO balancer.Dispatcher: Successfully moved blk_1144306578_70566613 with 
size=104746420 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 
10.x.x.9:9866

INFO balancer.Dispatcher: Successfully moved blk_2628470767_1554763096 with 
size=4183391 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866

INFO balancer.Dispatcher: Successfully moved blk_2628470533_1554762862 with 
size=3461325 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866

INFO balancer.Dispatcher: Successfully moved blk_2612635299_1538926325 with 
size=22033489 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 
10.x.x.9:9866

Here the node with IP 10.x.x.9 belongs to the up3 upgrade domain and the node
with IP 10.x.x.8 belongs to the up2 upgrade domain. As a result, the moved block
is treated as an excess replica and is deleted from the up2 domain, so the
balancer does not balance the cluster properly.
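The constraint being violated can be illustrated with a simplified check: when
each replica of a block sits in a different upgrade domain, a move is only safe
if it does not reduce the number of distinct upgrade domains holding the block.
This is a minimal Python sketch under that assumption, not the actual HDFS
BlockPlacementPolicyWithUpgradeDomain logic; the is_move_allowed helper and the
node names are hypothetical.

```python
from typing import Dict, List


def is_move_allowed(replica_nodes: List[str], source: str, target: str,
                    upgrade_domain_of: Dict[str, str]) -> bool:
    """Simplified (hypothetical) upgrade-domain check for moving one replica.

    Returns True only if `source` holds a replica, `target` does not, and
    the number of distinct upgrade domains covering the block does not
    drop after the move.
    """
    if source not in replica_nodes or target in replica_nodes:
        return False
    before = {upgrade_domain_of[node] for node in replica_nodes}
    # Replace the source replica with the target replica.
    after_nodes = [node for node in replica_nodes if node != source] + [target]
    after = {upgrade_domain_of[node] for node in after_nodes}
    return len(after) >= len(before)


if __name__ == "__main__":
    # Hypothetical nodes mirroring the report: an up3 -> up2 move.
    domains = {
        "dn-up1-1": "up1",
        "dn-up2-1": "up2",
        "dn-up2-2": "up2",
        "dn-up3-1": "up3",
        "dn-up3-2": "up3",
    }
    replicas = ["dn-up1-1", "dn-up2-1", "dn-up3-1"]
    print(is_move_allowed(replicas, "dn-up3-1", "dn-up2-2", domains))  # False
    print(is_move_allowed(replicas, "dn-up3-1", "dn-up3-2", domains))  # True
```

Under this check, a move from 10.x.x.9 (up3) to 10.x.x.8 (up2) would be
rejected for any block that already has a replica in up2, while a move to
another up3 node would be allowed.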

The only workaround is to add the nodes of the other upgrade domains (up1 and
up2) to the exclude list and run the balancer, so that it balances only within
the up3 upgrade domain.
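The exclude list for this workaround can be generated rather than maintained by
hand. A minimal sketch, assuming the DataNode-to-upgrade-domain mapping is
available as a JSON array of objects with hostName and upgradeDomain fields
(the layout of the JSON dfs.hosts file used with CombinedHostFileManager); the
hosts_outside_domain helper and the host names are hypothetical.

```python
import json
from typing import List


def hosts_outside_domain(hosts_json: str, keep_domain: str) -> List[str]:
    """Return, sorted, the host names of DataNodes not in `keep_domain`.

    Assumes `hosts_json` is a JSON array of objects with "hostName" and
    "upgradeDomain" keys (an assumption of this sketch).
    """
    entries = json.loads(hosts_json)
    return sorted(
        entry["hostName"]
        for entry in entries
        if entry.get("upgradeDomain") != keep_domain
    )


if __name__ == "__main__":
    # Hypothetical sample: one node each for up1/up2, two for up3.
    sample = json.dumps([
        {"hostName": "dn-up1-1", "upgradeDomain": "up1"},
        {"hostName": "dn-up2-1", "upgradeDomain": "up2"},
        {"hostName": "dn-up3-1", "upgradeDomain": "up3"},
        {"hostName": "dn-up3-2", "upgradeDomain": "up3"},
    ])
    # Print one host per line, suitable for a balancer hosts file.
    for host in hosts_outside_domain(sample, "up3"):
        print(host)
```

Redirecting the output to a file (for example exclude.txt) lets it be passed to
the balancer with "hdfs balancer -exclude -f exclude.txt", so that only the up3
nodes take part in the run.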

On further searching, the following Jira appears to be related to this issue,
although it is marked as fixed:

https://issues.apache.org/jira/browse/HDFS-9007



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org