[ https://issues.apache.org/jira/browse/HDFS-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HDFS-871.
-----------------------------------

    Resolution: Fixed

likely stale.

> Balancer can hang in PendingBlockMove
> -------------------------------------
>
>                 Key: HDFS-871
>                 URL: https://issues.apache.org/jira/browse/HDFS-871
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 0.20.1
>         Environment: Yahoo 0.20
>            Reporter: Andrew Ryan
>         Attachments: balancer-jstack.out
>
>
> We started the balancer with default options (-threshold 10). It ran fine
> for a few hours, then hung. The process was still alive, but no balancing
> was taking place.
> At the time of the hang, jstack showed three threads in RUNNABLE status.
> Subsequent jstacks taken minutes and hours later showed the same three
> threads at the same place, so I don't think requests were being restarted;
> these look like real hangs. My best guess is that these requests to the
> namenode have no timeout, and they need one.
> I'll attach the full jstack output, but here's a sample thread. They are all
> stuck in the same place.
> "pool-1-thread-972" prio=10 tid=0x00002aaafc23a800 nid=0x27a8 runnable [0x00002aab0a9a2000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         - locked <0x00002aaaebdbe158> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readShort(DataInputStream.java:295)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.receiveResponse(Balancer.java:371)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.dispatch(Balancer.java:326)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.access$1800(Balancer.java:232)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove$1.run(Balancer.java:393)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
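
For reference, the sample thread in the report is blocked in a socket read (SocketInputStream.socketRead0 under DataInputStream.readShort), and the reporter's guess is that the read has no timeout. The following is only a minimal sketch of what a bounded read looks like with plain java.net sockets. It is not the Balancer's code and not the fix that was applied, if any. The class name ReadTimeoutSketch, the helper readStatus, and the 500 ms value are all hypothetical and chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; names here do not come from Hadoop.
class ReadTimeoutSketch {

    /** Connects to a peer and reads a 2-byte status, waiting at most timeoutMs. */
    static short readStatus(InetSocketAddress peer, int timeoutMs) throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(peer, timeoutMs);  // bound the connect attempt
            sock.setSoTimeout(timeoutMs);   // bound every blocking read on this socket
            DataInputStream in = new DataInputStream(
                    new BufferedInputStream(sock.getInputStream()));
            // Without setSoTimeout, this call can block indefinitely if the
            // peer never answers; with it, SocketTimeoutException is thrown.
            return in.readShort();
        }
    }

    /** Returns true if reading from a peer that never replies times out. */
    static boolean timesOutAgainstSilentPeer(int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A listening socket that never accepts or writes stands in for an
        // unresponsive peer; the OS still completes the TCP handshake.
        try (ServerSocket silent = new ServerSocket(0, 1, loopback)) {
            InetSocketAddress peer =
                    new InetSocketAddress(loopback, silent.getLocalPort());
            try {
                readStatus(peer, timeoutMs);
                return false;               // unexpectedly received data
            } catch (SocketTimeoutException e) {
                return true;                // the read gave up after timeoutMs
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + timesOutAgainstSilentPeer(500));
    }
}
```

With a read timeout set, a worker thread in this situation would surface an exception that the caller can log and retry or abandon, rather than staying in socketRead0 for hours as in the jstack output.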