Yes, I think the shutdown handler should not verify a machine that is included in the dead servers set.
2011-07-28 22:25:09,336 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Waiting for split writer threads to finish
2011-07-28 22:25:09,450 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers finished
2011-07-28 22:25:09,602 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://158.1.101.82:9000/hbase/.META./1028785192/recovered.edits/0000000000000025786 (wrote 121 edits in 2567ms)
2011-07-28 22:25:09,860 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://158.1.101.82:9000/hbase/ufdr5/745550eb514e441f31ff26dbde8402ae/recovered.edits/0000000000000617740 (wrote 211642 edits in 141887ms)
// Log splitting finished and the root table was assigned first. At the same time, the region server came out of GC, so verifying root passed.
2011-07-28 22:25:10,085 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in 316329 ms for hdfs://158.1.101.82:9000/hbase/.logs/158-1-101-82,20020,1311885942386
// The region server's report is rejected, and the region server will shut itself down.
2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386 as dead server
2011-07-28 22:28:37,591 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x23171e103d10018 Creating (or updating) unassigned node for 1028785192 with OFFLINE state
2011-07-28 22:28:37,704 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so generated a random one; hri=.META.,,1.1028785192, src=, dest=158-1-101-202,20020,1311878322145; 2 (online=2, exclude=null) available servers
2011-07-28 22:28:37,704 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region .META.,,1.1028785192 to 158-1-101-202,20020,1311878322145
2011-07-28 22:28:37,733 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=158-1-101-82:20000, region=1028785192/.META.

All the logs are in the attachment.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: August 16, 2011 12:33
To: [email protected]
Subject: Re: Root table couldn't be opened

On Mon, Aug 15, 2011 at 9:23 PM, Gaojinchao <[email protected]> wrote:
> Why did the master replay its logs if it did not exit?
Sorry. Which logs?
> Zk is expired because of gc. But region server isn't shutdown.
> Right, but it probably went down soon after it came out of GC, right?
> (I like how you noticed the log message that says 82 has root and meta)
>
> Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown
> handler to be executed, root=true, meta=true
> It said that 82 has root and meta. "root=true" shows the dead region server
> has root table.
> So, you think there is a bug in our shutdown handler where we are not doing -ROOT- processing properly?
St.Ack
