In my cluster (HBase 0.90.3), the -ROOT- table could not be opened after one
region server crashed because of a long GC pause.

The logs show:

// The master assigned -ROOT- to server 82:
2011-07-28 21:34:34,710 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
-ROOT-,,0.70236052 on 158-1-101-82,20020,1311885942386

// The host of server 82 crashed. The master finished splitting its logs and
// went on to reassign -ROOT- and .META., but the region server process had not
// exited, so the -ROOT- location verification passed and -ROOT- was never reassigned.
I think we shouldn't verify the -ROOT-/.META. locations during ServerShutdownHandler processing.
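
To illustrate the race: the verification (CatalogTracker.verifyRegionLocation in
the trace below) is a live getRegionInfo RPC to the region server, and it does not
consult the master's dead-servers list. Here is a minimal model of the problem,
not actual HBase code (the class, field, and method names are made up for
illustration):

public class RootVerifyRace {
    // The master has already expired the server's ZooKeeper session...
    static boolean onDeadServerList = true;
    // ...but the JVM survived its GC pause and still answers RPCs.
    static boolean processStillAnswersRpc = true;

    // Stands in for CatalogTracker.verifyRegionLocation(): a live RPC
    // (getRegionInfo) to the region server, with no check against the
    // master's dead-server bookkeeping.
    static boolean verifyRootRegionLocation() {
        return processStillAnswersRpc;
    }

    public static void main(String[] args) {
        if (onDeadServerList && verifyRootRegionLocation()) {
            // The bug: verification passes, so the shutdown handler skips
            // reassigning -ROOT- even though the server is being processed
            // as dead, and -ROOT- stays unopenable.
            System.out.println("-ROOT- verified on a dead server; not reassigned");
        }
    }
}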

2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown 
handler to be executed, root=true, meta=true
2011-07-28 22:25:10,085 INFO 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting 
completed in 316329 ms for 
hdfs://158.1.101.82:9000/hbase/.logs/158-1-101-82,20020,1311885942386
2011-07-28 22:26:54,790 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
running balancer because processing dead regionserver(s): 
[158-1-101-82,20020,1311885942386]
2011-07-28 22:27:11,176 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
Failed scan of catalog table
java.net.SocketTimeoutException: Call to 158-1-101-82/158.1.101.82:20020 failed 
on socket timeout exception: java.net.SocketTimeoutException: 60000 millis 
timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428 
remote=158-1-101-82/158.1.101.82:20020]
         at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
         at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
         at $Proxy6.getRegionInfo(Unknown Source)
         at 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
         at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:273)
         at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:333)
         at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
         at 
org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
         at 
org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
         at 
org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:139)
         at 
org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428 
remote=158-1-101-82/158.1.101.82:20020]
         at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
         at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
         at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
         at java.io.FilterInputStream.read(FilterInputStream.java:116)
         at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
         at java.io.DataInputStream.readInt(DataInputStream.java:370)
         at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
         at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386 
as dead server
2011-07-28 22:28:37,591 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:20000-0x23171e103d10018 Creating (or updating) unassigned node for 
1028785192 with OFFLINE state
2011-07-28 22:28:37,704 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
.META.,,1.1028785192 so generated a random one; hri=.META.,,1.1028785192, src=, 
dest=158-1-101-202,20020,1311878322145; 2 (online=2, exclude=null) available 
servers
2011-07-28 22:28:37,704 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region .META.,,1.1028785192 to 158-1-101-202,20020,1311878322145
2011-07-28 22:28:37,733 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=M_ZK_REGION_OFFLINE, server=158-1-101-82:20000, 
region=1028785192/.META.
2011-07-28 22:28:37,766 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=158-1-101-202,20020,1311878322145, region=1028785192/.META.



Region server logs:
2011-07-28 22:19:17,389 DEBUG 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested 
for ufdr7,,1311890810267.c1b556627c511076bafbb1b802589cb6. because 
regionserver20020.cacheFlusher; priority=4, compaction queue size=1

// The region server had been blocked by GC for a long time (552455 ms of
// silence toward ZooKeeper); see the pause-detector sketch after these logs.
2011-07-28 22:28:24,829 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 552455ms for sessionid 
0x13171e103d7003e, closing socket connection and attempting reconnect
2011-07-28 22:28:24,829 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 552455ms for sessionid 
0x23171e103d10020, closing socket connection and attempting reconnect
2011-07-28 22:28:25,186 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server /158.1.101.222:2181
2011-07-28 22:28:25,838 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server /158.1.101.82:2181
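
For what it's worth, a pause like this can be detected from inside the JVM with a
simple watchdog thread. A sketch along these lines (generic code, not from HBase;
the 500 ms interval and 10 s warning threshold are arbitrary choices):

public class PauseDetector implements Runnable {
    private static final long SLEEP_MS = 500;   // expected wake-up interval
    private static final long WARN_MS = 10000;  // arbitrary warning threshold

    public void run() {
        long last = System.currentTimeMillis();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                return;
            }
            long now = System.currentTimeMillis();
            // Time we overslept beyond the requested interval, i.e. the pause.
            long pause = now - last - SLEEP_MS;
            if (pause > WARN_MS) {
                // A pause longer than the ZooKeeper session timeout means the
                // master will expire our session and treat us as dead, exactly
                // as happened above.
                System.err.println("Detected JVM pause of ~" + pause + " ms");
            }
            last = now;
        }
    }

    public static void main(String[] args) {
        new Thread(new PauseDetector(), "pause-detector").start();
    }
}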
