I think this is HBASE-4168, whose fix was committed today. 
We run 0.90.4 patched with HBASE-4168. 

Please try the patch. 

On Aug 10, 2011, at 7:05 PM, Gaojinchao <[email protected]> wrote:

> In my cluster (version 0.90.3), the root table couldn't be opened after one 
> region server crashed because of GC.
> 
> The logs show:
> 
> // Master assigned the root table to 82
> 2011-07-28 21:34:34,710 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> -ROOT-,,0.70236052 on 158-1-101-82,20020,1311885942386
> 
> // The host of 82 crashed; the master finished splitting the log and 
> reassigned root and meta. But the region server didn't exit, so the root 
> verification passed.
> I think we shouldn't verify root / meta during shutdown handler processing.
> 
> 2011-07-28 22:19:53,746 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
> Added=158-1-101-82,20020,1311885942386 to dead servers, submitted shutdown 
> handler to be executed, root=true, meta=true
> 2011-07-28 22:25:10,085 INFO 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting 
> completed in 316329 ms for 
> hdfs://158.1.101.82:9000/hbase/.logs/158-1-101-82,20020,1311885942386
> 2011-07-28 22:26:54,790 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
> running balancer because processing dead regionserver(s): 
> [158-1-101-82,20020,1311885942386]
> 2011-07-28 22:27:11,176 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
> Failed scan of catalog table
> java.net.SocketTimeoutException: Call to 158-1-101-82/158.1.101.82:20020 
> failed on socket timeout exception: java.net.SocketTimeoutException: 60000 
> millis timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428 
> remote=158-1-101-82/158.1.101.82:20020]
>         at 
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>         at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy6.getRegionInfo(Unknown Source)
>         at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
>         at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:273)
>         at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:333)
>         at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:366)
>         at 
> org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
>         at 
> org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
>         at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:139)
>         at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/158.1.101.82:57428 
> remote=158-1-101-82/158.1.101.82:20020]
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
>         at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>         at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2011-07-28 22:28:30,577 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
> Server REPORT rejected; currently processing 158-1-101-82,20020,1311885942386 
> as dead server
> 2011-07-28 22:28:37,591 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:20000-0x23171e103d10018 Creating (or updating) unassigned node for 
> 1028785192 with OFFLINE state
> 2011-07-28 22:28:37,704 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
> was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so 
> generated a random one; hri=.META.,,1.1028785192, src=, 
> dest=158-1-101-202,20020,1311878322145; 2 (online=2, exclude=null) available 
> servers
> 2011-07-28 22:28:37,704 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> .META.,,1.1028785192 to 158-1-101-202,20020,1311878322145
> 2011-07-28 22:28:37,733 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=M_ZK_REGION_OFFLINE, server=158-1-101-82:20000, 
> region=1028785192/.META.
> 2011-07-28 22:28:37,766 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=158-1-101-202,20020,1311878322145, 
> region=1028785192/.META.
> 
> 
> 
> Region server logs:
> 2011-07-28 22:19:17,389 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested 
> for ufdr7,,1311890810267.c1b556627c511076bafbb1b802589cb6. because 
> regionserver20020.cacheFlusher; priority=4, compaction queue size=1
> 
> // blocked for a long time.
> 2011-07-28 22:28:24,829 INFO org.apache.zookeeper.ClientCnxn: Client session 
> timed out, have not heard from server in 552455ms for sessionid 
> 0x13171e103d7003e, closing socket connection and attempting reconnect
> 2011-07-28 22:28:24,829 INFO org.apache.zookeeper.ClientCnxn: Client session 
> timed out, have not heard from server in 552455ms for sessionid 
> 0x23171e103d10020, closing socket connection and attempting reconnect
> 2011-07-28 22:28:25,186 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server /158.1.101.222:2181
> 2011-07-28 22:28:25,838 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server /158.1.101.82:2181
> 
