Hi, Stack.

Thank you very much for your help: we have resolved this issue!

There was a stale *znode* in the ZooKeeper tree stating that the table was
in the 'enabling' state, but it had last been modified several days ago:

[zk: myserver:2181(CONNECTED) 0] ls /hbase/table
[page]
[zk: myserver:2181(CONNECTED) 1] ls /hbase/table/page
[]
[zk: myserver:2181(CONNECTED) 2] get /hbase/table/page
� 11174@myserver*ENABLING*
cZxid = 0x31a4a
ctime = Fri Aug 10 11:23:18 EDT 2012
mZxid = 0x31c4f
mtime = Fri Aug 10 11:24:57 EDT 2012
pZxid = 0x31a4a
cversion = 0
dataVersion = 5
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 40
numChildren = 0
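
A transient state like 'ENABLING' that has not been touched for days is the
tell. A minimal sketch of that staleness check (the epoch timestamps below
correspond to the mtime above and the day I looked; the one-hour threshold
is my own choice, not an HBase constant):

```shell
# Flag a transient table state whose znode mtime is suspiciously old.
# Times are epoch seconds; the one-hour threshold is my own assumption.
is_stale() {  # usage: is_stale <state> <mtime> <now>
  local state=$1 mtime=$2 now=$3 max_age=3600
  case "$state" in
    ENABLING|DISABLING)
      if [ $(( now - mtime )) -gt "$max_age" ]; then
        echo stale
      else
        echo fresh
      fi ;;
    *) echo fresh ;;   # ENABLED/DISABLED are terminal states, never stale
  esac
}

is_stale ENABLING 1344612297 1345174200   # mtime Aug 10, checked Aug 17
```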

I did the following to fix the issue:
1) stopped HBase: master and all region servers;
2) stopped Zookeeper;
3) made backup of Zookeeper data (/var/lib/zookeeper)
4) started Zookeeper;
5) removed znode using Zookeeper CLI:
[zk: hbase01dev.303net.pvt:2181(CONNECTED) 3] delete /hbase/table/page
[zk: hbase01dev.303net.pvt:2181(CONNECTED) 4] ls /hbase/table/page
Node does not exist: /hbase/table/page
6) started HBase: master and all region servers.
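
The steps above can be sketched as a shell script. Heavily hedged: the
init-script names and paths are assumptions from my environment
(package-style installs), not a canonical procedure, and RUN defaults to
echo so the sketch is a dry run until you set RUN to empty:

```shell
# Dry-run sketch of steps 1-6. With RUN=echo (the default) each command
# is only printed; set RUN= (empty) to actually execute. Service names
# and paths are assumptions -- substitute whatever starts and stops
# HBase and ZooKeeper on your hosts.
RUN=${RUN:-echo}

recover_stale_znode() {
  $RUN /etc/init.d/hbase-master stop         # 1) stop the master...
  $RUN /etc/init.d/hbase-regionserver stop   #    ...and every RS
  $RUN /etc/init.d/zookeeper-server stop     # 2) stop ZooKeeper
  $RUN tar czf /root/zk-backup.tar.gz /var/lib/zookeeper   # 3) backup
  $RUN /etc/init.d/zookeeper-server start    # 4) start ZooKeeper
  $RUN hbase zkcli delete /hbase/table/page  # 5) drop the stale znode
  $RUN /etc/init.d/hbase-master start        # 6) start HBase again
  $RUN /etc/init.d/hbase-regionserver start
}

recover_stale_znode
```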

After this, everything was fine: the table showed as 'enabled', and
'disable' worked as well:

hbase(main):001:0> is_enabled 'page'
true

0 row(s) in 0.7120 seconds

hbase(main):002:0> disable 'page'
0 row(s) in 3.1160 seconds

When it was in the hung state, the table was actually still served by the
RSes: I could count rows, do scans, run MR jobs using the HBaseStorage Pig
class, etc. What was blocked was updates to the table schema: 'alter' did
not work because the table was not in the disabled state, but 'disable' did
not work because the table was not in the enabled state.
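
That behavior is consistent with the master gating schema operations on the
table state. A toy paraphrase of the gating I observed (my own summary, not
actual HBase code):

```shell
# Can this admin op run in this table state? A paraphrase of the
# observed behavior, not HBase source: 'alter' needs DISABLED,
# 'disable' needs ENABLED, so a table stuck in ENABLING can do
# neither -- while reads and scans keep working regardless.
schema_op_allowed() {  # usage: schema_op_allowed <op> <state>
  case "$1:$2" in
    alter:DISABLED|disable:ENABLED|enable:DISABLED) echo yes ;;
    *) echo no ;;
  esac
}

schema_op_allowed alter ENABLING    # no
schema_op_allowed disable ENABLING  # no
```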

All regions of the table were hosted by RSes. Here is an excerpt from the
underlying HDFS structure:

-rw-r--r--   3 hbase hbase       1307 2012-08-10 11:24
/hbase/page/.tableinfo.0000000004
drwxr-xr-x   - hbase hbase          0 2012-08-10 11:24 /hbase/page/.tmp
drwxr-xr-x   - hbase hbase          0 2012-08-16 18:55
/hbase/page/01084884c5d8b61a5a1e529822563cae
-rw-r--r--   3 hbase hbase        523 2012-08-13 16:39
/hbase/page/01084884c5d8b61a5a1e529822563cae/.regioninfo
drwxr-xr-x   - hbase hbase          0 2012-08-16 19:57
/hbase/page/01084884c5d8b61a5a1e529822563cae/.tmp
drwxr-xr-x   - hbase hbase          0 2012-08-17 03:28
/hbase/page/01084884c5d8b61a5a1e529822563cae/s
-rw-rw-rw-   3 jenkins supergroup     742993 2012-08-17 03:08
/hbase/page/01084884c5d8b61a5a1e529822563cae/s/11adf78853944d02a3e39c1eb0b631a3
-rw-rw-rw-   3 jenkins supergroup     916762 2012-08-17 00:22
/hbase/page/01084884c5d8b61a5a1e529822563cae/s/a0a9c21d470549f9ab6c29d73d26ce8d
-rw-r--r--   3 hbase   hbase         4713301 2012-08-16 18:55
/hbase/page/01084884c5d8b61a5a1e529822563cae/s/cf447b6576ad4cfe898dfee8e77c0e2c
drwxr-xr-x   - hbase   hbase               0 2012-08-17 03:28
/hbase/page/01084884c5d8b61a5a1e529822563cae/t
-rw-rw-rw-   3 jenkins supergroup   27844042 2012-08-17 00:22
/hbase/page/01084884c5d8b61a5a1e529822563cae/t/48a5c5cb10204854a7b76017145dfda7
-rw-r--r--   3 hbase   hbase       697429695 2012-08-16 19:57
/hbase/page/01084884c5d8b61a5a1e529822563cae/t/58b9027f020548e880f0d8c3c636ce18
-rw-rw-rw-   3 jenkins supergroup   15529996 2012-08-17 03:08
/hbase/page/01084884c5d8b61a5a1e529822563cae/t/bdc10a1f6285412caa60d23d745c1180
...

The question I have now is whether I really had to stop the whole HBase
cluster. Is it safe to remove a stale *znode* while HBase is operating, if
I am sure no compaction / splitting is going on?

--
Sincerely yours
Pavel Vozdvizhenskiy
Grid Dynamics / BigData



On Fri, Aug 17, 2012 at 3:47 AM, Stack <[email protected]> wrote:

> On Thu, Aug 16, 2012 at 3:48 PM, Pavel Vozdvizhenskiy
> <[email protected]> wrote:
> > I would appreciate on any help how to fix it.
> >
>
> I've not come across this one before.
>
> If you list whats under /hbase/table?  Does the table show there?  You
> could try removing the znode?  You can look by doing ./bin/hbase zkcli
>
> St.Ack
>
