Serco Business

Mike,

Thanks for the suggestion. We are running Zookeeper 3.4.5 (probably 
unsurprisingly).

That does sound like the issue we are seeing. I might have misinterpreted what 
I read about zookeeper because I was under the impression that the session ids 
were a function of the number of leader election cycles as opposed to absolute 
time.

Is upgrading zookeeper essentially the only way to work through this issue?

Thanks,
Jonathan

From: Michael Wall <mjw...@apache.org>
Sent: Friday, April 22, 2022 4:19 PM
To: Accumulo User List <user@accumulo.apache.org>
Subject: [EXTERNAL] Re: Tablet Server Session Id Out of Range

Hi Jonathan,

What version of zookeeper are you running?  Sounds like you may be hitting 
https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-1622.
If so, try shutting down Accumulo, then zookeeper.  Then upgrade zookeeper 
and restart.

Mike


On Fri, Apr 22, 2022, 15:30 Wonders, Jonathan (Serco NA) 
<jonathan.wond...@serco-na.com> wrote:

Greetings,

The team I work with is encountering an issue when starting an Accumulo 1.7.x 
cluster and when running troubleshooting commands such as bin/accumulo admin 
checkTablets. The primary symptom is a NumberFormatException thrown within 
ZookeeperLockChecker that occurs when parsing the tablet server session id 
(Long.parseLong) for an input string "ff804d767efe0004" (which is out of range 
when interpreted as a signed long).

From what I can gather, our zookeeper cluster has been running for such a long 
time that the epoch component of the session id has grown to the point where 
interpreting the session id as a signed long would be a negative value. Within 
the ZooKeeper code, the session id is treated as an unsigned long (e.g., 
Long.toHexString) which leads me to think that the Accumulo code is not 
parsing the value correctly. This discrepancy is present in all versions since 
the introduction of the ZookeeperLockChecker class.
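
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the Accumulo code itself: it assumes the session id is parsed as hexadecimal (radix 16), and the class and method names (SessionIdParsing, parseSessionId) are hypothetical, introduced only for illustration. It uses Long.parseUnsignedLong, which requires Java 8 or later.

```java
class SessionIdParsing {

    // Hypothetical helper: reads the hex string as an unsigned 64-bit value,
    // the inverse of how Long.toHexString renders a long.
    static long parseSessionId(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    public static void main(String[] args) {
        String sessionId = "ff804d767efe0004"; // value from the report above

        // A signed parse rejects any hex value above 7fffffffffffffff.
        try {
            Long.parseLong(sessionId, 16);
            System.out.println("signed parse succeeded");
        } catch (NumberFormatException e) {
            System.out.println("signed parse failed: " + e.getMessage());
        }

        // The unsigned parse accepts the same string; the resulting long is
        // negative but carries the same 64 bits.
        long parsed = parseSessionId(sessionId);
        System.out.println(Long.toHexString(parsed));
    }
}
```

The unsigned parse round-trips through Long.toHexString back to the original string, which is why a value written with toHexString may not be readable with a signed parse.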

There does not appear to be an easy way to work around this problem. Currently, 
our best idea of how to recover the data from this cluster is to set up a 
separate zookeeper cluster, migrate the data we have in zookeeper to the new 
cluster, and then swap over configuration to point to the new zookeeper 
cluster. I would appreciate any ideas or suggestions from the community.

Thanks,
Jonathan



