Hi David and Ilan

I don't have a JIRA account. Could one of you please create the tickets?
I've drafted them below; please make any changes you see fit.

=================================
ZooKeeper

Summary:
SessionTrackerImpl generates negative session IDs when the server ID is greater than 127

Description:
This issue was discovered during a [discussion|https://lists.apache.org/thread/0pwxw1rzdffmbxctdzv2rmplzgwt6lpl] about
negative Solr Overseer IDs.

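Setting the server ID to a value greater than 127 in the myid file causes
SessionTrackerImpl.initializeNextSessionId to generate a negative session ID.
The function shifts the server ID left by 56 bits, so any ID with bit 7 set
(128 to 255) ends up setting the sign bit of the resulting long.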


{code:java}
    public static long initializeNextSessionId(long id) {
        long nextSid;
        nextSid = (Time.currentElapsedTime() << 24) >>> 8;
        nextSid = nextSid | (id << 56);  // <-- sign bit is set here when id > 127
        if (nextSid == EphemeralType.CONTAINER_EPHEMERAL_OWNER) {
            ++nextSid;  // this is an unlikely edge case, but check it just in case
        }
        return nextSid;
    }
{code}
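
To make the arithmetic concrete, here is a minimal standalone sketch of the same shifts. It is not ZooKeeper code: the class and method names are made up for illustration, the clock value is passed in as a parameter instead of coming from Time.currentElapsedTime(), and the CONTAINER_EPHEMERAL_OWNER check is left out.

```java
// Standalone sketch, not ZooKeeper code: class and method names are made up for illustration.
class SessionIdSignDemo {

    // Same bit arithmetic as initializeNextSessionId, with the clock value passed in
    // and the CONTAINER_EPHEMERAL_OWNER check left out.
    static long initialSessionId(long serverId, long elapsedMillis) {
        long nextSid = (elapsedMillis << 24) >>> 8; // top 8 bits are always zero after >>> 8
        return nextSid | (serverId << 56);          // low 8 bits of serverId become the top byte
    }

    public static void main(String[] args) {
        long elapsed = 123_456_789L; // arbitrary stand-in for Time.currentElapsedTime()
        for (long id : new long[] {1, 127, 128, 255}) {
            long sid = initialSessionId(id, elapsed);
            System.out.println("server id " + id + " -> session id " + sid
                    + (sid < 0 ? " (negative)" : " (non-negative)"));
        }
    }
}
```

With these inputs, server IDs 1 and 127 give non-negative session IDs, while 128 and 255 give negative ones, because only the latter two have bit 7 set.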

=================================

Solr

Summary:
LeaderElector not able to parse node ID correctly when it has a leading dash

Description:
This issue was [reported|https://lists.apache.org/thread/0pwxw1rzdffmbxctdzv2rmplzgwt6lpl] on
users@solr.apache.org.

Sometimes the node ID contains a leading dash, for example
{noformat}
-5188057493699159958-1.1.1.15:8983_solr-n_0000192189
{noformat}
instead of just
{noformat}
5188057493699159958-1.1.1.15:8983_solr-n_0000192189
{noformat}
In that case, LeaderElector.getNodeName returns
*5188057493699159958-1.1.1.15:8983_solr* instead of *1.1.1.15:8983_solr*.

The problem is that the regex LeaderElector.NODE_NAME was not designed to
handle the leading dash. LeaderElector.LEADER_SEQ and
LeaderElector.SESSION_ID seem to have the same problem.
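
As a rough illustration of the parsing problem, here is a minimal standalone sketch. It is not Solr code: both patterns are my own stand-ins rather than copies of the LeaderElector regexes, and the sign-aware one assumes the session ID is always printed as a decimal long. It is only one possible direction for a fix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch, not Solr code: both patterns are illustrative assumptions.
class NodeNameParseDemo {

    // Stand-in for a pattern that assumes the first '-' ends the session ID.
    static final Pattern LENIENT = Pattern.compile("(.*?-)(.*?)-n_\\d+");

    // Hypothetical alternative: the session ID is an optionally signed decimal number.
    static final Pattern SIGN_AWARE = Pattern.compile("(-?\\d+-)(.*?)-n_\\d+");

    // Returns the second capture group, i.e. the part between the session ID and "-n_<seq>".
    static String nodeName(Pattern pattern, String electionNode) {
        Matcher m = pattern.matcher(electionNode);
        if (!m.matches()) {
            throw new IllegalArgumentException("Could not parse: " + electionNode);
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        String negative = "-5188057493699159958-1.1.1.15:8983_solr-n_0000192189";
        System.out.println(nodeName(LENIENT, negative));    // session ID digits leak into the name
        System.out.println(nodeName(SIGN_AWARE, negative)); // only the host:port_context part
    }
}
```

The lenient pattern treats the leading dash as the end of an empty session ID, which reproduces the behaviour described above. Any real change would need unit tests around the three LeaderElector patterns.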

Thanks,
Patrick



On Sat, Dec 21, 2024 at 12:02 PM David Smiley <dsmi...@apache.org> wrote:

> Patrick (or Ilan), can you please file a JIRA issue to describe the
> problem.
> Ideally also mention the work-around and possible solution ideas.
>
> On Wed, Dec 18, 2024 at 12:27 PM Patrick Lok
> <patrick....@salesforce.com.invalid> wrote:
>
> > Hi Ilan, thank you so much for the pointers! That's exactly the problem.
> We
> > updated our system to wrap the ZK server ID at 127, instead of 256, and
> > that fixed the problem.
> >
> > Again, thank you so much!
> >
> > Regards,
> > Patrick
> >
> >
> > On Thu, Dec 5, 2024 at 4:47 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
> >
> > > That value seems to be the ZooKeeper session id.
> > > > I've found https://issues.apache.org/jira/browse/ZOOKEEPER-1622 that
> > > > might be related (but I guess you would have seen the error a while
> > > ago so it's likely not that).
> > >
> > > Also, looking at the session ID generation code (see code in the jira
> > > above, SessionTrackerImpl.initializeNextSessionId()), if the server id
> > > is bigger than 127 the resulting session id will be negative (if my
> > > bit shift analysis skills are still ok).
> > > Anything that might have changed there?
> > >
> > > Doesn't seem to be something that can be reset, it is decided by this
> > > method. Solr code should be fixed to do a better job of parsing that
> > > string.
> > >
> > > Ilan
> > >
> > >
> > > On Tue, Dec 3, 2024 at 6:56 PM Patrick Lok
> > > <patrick....@salesforce.com.invalid> wrote:
> > > >
> > > > That's what I think is happening too. The problem is the code is not
> > > > expecting it to happen and not handling it correctly. I'm wondering
> if
> > > > there's a way to reset it.
> > > >
> > > > On Tue, Dec 3, 2024 at 3:28 AM Ilan Ginzburg <ilans...@gmail.com>
> > wrote:
> > > >
> > > > > Didn’t look at the code but from the number of digits wouldn’t it
> be
> > a
> > > long
> > > > > wrapping around into negative territory?
> > > > >
> > > > > On Tue 3 Dec 2024 at 02:55, Patrick Lok <
> patrick....@salesforce.com
> > > > > .invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We are seeing some weird issues with the Overseer ID which causes
> > > some
> > > > > > overseer election problems in our cluster.
> > > > > >
> > > > > > Recently we have noticed that one of our Solr 8 clusters is
> having
> > > > > trouble
> > > > > > electing dedicated overseer hosts as leader. After some
> > > investigation, we
> > > > > > noticed that we are having "negative" Overseer ID (Overseer ID
> with
> > > > > leading
> > > > > > > dash):
> > > > > >
> > > > > > [zk: localhost:2181(CONNECTED) 0] ls /overseer_elect/election
> > > > > > [-5188057493699159958-1.1.1.15:8983_solr-n_0000192189,
> > > > > > -5260098076001480373-
> > > > > > 1.1.1.19:8983_solr-n_0000192192,
> > > > > > -5548288611309897871-1.1.1.28:8983_solr-n_0000192191,
> > > > > > -6124715353171356222-1.1.1.18:8983_solr-n_0000192188,
> > > > > -6412935227404643144-
> > > > > > 1.1.1.22:8983_solr-n_0000192186,
> > > > > > -6412935227404648050-1.1.1.89:8983_solr-n_0000192181,
> > > > > > -6557083032988176767-1.1.1.105:8983_solr-n_0000192190,
> > > > > > -6701159159471144532-
> > > > > > 1.1.1.219:8983_solr-n_0000192183]
> > > > > >
> > > > > >
> > > > > > (the actual IP addresses are different from what pasted above)
> > > > > >
> > > > > > > Because of the leading dash in the Overseer ID,
> > > > > > > LeaderElector.getNodeName() returns
> > > > > > > "5188057493699159958-1.1.1.15:8983_solr" instead of
> > > > > > > "1.1.1.15:8983_solr", causing quite a few issues.
> > > > > >
> > > > > > Does anyone know why we started seeing a leading dash with the
> > > initial
> > > > > set
> > > > > > of digits in the Overseer ID? Who's generating that set of
> digits?
> > > Solr
> > > > > or
> > > > > > ZooKeeper? Is there a way to fix it?
> > > > > >
> > > > > > A simple change to LeaderElector.NODE_NAME seems to be an easy
> fix.
> > > But
> > > > > > since there's no unit test around it, I'm a bit worried that it
> > might
> > > > > break
> > > > > > somewhere else in the code.
> > > > > >
> > > > > > Thanks,
> > > > > > Patrick
> > > > > >
> > > > >
> > >
> >
>
