Never mind, I created a JIRA account and filed the tickets.

https://issues.apache.org/jira/browse/ZOOKEEPER-4904
https://issues.apache.org/jira/browse/SOLR-17694

On Fri, Mar 7, 2025 at 2:02 PM Patrick Lok <patrick....@salesforce.com>
wrote:

> Hi David and Ilan
>
> I don't have a JIRA account. Could one of you please create the tickets?
> I've written the tickets below; please make any changes as you see fit.
>
> =================================
> ZooKeeper
>
> Summary:
> SessionTrackerImpl generates negative session IDs when server ID is larger
> than 127
>
> Description:
> This issue was discovered during a [discussion|
> https://lists.apache.org/thread/0pwxw1rzdffmbxctdzv2rmplzgwt6lpl] about
> negative Solr Overseer IDs
>
> Setting the server ID to a value greater than 127 in the myid file causes
> the SessionTrackerImpl.initializeNextSessionId function to generate a
> negative session ID.
>
>
> {code:java}
>     public static long initializeNextSessionId(long id) {
>         long nextSid;
>         nextSid = (Time.currentElapsedTime() << 24) >>> 8;
>         nextSid = nextSid | (id << 56);  // <-- sets the sign bit for ids 128..255
>         if (nextSid == EphemeralType.CONTAINER_EPHEMERAL_OWNER) {
>             ++nextSid;  // this is an unlikely edge case, but check it just in case
>         }
>         return nextSid;
>     }
> {code}
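
To make the arithmetic above concrete, here is a minimal standalone sketch of the same bit operations. It is not the ZooKeeper class: `SessionIdSignDemo` and `nextSessionId` are hypothetical names, and the elapsed time is passed in as a parameter (an assumption made only so the result is deterministic).

```java
final class SessionIdSignDemo {
    // Same shifts as the quoted method, but with the timestamp supplied by
    // the caller instead of Time.currentElapsedTime(). Illustration only.
    public static long nextSessionId(long serverId, long elapsedMillis) {
        // Shift left 24, then unsigned right 8: the top 8 bits end up zero.
        long nextSid = (elapsedMillis << 24) >>> 8;
        // The server ID is placed in bits 56..63, so bit 63 (the sign bit)
        // is set whenever the ID is in the range 128..255.
        return nextSid | (serverId << 56);
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L; // arbitrary fixed value for the demo
        for (long id : new long[] {1, 127, 128, 255}) {
            long sid = nextSessionId(id, now);
            System.out.println(id + " -> " + sid + (sid < 0 ? " (negative)" : ""));
        }
    }
}
```

With these inputs, IDs 1 and 127 yield positive session IDs while 128 and 255 are flagged as negative, which is consistent with the workaround mentioned further down the thread of keeping the server ID at or below 127.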
>
> =================================
>
> Solr
>
> Summary:
> LeaderElector not able to parse node ID correctly when it has a leading
> dash
>
> Description:
> This issue was [reported|
> https://lists.apache.org/thread/0pwxw1rzdffmbxctdzv2rmplzgwt6lpl] on
> users@solr.apache.org.
>
> There may be times when the node ID contains a leading dash
> {noformat}
> -5188057493699159958-1.1.1.15:8983_solr-n_0000192189
> {noformat}
> instead of just
> {noformat}
> 5188057493699159958-1.1.1.15:8983_solr-n_0000192189
> {noformat}
> In such a case, LeaderElector.getNodeName returns
> *5188057493699159958-1.1.1.15:8983_solr* instead of just
> {*}1.1.1.15:8983_solr{*}.
>
> The problem is that the regex LeaderElector.NODE_NAME was not designed to
> handle the leading dash. LeaderElector.LEADER_SEQ and
> LeaderElector.SESSION_ID seem to have the same problem.
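
As a rough illustration of the kind of change that could tolerate the leading dash, here is a self-contained sketch. The pattern below is hypothetical and is not the regex currently in LeaderElector; `ElectionNodeParser` and its methods are made-up names, and the sketch assumes the election node has the bare `<sessionId>-<nodeName>-n_<sequence>` shape shown in the examples above, with no path prefix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ElectionNodeParser {
    // Hypothetical pattern (an assumption, not Solr's own): an optionally
    // negative session ID, the node name, then the "-n_<sequence>" suffix.
    private static final Pattern ELECTION_NODE =
            Pattern.compile("^(-?\\d+)-(.+)-n_(\\d+)$");

    private static Matcher match(String electionNode) {
        Matcher m = ELECTION_NODE.matcher(electionNode);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected election node: " + electionNode);
        }
        return m;
    }

    // Session ID, including the sign when the ID is negative.
    public static long sessionId(String electionNode) {
        return Long.parseLong(match(electionNode).group(1));
    }

    // Node name without the session ID prefix or the sequence suffix.
    public static String nodeName(String electionNode) {
        return match(electionNode).group(2);
    }

    // Sequence number that follows "-n_".
    public static int sequence(String electionNode) {
        return Integer.parseInt(match(electionNode).group(3));
    }

    public static void main(String[] args) {
        String node = "-5188057493699159958-1.1.1.15:8983_solr-n_0000192189";
        System.out.println(sessionId(node) + " | " + nodeName(node) + " | " + sequence(node));
    }
}
```

Treating the sign as part of the session ID group means both the positive and negative forms yield the same node name. Whether a single anchored pattern like this fits every place the existing patterns are used would need checking against the Solr code and a unit test.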
>
> Thanks,
> Patrick
>
>
>
> On Sat, Dec 21, 2024 at 12:02 PM David Smiley <dsmi...@apache.org> wrote:
>
>> Patrick (or Ilan), can you please file a JIRA issue to describe the problem?
>> Ideally also mention the work-around and possible solution ideas.
>>
>> On Wed, Dec 18, 2024 at 12:27 PM Patrick Lok
>> <patrick....@salesforce.com.invalid> wrote:
>>
>> > Hi Ilan, thank you so much for the pointers! That's exactly the problem.
>> > We updated our system to wrap the ZK server ID at 127, instead of 256, and
>> > that fixed the problem.
>> >
>> > Again, thank you so much!
>> >
>> > Regards,
>> > Patrick
>> >
>> >
>> > On Thu, Dec 5, 2024 at 4:47 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>> >
>> > > That value seems to be the ZooKeeper session id.
>> > > I've found https://issues.apache.org/jira/browse/ZOOKEEPER-1622 that
>> > > might be related (but I guess you would have seen the error a while
>> > > ago so it's likely not that).
>> > >
>> > > Also, looking at the session ID generation code (see code in the jira
>> > > above, SessionTrackerImpl.initializeNextSessionId()), if the server id
>> > > is bigger than 127 the resulting session id will be negative (if my
>> > > bit shift analysis skills are still ok).
>> > > Anything that might have changed there?
>> > >
>> > > Doesn't seem to be something that can be reset, it is decided by this
>> > > method. Solr code should be fixed to do a better job of parsing that
>> > > string.
>> > >
>> > > Ilan
>> > >
>> > >
>> > > On Tue, Dec 3, 2024 at 6:56 PM Patrick Lok
>> > > <patrick....@salesforce.com.invalid> wrote:
>> > > >
>> > > > That's what I think is happening too. The problem is the code is not
>> > > > expecting it to happen and not handling it correctly. I'm wondering if
>> > > > there's a way to reset it.
>> > > >
>> > > > On Tue, Dec 3, 2024 at 3:28 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>> > > >
>> > > > > Didn’t look at the code but from the number of digits wouldn’t it be a
>> > > > > long wrapping around into negative territory?
>> > > > >
>> > > > > On Tue 3 Dec 2024 at 02:55, Patrick Lok <patrick....@salesforce.com.invalid>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > We are seeing some weird issues with the Overseer ID, which are causing
>> > > > > > overseer election problems in our cluster.
>> > > > > >
>> > > > > > Recently we have noticed that one of our Solr 8 clusters is having trouble
>> > > > > > electing dedicated overseer hosts as leader. After some investigation, we
>> > > > > > noticed that we have "negative" Overseer IDs (Overseer IDs with a leading
>> > > > > > dash):
>> > > > > >
>> > > > > > [zk: localhost:2181(CONNECTED) 0] ls /overseer_elect/election
>> > > > > > [-5188057493699159958-1.1.1.15:8983_solr-n_0000192189,
>> > > > > > -5260098076001480373-1.1.1.19:8983_solr-n_0000192192,
>> > > > > > -5548288611309897871-1.1.1.28:8983_solr-n_0000192191,
>> > > > > > -6124715353171356222-1.1.1.18:8983_solr-n_0000192188,
>> > > > > > -6412935227404643144-1.1.1.22:8983_solr-n_0000192186,
>> > > > > > -6412935227404648050-1.1.1.89:8983_solr-n_0000192181,
>> > > > > > -6557083032988176767-1.1.1.105:8983_solr-n_0000192190,
>> > > > > > -6701159159471144532-1.1.1.219:8983_solr-n_0000192183]
>> > > > > >
>> > > > > >
>> > > > > > (the actual IP addresses are different from what is pasted above)
>> > > > > >
>> > > > > > Because of the leading dash in the Overseer ID,
>> > > > > > LeaderElector.getNodeName() returns "5188057493699159958-1.1.1.15:8983_solr"
>> > > > > > instead of "1.1.1.15:8983_solr", causing quite a few issues.
>> > > > > >
>> > > > > > Does anyone know why we started seeing a leading dash with the initial set
>> > > > > > of digits in the Overseer ID? Who's generating that set of digits? Solr or
>> > > > > > ZooKeeper? Is there a way to fix it?
>> > > > > >
>> > > > > > A simple change to LeaderElector.NODE_NAME seems to be an easy fix. But
>> > > > > > since there's no unit test around it, I'm a bit worried that it might
>> > > > > > break somewhere else in the code.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Patrick
>> > > > > >
>> > > > >
>> > >
>> >
>>
>
