> Wouldn't it make more sense to set this value to $zkClientTimeout x
> $numServers?

I don't think it would make much sense to calculate it that way.

The time it takes to connect to ZooKeeper does not scale linearly with the
number of servers (at least not for reasonable numbers).

But I do agree that it should be configurable. Interestingly enough,
there was a ticket on this in Jira (
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-7561) from
*6* years ago.

In there they wrote:
> Note that unlike other configuration stored in ZK, this probably needs to
> be bootstrapped via a System property.

I'm not completely sure why they wrote that; if anyone knows, please let me
know :). Apart from this detail, it should be pretty straightforward to
implement.
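
To make the system-property idea concrete, here is a rough sketch. It is not
the actual ZkContainer code: the class name and the property name
"zkClientConnectTimeout" are assumptions I picked for illustration, and the
30000 ms default just mirrors the hard-coded value discussed in this thread.

```java
// Sketch only: reads a connect timeout from a JVM system property,
// falling back to a default when the property is missing or unusable.
class ZkConnectTimeoutSketch {

    // Mirrors the hard-coded 30 seconds mentioned in the thread.
    static final int DEFAULT_CONNECT_TIMEOUT_MS = 30000;

    // Hypothetical property name, not confirmed to exist in Solr.
    static final String CONNECT_TIMEOUT_PROP = "zkClientConnectTimeout";

    public static int resolveConnectTimeoutMs() {
        // Integer.getInteger returns the default if the property is
        // unset or cannot be parsed as an integer.
        Integer value = Integer.getInteger(CONNECT_TIMEOUT_PROP, DEFAULT_CONNECT_TIMEOUT_MS);
        // Reject zero and negative values instead of passing them on.
        return value > 0 ? value : DEFAULT_CONNECT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println("connect timeout (ms): " + resolveConnectTimeoutMs());
    }
}
```

With something like this, the timeout could be raised at startup by passing
-DzkClientConnectTimeout=60000 to the JVM, with no change to anything stored
in ZK.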

On Mon, Jul 12, 2021, 5:30 PM Bram Van Dam <bram.van...@intix.eu> wrote:

> Howdy,
>
> Not sure whether to send this to dev@ or user@, so I'll try user@ first.
>
> We've had a couple of instances of Solr not starting because a ZK
> connection couldn't be made in time. "Could not connect to ZooKeeper
> within 30000ms".
>
> While debugging this, I noticed that there are two timeouts.
> zkClientTimeout and zkClientConnectTimeout.
>
> zkClientTimeout is passed to ZK and is used by ZK itself. This is fine
> and is configurable.
>
> zkClientConnectTimeout is used by Solr when creating a ZK connection: if
> no connection can be made within zkClientConnectTimeout, Solr considers
> ZK to be dead.
>
> Where things get fishy is that zkClientConnectTimeout is hard coded in
> ZkContainer.java. It is set to 30 seconds, *unless* you're running
> *embedded* ZK with multiple ZKs -- then it is set to 24 hours.
>
> This basically means that if you're using an external ensemble, you're
> screwed if the first couple of connection attempts fail.
>
> Wouldn't it make more sense to set this value to $zkClientTimeout x
> $numServers? Or to make it configurable outright?
>
> Thanks,
>
>  - Bram
>
