[
https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938539#comment-13938539
]
Mark Miller commented on SOLR-5872:
-----------------------------------
bq. With the overseer queues, each state update is 4+ zookeeper writes
Given the numbers I've seen published for ZK performance, it seems like that
should not be a big deal in typical cases?
bq. Empirically, we have definitely seen the workqueue back up with lots of
items during a node bounce
I'm not surprised - most of this code has not been optimized or investigated
thoroughly. The original author of a lot of the Overseer code has moved on and
it likely has not seen as much attention as would be nice over the past year.
Until someone looks into the current issues closely though, it seems hard to
recommend rewriting this whole very important piece.
bq. If batching really is so important, there's no batching for external
collection state updates.
I'm not really fully up on "external collections" but AFAIK it's part of some
other work to support tons of collections that I'm not fully sold on yet either
:)
bq. In a "normal" rolling bounce where instances are restarted one-by-one, in
the same order each time, the Overseer is killed at each instance restart, thus
hindering the recovery process by gating state transition.
This points out another issue that we might be able to address.
Without having looked closely at the issues brought up (and I don't see
evidence that anyone else has either), it's hard to conclude yet that the
whole thing has to be replaced.
A couple of issues with the old implementation:
* With every node updating the whole cluster state on state change, the
clusterstate.json file is read far too much. The workaround you guys are
proposing for that appears to be only having clients update the clusterstate
when they run into an error - but I'm not sold that that is the best
architecture for the future either. That's a complicated change to make, with
many ramifications for future development.
* Some things that are in the clusterstate now and that could be in the future
are not so easily handled with the non overseer strategy - like marking who is
the leader. You have to have the Overseer running its own special thread to
inject and remove information.
* As things are, on something like cluster startup, there will be tons of reads
and writes of the clusterstate.json - a flood of attempts and retries to update
it in ZooKeeper.
For more background on the change, search the mailing list archives; the
earlier discussion should be there.
There is a strong argument to be made that we should first investigate the
performance issues with the current strategy. ZooKeeper is pretty fast - these
state updates are tiny and batched. It seems like we should be able to do a lot
better without throwing out code that has been getting hardened for a long time
now.
> Eliminate overseer queue
> -------------------------
>
> Key: SOLR-5872
> URL: https://issues.apache.org/jira/browse/SOLR-5872
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system. The
> raison d'être of the queue is
> * Provide batching of operations for the main clusterstate.json so that
> state updates are minimized
> * Avoid race conditions and ensure order
> Now, as we move the individual collection states out of the main
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare and set in ZooKeeper.
> The proposed solution is that, whenever an operation needs to be performed
> on the clusterstate, the same thread (and of course the same JVM) will
> # read the fresh state and version of zk node
> # construct the new state
> # perform a compare and set
> # if compare and set fails go to step 1
> This should be limited to operations performed on external collections,
> because batching would still be required for the others.
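
For readers who want to see the four steps in the description concretely, here
is a minimal sketch of the read / construct / compare-and-set / retry loop. It
is not code from Solr or from any patch on this issue. CasUpdateSketch,
VersionedNode and Snapshot are hypothetical names, and VersionedNode is an
in-memory stand-in for a single znode so the example runs without a ZooKeeper
server. With a real ZooKeeper client, the same shape would presumably use
getData(path, watch, stat) to obtain the data and version, then
setData(path, data, expectedVersion), which fails with
KeeperException.BadVersionException when another writer got there first.

```java
import java.util.function.UnaryOperator;

/** In-memory sketch of the read / modify / compare-and-set retry loop. */
public class CasUpdateSketch {

    /** Immutable snapshot of a node: its data plus the version it was read at. */
    public static final class Snapshot {
        public final String data;
        public final int version;

        Snapshot(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Hypothetical stand-in for one znode holding a collection's state. */
    public static final class VersionedNode {
        private String data;
        private int version = 0;

        public VersionedNode(String initial) {
            this.data = initial;
        }

        public synchronized Snapshot read() {
            return new Snapshot(data, version);
        }

        /** Writes only if expectedVersion is current; false means a conflict. */
        public synchronized boolean compareAndSet(int expectedVersion, String newData) {
            if (expectedVersion != version) {
                return false;
            }
            data = newData;
            version++;
            return true;
        }
    }

    /** Repeats steps 1-4 until the write lands; returns the number of attempts. */
    public static int update(VersionedNode node, UnaryOperator<String> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            Snapshot current = node.read();                  // 1. read fresh state and version
            String next = change.apply(current.data);        // 2. construct the new state
            if (node.compareAndSet(current.version, next)) { // 3. compare and set
                return attempts;
            }
            // 4. another writer won the race: go back to step 1
        }
    }
}
```

The loop is unbounded, as in the description. Under heavy contention, such as
many nodes changing state at once, every failed attempt costs another read and
another write, so a real implementation would probably want a retry limit or
some backoff.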