[
https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938670#comment-13938670
]
Jessica Cheng edited comment on SOLR-5872 at 3/18/14 1:44 AM:
--------------------------------------------------------------
{quote}For further discussion around the change, there should be background if
you search the archives.{quote}
If you wouldn't mind terribly, would you please paste links to a few relevant
threads in the archive? (Sorry, I'm not familiar with all the keywords and
archives, etc., yet.)
{quote}There is a strong argument to be made that we should first investigate
the performance issues with the current strategy. ZooKeeper is pretty fast -
these state updates are tiny and batched. It seems like we should be able to do
a lot better without throwing out code that has been getting hardened for a
long time now.{quote}
I see where your hesitation is now, and I can definitely agree. Sounds like
there are a few points to be investigated for the current system before we
attempt to change anything:
- Why is the Overseer so slow at updating cluster state? What's causing the
build-up of queue messages during a restart?
- What can we do to generally solve the problem of the Overseer being killed on
every instance restart in a rolling bounce?
- How much is actually batched?
My gut is that for external collections, batching won't be of that much benefit
(except for that super-large collection case that Yoink mentioned), but I agree
that if the current system can be hardened to work even for those, then the
simplicity of one code path should be preferred over ultra-optimizing for a
non-issue (assuming the first two points above can be "fixed").
> Eliminate overseer queue
> -------------------------
>
> Key: SOLR-5872
> URL: https://issues.apache.org/jira/browse/SOLR-5872
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system. The
> raison d'être of the queue is
> * Provide batching of operations for the main clusterstate.json so that
> state updates are minimized
> * Avoid race conditions and ensure order
> Now, as we move the individual collection states out of the main
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare and set in ZooKeeper.
> The proposed solution is that, whenever an operation needs to be performed
> on the clusterstate, the same thread (and of course the same JVM) would:
> # read the fresh state and version of zk node
> # construct the new state
> # perform a compare and set
> # if the compare and set fails, go to step 1
> This should be limited to operations performed on external collections,
> because batching would still be required for the others.
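
For illustration, the four-step read / construct / compare-and-set / retry loop in the description could look roughly like the sketch below. This is not Solr code: `CasStateUpdate`, `InMemoryNode`, `Versioned`, and `update` are hypothetical names, and the node is an in-memory stand-in so the example runs without a ZooKeeper ensemble. With the real client, the read would presumably be `ZooKeeper.getData(path, watch, stat)` and the conditional write `ZooKeeper.setData(path, data, expectedVersion)`, which throws `KeeperException.BadVersionException` when the version no longer matches.

```java
import java.util.function.UnaryOperator;

public class CasStateUpdate {

    /** Immutable snapshot of a node: payload plus version (like ZK data + Stat.getVersion()). */
    static final class Versioned {
        final String data;
        final int version;

        Versioned(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Hypothetical in-memory stand-in for a single ZooKeeper node. */
    static final class InMemoryNode {
        private Versioned current;

        InMemoryNode(String initial) {
            current = new Versioned(initial, 0);
        }

        synchronized Versioned read() {
            return current;
        }

        /** Writes only if the caller saw the latest version; returns false on a stale version. */
        synchronized boolean compareAndSet(int expectedVersion, String newData) {
            if (current.version != expectedVersion) {
                return false; // the real client would throw BadVersionException here
            }
            current = new Versioned(newData, expectedVersion + 1);
            return true;
        }
    }

    /** Applies a change with optimistic retries; returns the number of attempts it took. */
    static int update(InMemoryNode node, UnaryOperator<String> change) {
        int attempts = 0;
        while (true) {
            attempts++;
            Versioned seen = node.read();                 // step 1: fresh state and version
            String next = change.apply(seen.data);        // step 2: construct the new state
            if (node.compareAndSet(seen.version, next)) { // step 3: compare and set
                return attempts;
            }
            // step 4: another writer got in first, so go back to step 1
        }
    }

    public static void main(String[] args) throws InterruptedException {
        InMemoryNode node = new InMemoryNode("0");
        // Two writers race to increment a counter stored as the node's state.
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                update(node, s -> String.valueOf(Integer.parseInt(s) + 1));
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        a.join();
        b.join();
        Versioned result = node.read();
        System.out.println("state=" + result.data + " version=" + result.version);
    }
}
```

No update is lost in this sketch because a writer that read a stale version is forced to re-read and rebuild its state. The cost is that retries grow with contention on a single node, which is one way to read the description's point that batching is still needed for the shared clusterstate.json.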