[jira] [Comment Edited] (SOLR-5872) Eliminate overseer queue

Jessica Cheng (JIRA) Mon, 17 Mar 2014 18:47:26 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938670#comment-13938670
 ]


Jessica Cheng edited comment on SOLR-5872 at 3/18/14 1:44 AM:
--------------------------------------------------------------

{quote}For further discussion around the change, there should be background if 
you search the archives.{quote}
If you wouldn't mind terribly, will you please paste the link of a few relevant 
threads in the archive? (Sorry, I'm not familiar with all the keywords and 
archives, etc., yet.)

{quote}There is a strong argument to be made that we should first investigate 
the performance issues with the current strategy. ZooKeeper is pretty fast - 
these state updates are tiny and batched. It seems like we should be able to do 
a lot better without throwing out code that has been getting hardened for a 
long time now.{quote}
I see where your hesitation is now, and I can definitely agree. Sounds like 
there are a few points to be investigated for the current system before we 
attempt to change anything:

- Why is the Overseer so slow at updating cluster state? What's causing the 
build-up of queue messages during a restart?
- What can we do to generally solve the problem of the Overseer being killed on 
every instance restart in a rolling bounce?
- How much is actually batched?

My gut is that for external collections, batching won't be of that much benefit 
(except for that super-large collection case that Yoink mentioned), but I agree 
that if the current system can be hardened to work even for those, then the 
simplicity of one code path should be preferred over ultra-optimizing for a 
non-issue (assuming the first two points above can be "fixed").


was (Author: mewmewball):
<quote>For further discussion around the change, there should be background if 
you search the archives.</quote>
If you wouldn't mind terribly, will you please paste the link of a few relevant 
threads in the archive? (Sorry, I'm not familiar with all the keywords and 
archives, etc., yet.)

<quote>There is a strong argument to be made that we should first investigate 
the performance issues with the current strategy. ZooKeeper is pretty fast - 
these state updates are tiny and batched. It seems like we should be able to do 
a lot better without throwing out code that has been getting hardened for a 
long time now.</quote>
I see where your hesitation is now, and I can definitely agree. Sounds like 
there are a few points to be investigated for the current system before we 
attempt to change anything:

- Why is the Overseer so slow at updating cluster state? What's causing the 
build-up of queue messages during a restart?
- What can we do to generally solve the problem of the Overseer being killed on 
every instance restart in a rolling bounce?
- How much is actually batched?

My gut is that for external collections, batching won't be of that much benefit 
(except for that super-large collection case that Yoink mentioned), but I agree 
that if the current system can be hardened to work even for those, then the 
simplicity of one code path should be preferred over ultra-optimizing for a 
non-issue (assuming the first two points above can be "fixed").

> Eliminate overseer queue 
> -------------------------
>
>                 Key: SOLR-5872
>                 URL: https://issues.apache.org/jira/browse/SOLR-5872
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> The overseer queue is one of the busiest points in the entire system. The 
> raison d'être of the queue is to:
>  * Provide batching of operations for the main clusterstate.json so that 
> state updates are minimized 
>  * Avoid race conditions and ensure order
> Now, as we move the individual collection states out of the main 
> clusterstate.json, the batching is not useful anymore.
> Race conditions can easily be solved by using a compare and set in ZooKeeper. 
> The proposed solution is that, whenever an operation needs to be performed 
> on the clusterstate, the same thread (and of course the same JVM) should:
>  # read the fresh state and version of the zk node
>  # construct the new state
>  # perform a compare and set
>  # if the compare and set fails, go to step 1
> This should be limited to operations performed on external collections, 
> because batching would still be required for the others.
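
For readers following the thread, the four-step compare-and-set loop in the 
description can be sketched roughly as below. This is only an illustration, 
not Solr code: VersionedNode is a hypothetical in-memory stand-in for a 
per-collection state node in ZooKeeper, set_if_version mimics a conditional 
write that is rejected when the expected version is stale, and update_state, 
mark_active, and the max_attempts bound are assumptions added for the example 
(the description itself just says to go back to step 1).

```python
import threading


class BadVersionError(Exception):
    """Raised when a conditional write names a stale version."""


class VersionedNode:
    """Hypothetical in-memory stand-in for one collection's state node."""

    def __init__(self, data):
        self._lock = threading.Lock()
        self._data = dict(data)
        self._version = 0

    def read(self):
        # Return a copy of the state together with its current version.
        with self._lock:
            return dict(self._data), self._version

    def set_if_version(self, new_data, expected_version):
        # Conditional write: succeeds only if nobody wrote since our read.
        with self._lock:
            if expected_version != self._version:
                raise BadVersionError(expected_version)
            self._data = dict(new_data)
            self._version += 1
            return self._version


def update_state(node, mutate, max_attempts=100):
    """Apply mutate(state) with a read / build / compare-and-set retry loop."""
    for _ in range(max_attempts):
        state, version = node.read()                        # step 1
        new_state = mutate(state)                           # step 2
        try:
            return node.set_if_version(new_state, version)  # step 3
        except BadVersionError:
            continue                                        # step 4: retry
    raise RuntimeError("gave up after %d attempts" % max_attempts)


def mark_active(replica):
    """Build a mutation that marks one (hypothetical) replica as active."""
    def mutate(state):
        updated = dict(state)
        updated[replica] = "active"
        return updated
    return mutate
```

With several threads calling update_state(node, mark_active(name)) on the 
same node, a writer that loses the race simply re-reads and retries, so no 
update is lost and no central queue is needed to order the writes. The cost 
is that heavy contention on a single node turns into repeated retries, which 
is where the batching question raised in the comment above comes in.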


