[jira] [Comment Edited] (SOLR-16414) Race condition in PRS state updates

Patson Luk (Jira) Thu, 03 Nov 2022 16:36:08 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628563#comment-17628563
 ]


Patson Luk edited comment on SOLR-16414 at 11/3/22 11:35 PM:
-------------------------------------------------------------

Dug a little bit deeper into the issue. The core issues appear to be:
`parallelStream` has rather weird handling, it does NOT interrupt or return at 
once if any of the `Consumer` supplied throws unhandled exception. Based on 
testing, it will keep on executing all the consumers, and once everyone is 
done, throws the unhandled exception again. Such behavior is shown in this 
[gist|https://gist.github.com/patsonluk/04365cd1140e39d5d59c267782265ece]

Now in our cases, the shutdown flow appears to be:
1. it iterates (parallel, but it should have certain bound, my local test it 
goes as batch of 20, but probably machine/java implementation specific) each 
collection and attempts to [fetch 
PerReplicaStates|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L2962]
2. It eventually invokes SolrZkClient#getChildren which  calls 
`zkCmdExecutor.retryOperation(() -> keeper.getChildren(path, 
wrapWatcher(watcher), stat));`
3. The `retryOperation` first tries to call `keeper.getChildren(path, 
wrapWatcher(watcher), stat)`, which this case might throw 
`KeeperException.ConnectionLossException`. This is because during that time the 
`zkclient` is also getting closed, and after a chain of events, will eventually 
set the `state` of `ClientCnxn$SendThread` to `CLOSED` (might also get 
`SessionTimeoutException`), which creates and throws 
`KeeperException.ConnectionLossException` 
4. But instead of returning right the way, the `ZkCmdExecutor#retryOperation` 
sleeps for 1.5 secs, then checks `ZkCmdExecutor#isClosed`, which is `true`. 
This time it throws `AlreadyClosedException` which is a `RuntimeException`, 
which bubbles all the way up unhandled to the parallelStream.foreach
5. Unfortunately, parallelStream.foreach does NOT interrupt even if there's 
unhandled exception. It will simply just keep going with all the collections, 
and each batch (20?) of collections can get stuck for 1.5 secs. So in an 
environment with alot of collections, the shutdown could be slow? 


Replacing `parallelStream` with `stream` will interrupt the `forEach` 
immediately if any unhandled runtime exception is thrown within, hence avoiding 
this problem. However, if we do want some form of parallelism, perhaps we 
should use more traditional threading/forkjoin task construct instead of 
`parallelStream` ? :D

[~noble.paul] did u observe the slowness on environment with alot of 
collections? Im hoping that my finding could explain what you observed.


was (Author: JIRAUSER291114):
Dug a little bit deeper into the issue. The core issues appear to be:
`parallelStream` has rather weird handling, it does NOT interrupt or return at 
once if any of the `Consumer` supplied throws unhandled exception. Based on 
testing, it will keep on executing all the consumers, and once everyone is 
done, throws the unhandled exception again. Such behavior is shown in this 
[gist|https://gist.github.com/patsonluk/04365cd1140e39d5d59c267782265ece]

Now in our cases, the shutdown flow appears to be:
1. it iterates (parallel, but it should have certain bound, my local test it 
goes as batch of 20, but probably machine/java implementation specific) each 
collection and attempts to [fetch 
PerReplicaStates|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L2962]
2. It eventually invokes SolrZkClient#getChildren which  calls 
`zkCmdExecutor.retryOperation(() -> keeper.getChildren(path, 
wrapWatcher(watcher), stat));`
3. The `retryOperation` first tries to call `keeper.getChildren(path, 
wrapWatcher(watcher), stat)`, which this case might throw 
`KeeperException.ConnectionLossException`. This is because during that time the 
`zkclient` is also getting closed, and after a chain of events, will eventually 
set the `state` of `ClientCnxn$SendThread` to `CLOSED` (might also get 
`SessionTimeoutException`), which creates and throws 
`KeeperException.ConnectionLossException` 
4. But instead of returning right the way, the `ZkCmdExecutor#retryOperation` 
sleeps for 1.5 secs, then checks `ZkCmdExecutor#isClosed`, which is `true`. 
This time it throws `AlreadyClosedException` which is a `RuntimeException`, 
which bubbles all the way up unhandled to the parallelStream.foreach
5. Unfortunately, parallelStream.foreach does NOT interrupt even if there's 
unhandled exception. It will simply just keep going with all the collections, 
and each batch (20?) of collections can get stuck for 1.5 secs. So in an 
environment with alot of collections, the shutdown could be slow? 


Replacing `parallelStream` with `stream` will interrupt the `forEach` 
immediately if any unhandled runtime exception is thrown within, hence avoiding 
this problem. However, if we do want some form of parallelism, perhaps we 
should use more traditional threading/forkjoin task construct instead of 
`parallelStream` ? :D

> Race condition in PRS state updates
> -----------------------------------
>
>                 Key: SOLR-16414
>                 URL: https://issues.apache.org/jira/browse/SOLR-16414
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 9.1
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> For PRS collections the individual states are potentially updated from 
> individual nodes and sometimes from overseer too. it's possible that
>  
>  # OP1 is sent to overseer at T1
>  # OP2 is executed in the node itself at T2
>  
> Because we cannot guarantee that the OP1 sent to overseer may execute before 
> OP2 tyhe final state will be the result of OP1 which is incorrect and can 
> lead to errors .
> The solution is to never do any PRS writes from overseer. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

[jira] [Comment Edited] (SOLR-16414) Race condition in PRS state updates

Reply via email to