[ https://issues.apache.org/jira/browse/SOLR-16414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628244#comment-17628244 ]
Jan Høydahl commented on SOLR-16414:
------------------------------------

Do we know that the sequential forEach is safe? It will also cause a burst of events, only more slowly. Do you know the root cause of the deadlock? We can guess that the unbounded parallelStream sends too much traffic to ZK at once, so that something breaks? But if someone has 100 Solr nodes instead of 8, you'd still get a massive parallel load on ZK?

Still, if this is a PRS-only issue and today's fix is likely to work on the 8-node, 1000-collection test, then we should not hold up 9.1 to try to get a perfect solution for 1% of Solr's users.

> Race condition in PRS state updates
> -----------------------------------
>
> Key: SOLR-16414
> URL: https://issues.apache.org/jira/browse/SOLR-16414
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
> Fix For: 9.1
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> For PRS collections the individual states are potentially updated from
> individual nodes and sometimes from the overseer too. It's possible that
>
> # OP1 is sent to the overseer at T1
> # OP2 is executed in the node itself at T2
>
> Because we cannot guarantee that the OP1 sent to the overseer executes before
> OP2, the final state can be the result of OP1, which is incorrect and can
> lead to errors.
> The solution is to never do any PRS writes from the overseer.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
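
For readers following the thread, the ordering problem described in the quoted issue can be reproduced with a small, single-threaded Java sketch. This is not Solr code: the class `PrsRaceSketch`, the map standing in for per-replica state, the replica name, and the state values `down` and `active` are all assumptions made for illustration. It only shows why a write queued at the overseer at T1 can overwrite a direct node write made at T2, and why the problem goes away when only the node writes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class PrsRaceSketch {
    // Hypothetical stand-in for the per-replica state store; not Solr's real PRS classes.
    static final Map<String, String> replicaStates = new HashMap<>();

    /** Two writers: OP1 goes through an overseer queue, OP2 is written directly by the node. */
    static String mixedWriters() {
        replicaStates.clear();
        Queue<Runnable> overseerQueue = new ArrayDeque<>();

        // T1: OP1 is handed to the overseer, which has not processed it yet.
        overseerQueue.add(() -> replicaStates.put("replica1", "down"));

        // T2: OP2 is applied immediately by the node itself.
        replicaStates.put("replica1", "active");

        // Later: the overseer drains its queue, so the older OP1 overwrites the newer OP2.
        while (!overseerQueue.isEmpty()) {
            overseerQueue.poll().run();
        }
        return replicaStates.get("replica1");
    }

    /** Single writer: both operations are applied by the node in the order they were issued. */
    static String nodeOnlyWriter() {
        replicaStates.clear();
        replicaStates.put("replica1", "down");   // OP1 at T1
        replicaStates.put("replica1", "active"); // OP2 at T2
        return replicaStates.get("replica1");
    }

    public static void main(String[] args) {
        System.out.println("mixed writers: " + mixedWriters());   // prints "mixed writers: down"
        System.out.println("node only:     " + nodeOnlyWriter()); // prints "node only:     active"
    }
}
```

The queue makes the delay deterministic here; in a real cluster the same reordering would depend on timing, which is what makes it a race.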