[ https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866175#comment-17866175 ]
ASF subversion and git services commented on SOLR-17160:
--------------------------------------------------------

Commit d3b4c2e1ae39b8ecc5428798531f8b7cf723d787 in solr's branch refs/heads/main from Pierre Salagnac
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=d3b4c2e1ae3 ]

SOLR-17160: Core admin async ID status, 10k limit and time expire (#2304)

Core Admin "async" request status tracking is no longer capped at 100; it's 10k.
Statuses are now removed 5 minutes after the read of a completed/failed status.
Helps collection async backup/restore and other operations scale to 100+ shards.

Co-authored-by: David Smiley <dsmi...@salesforce.com>

> Bulk admin operations may fail because of max tracked requests
> --------------------------------------------------------------
>
>                 Key: SOLR-17160
>                 URL: https://issues.apache.org/jira/browse/SOLR-17160
>             Project: Solr
>          Issue Type: Bug
>          Components: Backup/Restore
>    Affects Versions: 8.11, 9.5
>            Reporter: Pierre Salagnac
>            Priority: Minor
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In {{CoreAdminHandler}}, we maintain in memory the list of in-flight
> requests and of completed/failed requests.
> _Note that these are core/replica-level async requests, not top-level
> requests, which are mostly at the collection level. Top-level requests are
> tracked by storing the async ID in a ZooKeeper node, which is not related to
> this ticket._
>
> For completed/failed requests, we track a maximum of 100 requests and drop
> the oldest ones beyond that. The typical client, in
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}}, polls the
> status of the submitted requests in a retry loop until the requests are
> completed. If more than 100 requests complete or fail on a node before the
> client has polled all of their statuses, the oldest statuses are lost and the
> client fails with an unexpected error similar to:
> {{Invalid status request for requestId: '<id>' - 'notfound'. Retried <n> times}}
>
> Instead of a hard limit on the number of requests we track, we could use
> time-based eviction. I think it makes sense to keep the status of a request
> until a given timeout and then drop it, regardless of how many requests we
> currently track.
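
For illustration only, the eviction policy described in the commit message (a size cap plus removal some time after a finished status has been read) could be sketched as below. This is not the code from the commit: the class AsyncStatusTracker, its methods, and the injected clock are hypothetical names, and the sketch assumes that the expiry timer starts at the first read of a status, which the commit message does not spell out.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a capped, expire-after-read status store; not Solr's CoreAdminHandler. */
class AsyncStatusTracker {
  private static final class Entry {
    final String status;
    long expiresAtMs = Long.MAX_VALUE; // no deadline until the status is first read

    Entry(String status) {
      this.status = status;
    }
  }

  private final int maxSize;          // e.g. 10_000 in the commit message
  private final long ttlAfterReadMs;  // e.g. 5 minutes in the commit message
  private final LongSupplier clockMs; // injected so the sketch can be tested without sleeping
  private final LinkedHashMap<String, Entry> finished = new LinkedHashMap<>();

  AsyncStatusTracker(int maxSize, long ttlAfterReadMs, LongSupplier clockMs) {
    this.maxSize = maxSize;
    this.ttlAfterReadMs = ttlAfterReadMs;
    this.clockMs = clockMs;
  }

  /** Records a completed or failed request, dropping the oldest entries beyond the cap. */
  synchronized void recordFinished(String requestId, String status) {
    evictExpired();
    finished.put(requestId, new Entry(status));
    Iterator<String> oldestFirst = finished.keySet().iterator();
    while (finished.size() > maxSize && oldestFirst.hasNext()) {
      oldestFirst.next();
      oldestFirst.remove();
    }
  }

  /** Returns the stored status, or "notfound"; the first read starts the expiry timer. */
  synchronized String getStatus(String requestId) {
    evictExpired();
    Entry entry = finished.get(requestId);
    if (entry == null) {
      return "notfound";
    }
    if (entry.expiresAtMs == Long.MAX_VALUE) {
      entry.expiresAtMs = clockMs.getAsLong() + ttlAfterReadMs;
    }
    return entry.status;
  }

  /** Removes every entry whose post-read deadline has passed. */
  private void evictExpired() {
    long now = clockMs.getAsLong();
    finished.values().removeIf(e -> e.expiresAtMs <= now);
  }
}
```

In this sketch a status that has never been polled is only dropped by the size cap, so a slow client still finds it as long as fewer than maxSize newer requests have finished on the node.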