[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

Gus Heck (Jira) Wed, 28 Feb 2024 11:09:11 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821831#comment-17821831
 ]


Gus Heck commented on SOLR-17160:
---------------------------------

Interfering with some requests is perhaps less problematic than running the 
server out of memory IMHO YMMV. Also, the client sees failure, but the original 
request still gets executed right? So client code that *really* cares can trap 
this case and re-verify heuristically, or re-send if idempotent...

> Bulk admin operations may fail because of max tracked requests
> --------------------------------------------------------------
>
>                 Key: SOLR-17160
>                 URL: https://issues.apache.org/jira/browse/SOLR-17160
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 8.11, 9.5
>            Reporter: Pierre Salagnac
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{{}CoreAdminHandler{}}}, we maintain in-memory the list of in-flight 
> requests and completed/failed request.
> _Note they are core/replica level async requests, and not top level requests 
> which mostly at the collection level. Top level requests are tracked by 
> storing the async ID in a Zookeeper node, which is not related to this 
> ticket._
>  
> For completed/failed requests, we only track a maximum of 100 requests by 
> dropping the oldest ones. The typical client in 
> {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls 
> status of the submitted requests, with a retry loop until requests are 
> completed. If for some reason we have more than 100 requests that complete or 
> fail on a node before all statuses are polled by the client, the statuses are 
> lost and the client will fail with an unexpected error similar to:
> {{Invalid status request for requestId: '{_}<id>{_}' - 'notfound'. Retried 
> _<n>_ times}}
>  
> Instead of having a hard limit for the number of requests we track, we could 
> have time based eviction. I think it makes sense to keep status of a request 
> until a given timeout, and then drop it ignoring how many requests we 
> currently track.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

[jira] [Commented] (SOLR-17160) Bulk admin operations may fail because of max tracked requests

Reply via email to