[ https://issues.apache.org/jira/browse/SOLR-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821812#comment-17821812 ]
Pierre Salagnac edited comment on SOLR-17160 at 2/28/24 6:43 PM:
-----------------------------------------------------------------

Thanks for raising this concern [~gus].

Currently, this code is already somewhat vulnerable to DoS. Even though we don't grow forever in memory, we only track 100 requests. So if we bomb Solr with unwanted admin requests, the legitimate ones will fail from the client's point of view, because the client won't be able to retrieve their completion statuses.

But I agree that having no limit at all on the memory footprint is not ideal. In addition to time-based eviction, I can add a max size for the cache. I was thinking of doing this initially, but I dropped it because it seemed unnecessary. I'm thinking of a limit much higher than the 100 requests tracked today, say 10K. Most of the time we should never hit this limit, and in case of a DoS it should be sufficient to protect the host from abusive memory utilization here.


> Bulk admin operations may fail because of max tracked requests
> --------------------------------------------------------------
>
>                 Key: SOLR-17160
>                 URL: https://issues.apache.org/jira/browse/SOLR-17160
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Backup/Restore
>    Affects Versions: 8.11, 9.5
>            Reporter: Pierre Salagnac
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In {{CoreAdminHandler}}, we maintain in memory the list of in-flight requests and completed/failed requests.
> _Note these are core/replica-level async requests, not top-level requests, which are mostly at the collection level. Top-level requests are tracked by storing the async ID in a Zookeeper node, which is not related to this ticket._
>
> For completed/failed requests, we only track a maximum of 100 requests, dropping the oldest ones. The typical client in {{CollectionHandlingUtils.waitForCoreAdminAsyncCallToComplete()}} polls the status of the submitted requests, with a retry loop until the requests are completed. If for some reason more than 100 requests complete or fail on a node before all statuses are polled by the client, the statuses are lost and the client fails with an unexpected error similar to:
> {{Invalid status request for requestId: '_<id>_' - 'notfound'. Retried _<n>_ times}}
>
> Instead of having a hard limit on the number of requests we track, we could have time-based eviction. I think it makes sense to keep the status of a request until a given timeout, and then drop it regardless of how many requests we currently track.
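
A minimal sketch of the combined policy discussed above (time-based eviction plus a size cap), assuming a Caffeine-style cache; this is not the actual {{CoreAdminHandler}} code, and the class and constant names below ({{RequestStatusCache}}, {{REQUEST_STATUS_TTL}}, {{MAX_TRACKED_REQUESTS}}) are hypothetical:

{code:java}
// Sketch only: illustrates time-based eviction plus a hard size cap.
// Assumes the Caffeine library; names and values here are hypothetical.
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class RequestStatusCache {

  // Keep each status for a fixed window after completion (hypothetical value).
  private static final Duration REQUEST_STATUS_TTL = Duration.ofMinutes(5);

  // Hard cap well above the current 100-entry limit, as proposed (10K).
  private static final long MAX_TRACKED_REQUESTS = 10_000;

  // An entry is dropped when its TTL expires or when the cap is exceeded,
  // whichever comes first, so memory stays bounded even under a DoS.
  private final Cache<String, String> statuses = Caffeine.newBuilder()
      .expireAfterWrite(REQUEST_STATUS_TTL)
      .maximumSize(MAX_TRACKED_REQUESTS)
      .build();

  public void markCompleted(String requestId, String status) {
    statuses.put(requestId, status);
  }

  /** Returns the status, or null if it was never tracked or already evicted. */
  public String getStatus(String requestId) {
    return statuses.getIfPresent(requestId);
  }
}
{code}

With both bounds in place, a burst of unwanted requests can at worst evict legitimate statuses early, which is the same failure mode as today's 100-entry limit but at a much higher threshold.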