[ https://issues.apache.org/jira/browse/SOLR-17926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022573#comment-18022573 ]
Chris M. Hostetter commented on SOLR-17926:
-------------------------------------------

I definitely prefer the approach in the PR over the use of "NOW" in the patch.

"NOW" really makes sense for ensuring that date rounding/arithmetic of values _in the documents_ is treated consistently regardless of replica or query stage _because we *expect* clock drift_ between the replicas. I don't think it makes sense to try to use it for "how much timeAllowed do I have left?" type calculations (on the order of 10s of milliseconds) in replicas that didn't generate the "NOW" value in the first place.

(We also document "NOW" in the ref-guide as a way for clients to specify their own frame of reference for requests that include date math, so anyone doing that would get all sorts of really wonky timeAllowed results if we go that route.)

----

Two things about the PR that I'm confused by:

# I'm not really sure that I understand the utility of {{adjustShardRequestLimit}} adding {{USED_PARAM}} on sub-requests, instead of just decrementing the value of {{timeAllowed}} on the sub-requests (like the current grouping code does)?
# I don't really understand the point of INFLIGHT_PARAM. Nothing in the code sets it, which I guess is fine? It looks like it's intended to just be a way for external clients to override the implicit assumption that "2ms" isn't enough (remaining) time to bother sending sub-requests, but the only code path where {{req.getParams().getLong(INFLIGHT_PARAM, DEFAULT_INFLIGHT_MS)}} is called is a conditional block in the constructor where we already know "{{// this is a sub-request}}" ... which means {{adjustShardRequestLimit}} (which is only ever going to get called in the original parent request) will only ever use the DEFAULT_INFLIGHT_MS of 2ms ... right?
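To make the two approaches discussed above concrete, here is a minimal Java sketch. It is not Solr code and not the PR's implementation: the class {{TimeAllowedSketch}}, both method names, and the strict "greater than" comparison against the in-flight threshold are assumptions made for illustration. Only the 2ms default and the idea of subtracting already-spent time from {{timeAllowed}} come from the discussion. The first method decrements the budget on the coordinator, so only the coordinator's clock is involved. The second shows what a replica would compute if it derived its remaining time from a "NOW" value generated on another node.

```java
import java.util.OptionalLong;

public class TimeAllowedSketch {
    // Hypothetical stand-in for the 2ms default mentioned in the comment.
    public static final long DEFAULT_INFLIGHT_MS = 2;

    /**
     * Coordinator-side decrement: subtract time already spent locally from
     * timeAllowed. Returns empty when the remainder is not worth a sub-request
     * (the strict ">" comparison is an assumption of this sketch).
     */
    public static OptionalLong shardTimeAllowed(long timeAllowedMs, long elapsedMs, long inflightMs) {
        long remaining = timeAllowedMs - elapsedMs;
        return remaining > inflightMs ? OptionalLong.of(remaining) : OptionalLong.empty();
    }

    /**
     * Replica-side calculation based on the coordinator's NOW. The result
     * depends on the replica's own wall clock, so any drift between the two
     * nodes is added to or subtracted from the budget.
     */
    public static long remainingFromNow(long coordinatorNowMs, long timeAllowedMs, long replicaClockMs) {
        return coordinatorNowMs + timeAllowedMs - replicaClockMs;
    }

    public static void main(String[] args) {
        // 500ms allowed, 120ms already spent on the coordinator.
        System.out.println(shardTimeAllowed(500, 120, DEFAULT_INFLIGHT_MS).getAsLong()); // 380

        // 500ms allowed, 499ms spent: 1ms left is below the in-flight threshold.
        System.out.println(shardTimeAllowed(500, 499, DEFAULT_INFLIGHT_MS).isPresent()); // false

        // timeAllowed=50ms, 5ms truly elapsed, so 45ms should remain.
        long now = 1_000_000L;
        System.out.println(remainingFromNow(now, 50, now + 5 + 40)); // replica 40ms ahead: 5
        System.out.println(remainingFromNow(now, 50, now + 5 - 40)); // replica 40ms behind: 85
    }
}
```

With a 50ms budget, a 40ms clock offset in either direction changes the replica's view of its remaining time from 45ms to 5ms or 85ms, which is the kind of "wonky" result the comment warns about. The decrement approach avoids this because elapsed time is measured on a single node.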
> Discount timeAllowed for all types of queries
> ---------------------------------------------
>
>                 Key: SOLR-17926
>                 URL: https://issues.apache.org/jira/browse/SOLR-17926
>             Project: Solr
>          Issue Type: Improvement
>   Affects Versions: 9.9
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: SOLR-17926-using-NOW.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Spin-off from SOLR-17869.
>
> Currently only {{TopGroupsShardRequestFactory}} subtracts the time already
> spent on local request processing from {{timeAllowed}} before sending shard
> requests.
>
> This is inconsistent and likely not optimal. Since {{timeAllowed}} tracks
> wall-clock time, it makes sense to track the same starting point for all
> phases of distributed request processing and to terminate processing early
> when the allowed time, measured from that original starting point, runs out.
>
> This is not the way it works now, though (except for this special case of
> grouping queries): the same time span is allocated to the query coordinator
> and to the shard requests, whose processing starts later. This means that
> the coordinator may time out while waiting for responses even if all shard
> requests succeeded.
>
> [~dsmiley] suggested using {{SolrRequestInfo.getNOW()}} instead, as the
> absolute starting point for both local and distributed requests, and
> comparing {{timeAllowed}} to that starting point. However, this relies on
> correct time sync between nodes.