Re: [PR] #784 - Support Retry-After in FetcherBolt [stormcrawler]

via GitHub Tue, 16 Jun 2026 02:23:03 -0700


rzo1 commented on PR #1944:
URL: https://github.com/apache/stormcrawler/pull/1944#issuecomment-4716944477


   Agreed, the in-queue hold is the fragile part. A long `Retry-After` (and the 
default is uncapped) would park the queue's siblings past 
`topology.message.timeout.secs` and trigger replays.
   
   For this PR, I would take the **purge** route. On a `Retry-After`, re-emit 
the affected queue's pending items rather than holding them, mirroring the 
existing `crawl-delay-too-long` path at `FetcherBolt.java:682` so the frontier 
reschedules them. One thing I'd like your take on: should the re-emitted URLs 
go out as `Status.ERROR` (reuses the existing path, but carries error/retry 
semantics), or should we set an explicit future `nextFetchDate` so the 
scheduler honors the exact back-off?
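
   To make the second option concrete, here is a minimal sketch of how the back-off could be turned into a capped future date. It is not the PR's code and uses no StormCrawler API: the class `RetryAfterSketch`, the method `nextFetchDate` and the `maxDelay` cap are all hypothetical names for illustration. It assumes the header is either delta-seconds or an HTTP-date, and that we want a cap so a long value cannot push the date arbitrarily far out.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

/** Hypothetical helper, not part of StormCrawler. */
public final class RetryAfterSketch {

    private RetryAfterSketch() {}

    /**
     * Converts a Retry-After header value into an absolute next-fetch instant.
     * Returns empty when the value is missing or cannot be parsed, so the
     * caller can fall back to its default scheduling.
     */
    public static Optional<Instant> nextFetchDate(
            String headerValue, Instant now, Duration maxDelay) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        Duration delay;
        try {
            if (value.chars().allMatch(Character::isDigit)) {
                // delta-seconds form, e.g. "120"
                delay = Duration.ofSeconds(Long.parseLong(value));
            } else {
                // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
                Instant target =
                        ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                                .toInstant();
                delay = Duration.between(now, target);
            }
        } catch (NumberFormatException | DateTimeParseException e) {
            return Optional.empty();
        }
        // A date in the past means "retry now"; an overly long one is capped.
        if (delay.isNegative()) {
            delay = Duration.ZERO;
        }
        if (delay.compareTo(maxDelay) > 0) {
            delay = maxDelay;
        }
        return Optional.of(now.plus(delay));
    }
}
```

   The caller would attach the resulting instant to the re-emitted tuple in whatever way the scheduler expects; that wiring is left out here because it depends on the answer to the question above.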
   
   I'd treat the **host stream / host-aware spout** design (#867, your branch 
990) as the proper long-term home for this. I'm happy to follow up there once 
we settle the short-term behaviour. Does that split sound right to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
