rzo1 commented on PR #1944: URL: https://github.com/apache/stormcrawler/pull/1944#issuecomment-4716944477
Agreed, the in-queue hold is the fragile part. A long `Retry-After` (and the default is uncapped) would park the queue's siblings past `topology.message.timeout.secs` and trigger replays.

For this PR, I would take the **purge** route: on a `Retry-After`, re-emit the affected queue's pending items rather than holding them, mirroring the existing `crawl-delay-too-long` path at `FetcherBolt.java:682` so the frontier reschedules them.

One thing I'd like your take on: should the re-emitted URLs go out as `Status.ERROR` (reuses the existing path, but carries error/retry semantics), or should we set an explicit future `nextFetchDate` so the scheduler honors the exact back-off?

I'd treat the **host stream / host-aware spout** design (#867, your branch 990) as the proper long-term home for this. Happy to follow up there once we settle the short-term behaviour. Does that split sound right to you?
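
To make the explicit `nextFetchDate` option concrete, here is a minimal sketch of how a capped back-off could be computed from a `Retry-After` value. Everything in it is an assumption for illustration: the class `RetryAfterSketch`, its method names, and the `cap` and `fallback` parameters are hypothetical and not part of StormCrawler, and it does not show how the resulting date would reach the scheduler.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical helper for illustration only; not part of StormCrawler. */
final class RetryAfterSketch {

    private RetryAfterSketch() {}

    /**
     * Parses a Retry-After value given either as delta-seconds or as an
     * HTTP-date. Returns null when the value is missing or unparseable.
     */
    static Duration parseRetryAfter(String header, Instant now) {
        if (header == null) {
            return null;
        }
        String value = header.trim();
        if (value.isEmpty()) {
            return null;
        }
        try {
            long seconds = Long.parseLong(value);
            return seconds < 0 ? null : Duration.ofSeconds(seconds);
        } catch (NumberFormatException notSeconds) {
            // Not delta-seconds; try the HTTP-date form below.
        }
        try {
            Instant when = ZonedDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            Duration delay = Duration.between(now, when);
            // A date in the past means "retry now".
            return delay.isNegative() ? Duration.ZERO : delay;
        } catch (DateTimeParseException notADate) {
            return null;
        }
    }

    /**
     * Computes the earliest next fetch time: the parsed delay, or the
     * fallback if the header is unusable, never exceeding the cap.
     */
    static Instant nextFetchDate(String header, Instant now,
                                 Duration cap, Duration fallback) {
        Duration delay = parseRetryAfter(header, now);
        if (delay == null) {
            delay = fallback;
        }
        if (delay.compareTo(cap) > 0) {
            delay = cap;
        }
        return now.plus(delay);
    }
}
```

With a cap in place, a very long `Retry-After` translates into a bounded reschedule rather than an open-ended one. Whether the cap should be configurable, and what its default should be, is left open here.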
