rzo1 commented on PR #1944:
URL: https://github.com/apache/stormcrawler/pull/1944#issuecomment-4722757632

   Thanks @jnioche and @sebastian-nagel for the review and discussion.
   
   Summary of where we landed: honouring `Retry-After` by holding the internal
   queue inside `FetcherBolt` is workable but fragile. To make it correct we'd
   also need to make queue reaping back-off aware, i.e. not reap a queue while
   its `nextFetchTime` is in the future (see @sebastian-nagel's note and Nutch's
   `FetchItemQueues`). Even then, a long delay risks back-pressure and tuple
   timeouts, and the number of held queues can grow large in a broad crawl.
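   
   For illustration only, a minimal sketch of what back-off-aware reaping could
   look like. The class, field, and method names below are hypothetical and are
   not taken from `FetcherBolt` or Nutch; the sketch assumes a per-host queue
   that tracks pending and in-progress counts plus a `nextFetchTime` in epoch
   milliseconds.

   ```java
   import java.util.HashMap;
   import java.util.Iterator;
   import java.util.Map;

   /** Hypothetical sketch; not StormCrawler's actual queue implementation. */
   class BackoffAwareQueues {

       /** Minimal stand-in for a per-host fetch queue. */
       static final class HostQueue {
           int pending;        // URLs waiting in the queue
           int inProgress;     // URLs currently being fetched
           long nextFetchTime; // epoch millis; pushed forward by Retry-After
       }

       private final Map<String, HostQueue> queues = new HashMap<>();

       /** Returns the queue for a host, creating it if needed. */
       HostQueue get(String host) {
           return queues.computeIfAbsent(host, h -> new HostQueue());
       }

       /** Records a Retry-After delay (in seconds) for a host. */
       void backOff(String host, long retryAfterSeconds, long nowMillis) {
           HostQueue q = get(host);
           long until = nowMillis + retryAfterSeconds * 1000L;
           // Never shorten a back-off that is already in place.
           q.nextFetchTime = Math.max(q.nextFetchTime, until);
       }

       /** Removes idle queues unless they are still backing off. */
       int reap(long nowMillis) {
           int removed = 0;
           Iterator<Map.Entry<String, HostQueue>> it = queues.entrySet().iterator();
           while (it.hasNext()) {
               HostQueue q = it.next().getValue();
               boolean idle = q.pending == 0 && q.inProgress == 0;
               boolean backingOff = q.nextFetchTime > nowMillis;
               if (idle && !backingOff) {
                   it.remove();
                   removed++;
               }
           }
           return removed;
       }

       /** Number of queues currently held, including backed-off ones. */
       int size() {
           return queues.size();
       }
   }
   ```

   Note that this only covers the reaping side: every backed-off host keeps an
   entry in the map until its delay expires, which is exactly the growth
   concern above, and it does nothing about back-pressure or tuple timeouts.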
   
   Given that #784 has been open for a long time without concrete user demand,
   investing in the interim in-bolt workaround doesn't seem worth the added
   complexity. The proper long-term home for this is the host-aware spout / host
   stream design (#867), which avoids the back-pressure problem entirely.
   
   So I'll close this PR in favour of pursuing #867. #784 stays open and we can
   revisit `Retry-After` there as part of the host-aware implementation.
   


