GitHub user JaspreetSinghChahal edited a comment on the discussion: Storm crawler not honouring crawl delay
Subdomains were one of the issues; that was resolved by setting partition.url.mode: "byDomain". The issue is still not gone, though: even within the same domain, the gap in lastprocesseddate is less than 15 minutes for some of these. I have also reduced parallelism, but it still didn't change anything.

GitHub link: https://github.com/apache/stormcrawler/discussions/1808#discussioncomment-15763493
