tballison opened a new pull request, #1477:
URL: https://github.com/apache/incubator-stormcrawler/pull/1477

   I tested this offline with https://www.cdc.gov and https://www.fda.gov.
   
   I confirmed that with `sitemap.discovery=false` as the default, I could set 
it to `true` for one seed in the seed file, and the behavior was as expected.
   
   I also tested the opposite, where the default was `true`, but the seed for 
one of them was `false`, and the behavior was as expected.
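   
   For reviewers, here is a minimal sketch of the precedence I was testing: a 
per-seed value wins over the global default. This is not the code in the PR. 
The class name, the method name, and the assumption that the per-seed metadata 
key is also `sitemap.discovery` are hypothetical and only there for illustration.

```java
import java.util.Map;

// Hypothetical helper, not StormCrawler code: illustrates a per-seed
// override taking precedence over the global default.
final class SitemapDiscoverySketch {

    // Assumed metadata key; the key the PR actually reads may differ.
    static final String KEY = "sitemap.discovery";

    // Returns the per-seed value when present, otherwise the global default.
    static boolean isDiscoveryEnabled(Map<String, String> seedMetadata, boolean globalDefault) {
        String override = seedMetadata.get(KEY);
        if (override == null) {
            return globalDefault;
        }
        return Boolean.parseBoolean(override.trim());
    }

    public static void main(String[] args) {
        // Global default false, one seed opts in.
        System.out.println(isDiscoveryEnabled(Map.of(KEY, "true"), false));
        // Global default true, one seed opts out.
        System.out.println(isDiscoveryEnabled(Map.of(KEY, "false"), true));
        // No per-seed value: fall back to the global default.
        System.out.println(isDiscoveryEnabled(Map.of(), true));
    }
}
```

   The two cases I tested by hand correspond to the first two calls in `main`; 
the third is the fallback when a seed carries no value of its own.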
   
   I'm not sure this is the best solution. I don't like tightly coupling logic 
for an optional filter to the fetcher bolts, but so it goes.
   
   And, as usual, unit tests are, well, hard.
   
   Let me know what you think.
   
   Thank you for contributing to Apache StormCrawler.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there an issue associated with this PR? Is it referenced in the 
commit message?
   
   - [ ] Does your PR title start with `#XXXX` where `XXXX` is the issue number 
you are trying to resolve? 
   
   - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically main)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Is the code properly formatted with `mvn git-code-format:format-code 
-Dgcf.globPattern="**/*" -Dskip.format.code=false`?
   
   ### For code changes:
   
   - [ ] Have you ensured that the full suite of tests is executed via `mvn 
clean verify`?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file?
   
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions for 
build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

