rzo1 commented on issue #1597: URL: https://github.com/apache/stormcrawler/issues/1597#issuecomment-3941532404
> This is a question of how many malformed URI cases we will handle/normalize in BasicURLNormalizer, correct?

Basically yes. It is about cases like these:

1. `testBothAnchorAndQueryFilter`, `testQuerySort`, `testPipeInUrlAndFilterStillWorks` -> `http://google.com?a=c|d&foo=baz&foo=bar&test=true&z=2&d=4` -> `|` is illegal (not properly encoded). I think the first two can be replaced with a URL which doesn't have a pipe, and the last one can be treated as a failure imho.

2. `testProperURLEncodingWithBackSlash`

   ```
   String urlWithEscapedCharacters = "http://www.voltaix.com/\\SDS\\Silicon\\Trisilane\\Trisilane_SI050_USENG.pdf";
   String expectedResult = "http://www.voltaix.com/%5CSDS%5CSilicon%5CTrisilane%5CTrisilane_SI050_USENG.pdf";
   ```

   I would say that this normalization would be valuable.

3. `testNonStandardPercentEncoding` -> very specific case. It might be ok to let it fail for now, I don't know (or we add a specific workaround).
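For illustration, the pipe and backslash cases can be reproduced with plain `java.net.URI`, which rejects both characters when they are not percent-encoded. This is only a minimal sketch: it assumes the stricter parsing under discussion behaves like `java.net.URI`, and the class `UrlStrictnessSketch` and the helper `encodeIllegalChars` are hypothetical names, not part of StormCrawler or `BasicURLNormalizer`.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlStrictnessSketch {

    /** Returns true if java.net.URI accepts the string unchanged. */
    public static boolean parsesStrictly(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /**
     * Hypothetical pre-processing step (an assumption, not the project's code):
     * percent-encode only the two characters discussed in this comment.
     */
    public static String encodeIllegalChars(String url) {
        return url.replace("\\", "%5C").replace("|", "%7C");
    }

    public static void main(String[] args) {
        String pipeUrl = "http://google.com?a=c|d&foo=baz&foo=bar&test=true&z=2&d=4";
        String backslashUrl =
                "http://www.voltaix.com/\\SDS\\Silicon\\Trisilane\\Trisilane_SI050_USENG.pdf";

        // Both raw URLs are rejected by the strict parser.
        System.out.println(parsesStrictly(pipeUrl));       // false
        System.out.println(parsesStrictly(backslashUrl));  // false

        // After encoding the offending characters, both parse.
        System.out.println(parsesStrictly(encodeIllegalChars(pipeUrl)));       // true
        System.out.println(parsesStrictly(encodeIllegalChars(backslashUrl)));  // true

        // The encoded backslash URL matches the expectedResult of the test in item 2.
        System.out.println(encodeIllegalChars(backslashUrl));
    }
}
```

Whether such a pre-encoding step belongs in the normalizer, or whether the affected tests should simply be rewritten or expected to fail, is exactly the open question above.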
