rzo1 commented on issue #1597:
URL: https://github.com/apache/stormcrawler/issues/1597#issuecomment-3941532404

   > This is a question of how many malformed URI cases we will handle/normalize in BasicURLNormalizer, correct?
   
   Basically, yes. It is about cases like:
   
   1. `testBothAnchorAndQueryFilter`, `testQuerySort`, `testPipeInUrlAndFilterStillWorks` -> `http://google.com?a=c|d&foo=baz&foo=bar&test=true&z=2&d=4` -> the `|` is illegal (not properly encoded). I think the first two can be switched to a URL that doesn't contain a pipe, and the last one can be treated as a failure, imho.
   
   2. `testProperURLEncodingWithBackSlash`  
   ```
   String urlWithEscapedCharacters =
           "http://www.voltaix.com/\\SDS\\Silicon\\Trisilane\\Trisilane_SI050_USENG.pdf";
   String expectedResult =
           "http://www.voltaix.com/%5CSDS%5CSilicon%5CTrisilane%5CTrisilane_SI050_USENG.pdf";
   ```
   
   I would say that this normalization is valuable and worth keeping.
   
   3. `testNonStandardPercentEncoding` -> a very specific case. It might be OK to let it fail for now; I don't know (or we add a specific workaround).
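
To make cases 1 and 2 concrete, here is a minimal sketch of why those URLs fail under strict parsing and what a pre-encoding pass could look like. It uses only `java.net.URI` from the JDK. The class `IllegalCharSketch` and its helpers `parsesStrictly` and `encodeIllegal` are hypothetical names made up for illustration; this is not how `BasicURLNormalizer` is implemented, only one possible workaround under the assumption that `|` and `\` are the only characters we want to rescue.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration class, not part of StormCrawler.
class IllegalCharSketch {

    /** Returns true if java.net.URI accepts the string unchanged. */
    public static boolean parsesStrictly(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /**
     * Assumed workaround: percent-encode only '|' and '\' before parsing.
     * Every other character is copied through untouched.
     */
    public static String encodeIllegal(String url) {
        StringBuilder sb = new StringBuilder(url.length() + 16);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '|') {
                sb.append("%7C");
            } else if (c == '\\') {
                sb.append("%5C");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pipe = "http://google.com?a=c|d&foo=baz&foo=bar&test=true&z=2&d=4";
        String backslash =
                "http://www.voltaix.com/\\SDS\\Silicon\\Trisilane\\Trisilane_SI050_USENG.pdf";

        // Raw inputs: strict parsing rejects both.
        System.out.println(parsesStrictly(pipe));
        System.out.println(parsesStrictly(backslash));

        // After the pre-encoding pass, both parse.
        System.out.println(encodeIllegal(backslash));
        System.out.println(parsesStrictly(encodeIllegal(pipe)));
    }
}
```

For the backslash URL, the pre-encoded string is the same as `expectedResult` in `testProperURLEncodingWithBackSlash`. Whether the pipe case should be rescued the same way or simply treated as a failure is the open question above.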


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.