mstrewe opened a new issue, #1448: URL: https://github.com/apache/incubator-stormcrawler/issues/1448
The BasicUrlNormalizer encodes links if they are not already URL-encoded. The bug occurs when a URL contains percent-encoded characters written with lowercase hex digits, such as `/Exhibitions/Detail/NjAxOA%3d%3d`. (The URL `/Exhibitions/Detail/NjAxOA%3D%3D` is not affected.)

In BasicUrlNormalizer.java, lines 145-150, the file part of the URL is unescaped and then escaped again. After that, the original file and the unescaped-then-re-escaped file are compared. The comparison is effectively `Exhibitions/Detail/NjAxOA%3d%3d == Exhibitions/Detail/NjAxOA%3D%3D` (capital D), so the two strings are treated as different. The original source URL is then recreated (line 154), which results in `Exhibitions/Detail/NjAxOA%253D%253D`.

This can be fixed by changing the statement in line 148 from

```
if (!file.equals(file2)) {
```

to

```
if (!file.toLowerCase().equals(file2.toLowerCase())) {
```

The case of the hex digits in a percent-encoded character should not matter, but with the current comparison it does.
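The behaviour described in this issue can be reproduced in isolation with a rough sketch like the one below. It is not the StormCrawler code: `PercentCaseDemo`, `roundTrip`, `changedStrict`, `changedIgnoringCase` and `rebuild` are hypothetical names, `roundTrip` only models the one effect that matters here (hex digits coming back in upper case), and the `rebuild` step assumes the URL is recreated through a multi-argument `java.net.URI` constructor, which always quotes `%`. The host `example.com` is a placeholder.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentCaseDemo {
    private static final Pattern TRIPLET = Pattern.compile("%([0-9A-Fa-f]{2})");

    // Hypothetical stand-in for the unescape/escape round trip: the only
    // effect modelled is that percent-encoded hex digits become upper case.
    static String roundTrip(String file) {
        Matcher m = TRIPLET.matcher(file);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "%" + m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Comparison as quoted in the issue (line 148).
    static boolean changedStrict(String file, String file2) {
        return !file.equals(file2);
    }

    // Comparison proposed in the issue.
    static boolean changedIgnoringCase(String file, String file2) {
        return !file.toLowerCase().equals(file2.toLowerCase());
    }

    // Assumption: the URL is rebuilt with a multi-argument URI constructor,
    // which quotes '%' and therefore double-encodes "%3D" as "%253D".
    static String rebuild(String path) {
        try {
            return new URI("https", "example.com", path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String file = "/Exhibitions/Detail/NjAxOA%3d%3d";
        String file2 = roundTrip(file);
        System.out.println(file2);                            // /Exhibitions/Detail/NjAxOA%3D%3D
        System.out.println(changedStrict(file, file2));       // true
        System.out.println(changedIgnoringCase(file, file2)); // false
        System.out.println(rebuild(file2));
    }
}
```

With the strict comparison the already-encoded path is flagged as changed and gets encoded a second time; with the case-insensitive comparison it is left alone.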