massdosage commented on code in PR #629:
URL: https://github.com/apache/httpcomponents-client/pull/629#discussion_r2021437718
##########
httpclient5/src/test/java/org/apache/hc/client5/http/psl/TestPublicSuffixMatcher.java:
##########
@@ -284,14 +284,14 @@ void testGetDomainRootPublicSuffixList() {
         checkPublicSuffix("shishi.中国", "shishi.中国");
         checkPublicSuffix("中国", null);
         // Same as above, but punycoded.
-        checkPublicSuffix("xn--85x722f.com.cn", "xn--85x722f.com.cn");
-        checkPublicSuffix("xn--85x722f.xn--55qx5d.cn", "xn--85x722f.xn--55qx5d.cn");
-        checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn", "xn--85x722f.xn--55qx5d.cn");
-        checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.xn--55qx5d.cn");
+        checkPublicSuffix("xn--85x722f.Com.Cn", "食狮.com.cn");
+        checkPublicSuffix("xn--85x722f.xn--55qx5d.CN", "食狮.公司.cn");
+        checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn", "食狮.公司.cn");
+        checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.公司.cn");
         checkPublicSuffix("xn--55qx5d.cn", null);
-        checkPublicSuffix("xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s");
-        checkPublicSuffix("www.xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s");
-        checkPublicSuffix("shishi.xn--fiqs8s", "shishi.xn--fiqs8s");
+        checkPublicSuffix("xn--85x722f.xn--fiqs8s", "食狮.中国");
+        checkPublicSuffix("www.xn--85x722f.xn--fiqs8s", "食狮.中国");
+        checkPublicSuffix("shishi.xn--fiqs8s", "shishi.中国");

Review Comment:
I don't see anywhere in the standard that says that if one passes Punycode in, one should expect to get Unicode out. I think the line you quote above comes from the "[Entry Specification](https://github.com/publicsuffix/list/wiki/Format#entry-specification)" section, which defines the layout of the PSL _file_, not how an implementation should behave. The algorithm is defined later on that page, under https://github.com/publicsuffix/list/wiki/Format#algorithm, but it doesn't call out Punycode specifically. My understanding is that the set of unit tests they provide exists precisely to avoid this kind of inconsistency: if one's implementation produces the same results as their unit tests, then it's correct.
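To make "punycode in, punycode out" concrete, here is a minimal sketch, not HttpClient's actual `PublicSuffixMatcher`, of a matcher that looks rules up in Unicode form (via `java.net.IDN.toUnicode`) but builds its result from the caller's own labels. The class name `PunycodeSuffixSketch`, the method `registrableDomain` and the four hard-coded rules (`cn`, `com.cn`, `公司.cn`, `中国`) are hypothetical; wildcard and exception rules are left out. Under these assumptions the original punycoded assertions in the diff above hold, while the PR's Unicode expectations would not.

```java
import java.net.IDN;
import java.util.Arrays;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

final class PunycodeSuffixSketch {

    // Hypothetical subset of PSL rules, stored in Unicode form:
    // "cn", "com.cn", gongsi.cn (U+516C U+53F8) and zhongguo (U+4E2D U+56FD).
    // A real matcher would load public_suffix_list.dat and also support
    // wildcard and exception rules, which this sketch omits.
    static final Set<String> RULES = Set.of(
            "cn", "com.cn", "\u516C\u53F8.cn", "\u4E2D\u56FD");

    // Returns the registrable domain (public suffix plus one label) or null.
    // Only the rule lookup uses the Unicode form; the result reuses the
    // caller's labels (lower-cased), so punycoded input stays punycoded.
    static String registrableDomain(String domain) {
        if (domain == null || domain.isEmpty() || domain.startsWith(".")) {
            return null;
        }
        String[] labels = domain.toLowerCase(Locale.ROOT).split("\\.");
        // Default rule "*": if nothing matches, the last label is the suffix.
        int suffixStart = labels.length - 1;
        // Scanning from the left tries the longest candidates first,
        // so the first hit is the prevailing (longest) rule.
        for (int i = 0; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (RULES.contains(IDN.toUnicode(candidate))) {
                suffixStart = i;
                break;
            }
        }
        if (suffixStart == 0) {
            return null; // the input is itself a public suffix
        }
        return String.join(".", Arrays.copyOfRange(labels, suffixStart - 1, labels.length));
    }

    public static void main(String[] args) {
        // {input, expected}, mirroring the punycoded entries removed in the diff
        String[][] cases = {
            {"xn--85x722f.com.cn", "xn--85x722f.com.cn"},
            {"xn--85x722f.xn--55qx5d.cn", "xn--85x722f.xn--55qx5d.cn"},
            {"www.xn--85x722f.xn--55qx5d.cn", "xn--85x722f.xn--55qx5d.cn"},
            {"shishi.xn--55qx5d.cn", "shishi.xn--55qx5d.cn"},
            {"xn--55qx5d.cn", null},
            {"xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s"},
            {"www.xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s"},
            {"shishi.xn--fiqs8s", "shishi.xn--fiqs8s"},
        };
        for (String[] c : cases) {
            String actual = registrableDomain(c[0]);
            System.out.printf("%s -> %s %s%n", c[0], actual,
                    Objects.equals(actual, c[1]) ? "OK" : "MISMATCH");
        }
    }
}
```

Normalising only for the lookup, rather than for the returned value, is what lets punycoded and Unicode input (for example `食狮.com.cn`) each come back in the form they went in.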
We can see that they added this behaviour deliberately, a long time ago, in this commit: https://github.com/publicsuffix/list/commit/ddc97474bc8d0de6b70de6ac37125a371e6df439#diff-7ff3771a2abbfd9f8dfc636e6fd2ba9ebb72f59f791ed6df380066c66a9f4179R28. A comment there says "The EffectiveTLDService always gives back punycoded labels.", which is exactly the behaviour we see in the unit tests. I agree that none of this is very clear, but I take the set of tests they run against incoming contributions to define what they consider "correct", and anything claiming to implement the standard should behave the same way for the same input.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org