On 8/7/2013 10:05 PM, Michael McMahon wrote:
> Resolvers seem to accept queries using trailing dots.
>
> eg nslookup www.oracle.com.
>
> or InetAddress.getByName("www.oracle.com.");
>
> The part of RFC3490 quoted below seems to me to be saying
> that the empty label implied by the trailing dot is not regarded
> as a label so that you don't end up calling toAscii() or toUnicode()
> with an empty string. I don't think it's saying the trailing dot can't
> be there.
>
It makes sense.
What's your preference to return for IDN.toASCII("www.oracle.com."),
"www.oracle.com." or "www.oracle.com"? The current returned value is
"www.oracle.com". I would like to preserve that behavior in this update.

I think we will be on the same page soon.

Thanks,
Xuelei

> Michael
>
> On 07/08/13 13:44, Xuelei Fan wrote:
>> On 8/7/2013 12:06 AM, Matthew Hall wrote:
>>> Trailing dots are allowed in plain DNS (thus almost surely in IDN),
>>> and the single dot represents the root zone. So you have to be
>>> careful making this sort of change to check the DNS RFCs first.
>> That's the first question we need to answer: does IDN allow trailing
>> dots ("com."), a zero-length root label ("."), and zero-length labels
>> ("", for example "example..com")?
>>
>> Per the specification of IDN.toASCII():
>> =======================================
>> "ToASCII operation can fail. ToASCII fails if any step of it fails. If
>> ToASCII operation fails, an IllegalArgumentException will be thrown. In
>> this case, the input string should not be used in an internationalized
>> domain name.
>>
>> A label is an individual part of a domain name. The original ToASCII
>> operation, as defined in RFC 3490, only operates on a single label. This
>> method can handle both label and entire domain name, by assuming that
>> labels in a domain name are always separated by dots. ...
>>
>> Throws IllegalArgumentException - if the input string doesn't conform to
>> RFC 3490 specification"
>>
>> Per the specification of RFC 3490:
>> ==================================
>> [section 2]
>> "A label is an individual part of a domain name. Labels are usually
>> shown separated by dots; for example, the domain name
>> "www.example.com" is composed of three labels: "www", "example", and
>> "com". (The zero-length root label described in [STD13], which can
>> be explicit as in "www.example.com." or implicit as in
>> "www.example.com", is not considered a label in this specification.)"
>>
>> "An "internationalized label" is a label to which the ToASCII
>> operation (see section 4) can be applied without failing (with the
>> UseSTD3ASCIIRules flag unset). ...
>> Although most Unicode characters can appear in
>> internationalized labels, ToASCII will fail for some input strings,
>> and such strings are not valid internationalized labels."
>>
>> "An "internationalized domain name" (IDN) is a domain name in which
>> every label is an internationalized label."
>>
>> [Section 4.1]
>> "ToASCII consists of the following steps:
>>
>> ...
>>
>> 8. Verify that the number of code points is in the range 1 to 63
>> inclusive."
>>
>>
>> Here are the questions:
>> 1. Is "example..com" a valid IDN?
>> As the dot is used as the label separator, there are three labels,
>> "example", "", "com". Per RFC 3490, "" is not a valid label. Hence,
>> "example..com" is not a valid IDN.
>>
>> We need to address the issue in IDN.
>>
>> 2. Is "xyz." a valid IDN?
>> It's a gray area, I think. We can treat the trailing "." as the root
>> label, or as a label separator.
>> If the trailing "." is treated as a label separator, "xyz." is invalid
>> per RFC 3490.
>> If the trailing "." is treated as the root label, what's the expected
>> return value of IDN.toASCII("xyz.")? I think the return value can be
>> either "xyz." or "xyz". The current implementation returns "xyz".
>>
>> We may not need to update the implementation if the trailing "." is
>> treated as the root label.
>>
>> 3. Is "." a valid IDN?
>> It's a gray area again, I think.
>> As above, if the trailing "." is treated as the root label, I think the
>> return value can be either "." or "". The current implementation throws
>> a StringIndexOutOfBoundsException.
>>
>> However, what does an empty domain name ("") really mean? I would
>> prefer to return "." for "." instead.
>>
>> We need to address the issue in IDN.
>>
>>
>> Here comes the solution. IDN.toASCII() returns:
>> 1. "." for ".";
>> 2. "xyz" for "xyz.";
>> 3. IAE for "example..com".
>>
>> Does it make sense?
>>
>> Thanks,
>> Xuelei
>>
>>
>> On 8/7/2013 1:35 AM, Michael McMahon wrote:
>>> I don't really understand the reason for the restriction in SNIHostName.
>>> But I guess that is where it should be enforced if it is required.
>>>
>>> Michael.
>>>
>>> On 06/08/13 17:43, Dmitry Samersoff wrote:
>>>> Xuelei,
>>>>
>>>> . (dot) is a perfectly valid domain name and it means the root domain,
>>>> so com. is a valid domain name as well.
>>>>
>>>> It seems to me that in the context of the methods your change touches
>>>> we should ignore trailing dots, rather than throw an exception.
>>>>
>>>> -Dmitry
>>>>
>>>>
>>>>
>>>> On 2013-08-06 15:44, Xuelei Fan wrote:
>>>>> Hi,
>>>>>
>>>>> Please review the bug fix to tighten the illegal input checking in IDN.
>>>>>
>>>>> webrev: http://cr.openjdk.java.net./~xuelei/8020842/webrev.00/
>>>>>
>>>>> Here are two test cases, which are expected to get IAE.
>>>>>
>>>>> Case 1:
>>>>> String host = IDN.toASCII(".", IDN.USE_STD3_ASCII_RULES);
>>>>> Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
>>>>> String index out of range: 0
>>>>> at java.lang.StringBuffer.charAt(StringBuffer.java:204)
>>>>> at java.net.IDN.toASCIIInternal(IDN.java:279)
>>>>> at java.net.IDN.toASCII(IDN.java:118)
>>>>>
>>>>> Case 2:
>>>>> String host = IDN.toASCII("com.", IDN.USE_STD3_ASCII_RULES);
>>>>>
>>>>> Thanks,
>>>>> Xuelei
>>>>>
>
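For reference, the three outcomes proposed in the thread ("." for ".", "xyz" for "xyz.", IAE for "example..com") can be sketched as a small standalone helper. This is only an illustration of the proposed rule, not the actual java.net.IDN change under review: the class TrailingDotSketch and the method toAsciiProposed are hypothetical names, and the sketch assumes that only the ASCII dot separates labels, while RFC 3490 recognizes other dot characters too.

```java
import java.net.IDN;

// Hypothetical helper illustrating the behavior proposed in the thread.
// It is NOT the JDK implementation; names and structure are assumptions.
final class TrailingDotSketch {

    static String toAsciiProposed(String input) {
        // Proposal item 1: the root domain alone stays ".".
        if (input.equals(".")) {
            return ".";
        }

        // Proposal item 2: a single trailing dot is the explicit root
        // label, which RFC 3490 does not count as a label, so drop it.
        String name = input.endsWith(".")
                ? input.substring(0, input.length() - 1)
                : input;

        // Limit -1 keeps empty strings, so "example..com" yields an
        // empty middle label instead of silently losing it.
        String[] labels = name.split("\\.", -1);

        StringBuilder out = new StringBuilder();
        for (String label : labels) {
            // Proposal item 3: a zero-length label is rejected
            // (ToASCII step 8 requires 1 to 63 code points).
            if (label.isEmpty()) {
                throw new IllegalArgumentException(
                        "empty label in: " + input);
            }
            if (out.length() > 0) {
                out.append('.');
            }
            // Delegate the per-label conversion to the real API.
            out.append(IDN.toASCII(label));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAsciiProposed("."));
        System.out.println(toAsciiProposed("xyz."));
        try {
            toAsciiProposed("example..com");
        } catch (IllegalArgumentException e) {
            System.out.println("IAE: " + e.getMessage());
        }
    }
}
```

Splitting the name by hand keeps the empty-label check in one visible place. Whether the trailing dot should be preserved in the return value is exactly the open question in the thread, so this sketch simply follows the "xyz" choice that matches the current behavior.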