Hello,

While looking into reducing string allocation in URL constructors, I found the following somewhat puzzling:

    // Throws MUE as expected: "no protocol: 1nvalid://resource"
    // Reason: "1nvalid" is not a valid scheme (cannot start with a digit)
    // See https://www.ietf.org/rfc/rfc2396.html#section-3.1
    @Test(expected = MalformedURLException.class)
    public void shouldDetectInvalidProtocolName() throws MalformedURLException {
        new URL("1nvalid://resource");
    }

    // This does not throw MUE, even though the protocol name is invalid
    // (so this test fails)
    @Test(expected = MalformedURLException.class)
    public void shouldDetectExplicitInvalidProtocolName() throws MalformedURLException {
        new URL("1nvalid", null, -1, "/resource", new StreamHandler());
    }

    // Does not throw MUE, even though RFC 2396 only allows ASCII letters
    // (a-z and A-Z), digits, "+", "-" and "." in a scheme (so this test fails)
    @Test(expected = MalformedURLException.class)
    public void shouldDetectNonAsciiProtocol() throws MalformedURLException {
        new URL("øl://resource"); // "øl" handler registered via URLStreamHandlerProvider
    }

A few observations:

1: Misleading error message for invalid protocol

The error message "no protocol: 1nvalid://resource" is a bit misleading. Instead of "no protocol", it should perhaps say "invalid protocol"?

2: Inconsistent rejection of invalid protocols

URL rejects invalid protocols when parsing a spec string, but not when given an explicit protocol parameter. (To me this feels inconsistent. It also slightly complicates my refactoring effort, since the refactored code needs to carry over the two different behaviors when dealing with protocol names.)

3: Non-ASCII protocol names are allowed

isValidProtocol does not reject non-ASCII Unicode characters, so it does not strictly follow RFC 2396.

Given that this class is @since 1.0, there is probably some resistance to changing behaviour in this area?

WDYT?

Cheers,
Eirik.
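
P.S. To make observation 3 concrete, here is a rough sketch of what a strict scheme check following RFC 2396 section 3.1 (scheme = alpha *( alpha | digit | "+" | "-" | "." )) could look like. This is not the JDK's isValidProtocol, just an illustration; the class name SchemeCheck and the method name isValidScheme are made up for this example.

```java
// Hypothetical helper, not part of java.net.URL: a strict RFC 2396 scheme check.
final class SchemeCheck {
    private SchemeCheck() {}

    // Returns true only if the scheme matches: alpha *( alpha | digit | "+" | "-" | "." )
    static boolean isValidScheme(String scheme) {
        if (scheme == null || scheme.isEmpty()) {
            return false;
        }
        // The first character must be an ASCII letter.
        if (!isAsciiLetter(scheme.charAt(0))) {
            return false;
        }
        // Remaining characters: ASCII letters, ASCII digits, '+', '-' or '.'.
        for (int i = 1; i < scheme.length(); i++) {
            char c = scheme.charAt(i);
            if (!isAsciiLetter(c) && !isAsciiDigit(c) && c != '+' && c != '-' && c != '.') {
                return false;
            }
        }
        return true;
    }

    // Explicit range checks so that non-ASCII letters such as 'ø' are rejected
    // (Character.isLetter would accept them).
    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

With a check like this, both "1nvalid" and "øl" would be rejected, while names such as "http" or "svn+ssh" would pass. Whether URL could apply something this strict in both the spec-parsing and explicit-protocol constructors without breaking existing users is exactly the compatibility question above.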