On 27/10/2022 4:26 pm, Alan Bateman wrote:
On 26/10/2022 23:53, Peter Firmstone wrote:
The change will have some performance impact, by requiring redundant parsing.

Just thought I'd mention it, in case it hasn't been considered. An internet search also turns up other implementations of RFC 3986 in Java.

https://github.com/pfirmstone/JGDMS/blob/e4a5012e71fd9a61b6e1e505f07e6c5358a4ccbc/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/net/Uri.java#L1966

We have a strict RFC 3986 URI implementation, from which we create all URL instances.

If your parser is using the one-arg URL constructor to create the URL then it will be parsed again, so you may already have duplicate parsing.  That said, there may be an argument that libraries should be able to do their own parsing and continue to construct a URL from its components with the non-validating constructors.
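To illustrate the double-parsing point, here is a minimal sketch of the two construction paths. It assumes java.net.URI stands in for the library's own parser; the class and method names are hypothetical and are not taken from the JGDMS code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, for illustration only.
@SuppressWarnings("deprecation") // the URL constructors are deprecated in recent JDK releases
final class UrlConstruction {
    private UrlConstruction() {}

    /** Hands the full string back to URL, which parses it a second time. */
    static URL reparsed(URI alreadyParsed) {
        try {
            return new URL(alreadyParsed.toString());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Reuses the components that the first parser already extracted. */
    static URL fromComponents(URI alreadyParsed) {
        String file = alreadyParsed.getRawPath();
        if (alreadyParsed.getRawQuery() != null) {
            file += "?" + alreadyParsed.getRawQuery();
        }
        try {
            // A port of -1 (URI's "undefined") means the protocol default for URL too.
            return new URL(alreadyParsed.getScheme(), alreadyParsed.getHost(),
                           alreadyParsed.getPort(), file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Both paths should yield the same external form for a simple http URI; the difference is only in how much work URL repeats.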

You are right. I originally wrote it over 10 years ago and I'm a little rusty on why I chose the single-argument constructor; it's possible I would do it differently if implementing it today. I might have been worried about what might break at the time. We did have a developer in a downstream project who used a URN and a custom URLHandler, made possible by the change to RFC3986.

I think today we could safely change it to a non-parsing constructor.


As I'm sure you know, changing URI to strictly implement RFC 3986 is not a compatible move. It was attempted in JDK 6 but had to be backed out quickly as it caused widespread breakage. Hierarchical URIs using registry-based authority components were one of the significant issues. There has been exploration and prototyping since then to try to find a direction, but there isn't a proposal right now.

The performance cost of retaining compatibility was a problem for us. Because java.net.URI wasn't suitable, we had to rely on DNS and file system access to determine equality. We wanted a normalized form with guaranteed equality, without network or file system access, so we decided to break compatibility; the tradeoff was significantly in performance's favour.
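A rough sketch of what "normalized form with equality that needs no network access" can look like follows. It is an illustration under stated assumptions, not the JGDMS Uri class: java.net.URI is used only as a convenient parser, the class name is hypothetical, only hierarchical URIs are handled, and percent-encoding case is left untouched.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class NormalizedForm {
    private NormalizedForm() {}

    /** Returns a string key; two URIs are treated as equal when their keys are equal. */
    static String of(String input) {
        URI u = URI.create(input).normalize(); // removes "." and ".." path segments
        if (u.isOpaque()) {
            throw new IllegalArgumentException("sketch handles hierarchical URIs only");
        }
        StringBuilder sb = new StringBuilder();
        if (u.getScheme() != null) {
            sb.append(u.getScheme().toLowerCase(Locale.ROOT)).append(':');
        }
        if (u.getHost() != null) {
            sb.append("//");
            if (u.getRawUserInfo() != null) {
                sb.append(u.getRawUserInfo()).append('@');
            }
            sb.append(u.getHost().toLowerCase(Locale.ROOT));
            if (u.getPort() != -1) {
                sb.append(':').append(u.getPort());
            }
        }
        sb.append(u.getRawPath());
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        return sb.toString();
    }
}
```

Comparing and hashing the resulting strings is pure in-memory work, which is the property described above.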

It's used throughout our software: there's an RFC3986URLClassLoader, which extends SecureClassLoader; it's also used in unicast discovery of services (over the network) and for Policy decisions...

https://github.com/pfirmstone/JGDMS/blob/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/net/RFC3986URLClassLoader.java

https://github.com/pfirmstone/JGDMS/blob/trunk/JGDMS/jgdms-platform/src/main/java/net/jini/core/discovery/LookupLocator.java

In Policy to replace CodeSource#implies functionality:

https://github.com/pfirmstone/JGDMS/blob/e4a5012e71fd9a61b6e1e505f07e6c5358a4ccbc/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/security/URIGrant.java#L132

Historically our software was horribly impacted by URL#equals and URL#hashCode, and we went to a great deal of trouble to eliminate all such calls and replace them with Uri, which is why Uri is immutable, normalized and not Serializable.

We utilised bitshift operations for case conversion, after String methods became hotspots.

https://github.com/pfirmstone/JGDMS/blob/trunk/JGDMS/jgdms-platform/src/main/java/org/apache/river/api/net/URIEncoderDecoder.java
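For readers unfamiliar with the trick, a minimal sketch of ASCII case conversion with bit operations is below. It is an assumption about the general technique, not a copy of URIEncoderDecoder, and the class name is hypothetical. In ASCII, upper and lower case letters differ only in bit 5 (1 << 5, i.e. 0x20).

```java
// Hypothetical helper, for illustration only.
final class AsciiCase {
    private static final int CASE_BIT = 1 << 5; // 0x20 separates 'A' from 'a'

    private AsciiCase() {}

    /** Lower-cases A-Z by setting the case bit; every other char is returned unchanged. */
    static char toLower(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c | CASE_BIT) : c;
    }

    /** Upper-cases a-z by clearing the case bit; every other char is returned unchanged. */
    static char toUpper(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c & ~CASE_BIT) : c;
    }

    /** Converts a whole string without going through locale-aware String methods. */
    static String toLower(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = toLower(chars[i]);
        }
        return new String(chars);
    }
}
```

This only makes sense for the ASCII subset that URI schemes, hosts and hex digits use; it is not a general replacement for String#toLowerCase.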

I suspect the best solution might be to leave java.net.URI alone and write a new implementation that isn't compatible, then eventually deprecate and remove java.net.URI. Tip: don't make it Serializable; let people serialize the string form instead.
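A small sketch of the "serialize the string form" tip: the holder class below is hypothetical, and java.net.URI (which is itself Serializable) merely stands in for a URI type that is not. Only the string is written to the stream, and the URI is re-parsed, and therefore re-validated, after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.URI;

// Hypothetical holder class, for illustration only.
final class ServiceRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String location; // the only state that is serialized
    private transient URI parsed;  // rebuilt on demand after deserialization

    ServiceRecord(URI uri) {
        this.location = uri.toString();
        this.parsed = uri;
    }

    /** Re-parses the string form if needed, so invalid stream data is rejected here. */
    URI location() {
        URI u = parsed;
        if (u == null) {
            u = URI.create(location);
            parsed = u;
        }
        return u;
    }

    /** Serializes and deserializes a record in memory; used to demonstrate the round trip. */
    static ServiceRecord roundTrip(ServiceRecord record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(record);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ServiceRecord) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```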

Regards,

Peter.
