Hi Tim, Thank you again for the thorough review!
> The naming of these methods seems to be a little inconsistent. It should > either be: > > ->getHostForDisplay() > ->toStringForDisplay() > > or > > ->getDisplayHost() > ->toDisplayString() > > but not a mix between both of them. > Yes, I completely agree with your concern. I'm just not sure yet which combination I'd prefer. Probably the latter one? > Yes. Besides the remark above, my previous arguments still apply (e.g. > `with()`ers not being able to construct instances for subclasses, > requiring to override all of them). I'm also noticing that serialization > is unsafe with subclasses that add a `$__uri` property (or perhaps any > property at all?). > Hm, yes, you are right indeed that withers cannot really create new instances on their own because the whole URI string is needed to instantiate a new object... which is only accessible if it's reconstructed by swapping the relevant component with its new value. Please note that trying to serialize a $__uri property will result in an exception. 1. > > The `toDisplayString()` method that you mentioned above is not in the > RFC. Did you mean `toHumanFriendlyString()`? Which one is correct? > The toHumanFriendlyString() method stuck there from a previous version of the proposal, since then I converted it to toDisplayString(). > 2. > > The example output of the `$errors` array does not match the stub. It > contains a `failure` property, should that be `softError` instead? > The $softError property is also an outdated name: I recently changed it to $failure to be consistent with the wording that the WHATWG specification uses. > 3. > > The RFC states "When trying to instantiate a WHATWG Url via its > constructor, a Uri\InvalidUriException is thrown when parsing results in > a failure." > > What happens for Rfc3986 when passing an invalid URI to the constructor? > Will an exception be thrown? What will the error array contain? Is it > perhaps necessary to subclass Uri\InvalidUriException for use with > WhatWgUrl, since `$errors` is not applicable for 3986? > The first two questions are answered right at the top of the parsing section: "the constructor: It expects a URI, and optionally, a base URL in order to support reference resolution. When parsing is unsuccessful, a Uri\InvalidUriException is thrown." The $errors property will contain an empty array though, as you supposed. I don't see much problem with using the same exception in both cases, however I'm also fine with making the $errors property nullable in order to indicate that returning errors is not supported by the implementation triggering the error. > > 4. > > The RFC does not specify when `UninitializedUriException` is thrown. > That's a very good catch! I completely forgot about some exceptions. This one is used for indicating that an URI is not correctly initialized: when a URI instance is created without actually invoking the constructor, or the parse method, or __unserialize(), then any methods that try to use the internally stored URI will trigger this exception. > 5. > > The RFC does not specify when `UriOperationException` is thrown. > 6. > > Generally speaking I believe it would help understanding if you would > add a `/** @throws InvalidUriException */` to each of the methods in the > stub to make it clear which ones are able to throw (e.g. resolve(), or > the withers). It's harder to find this out from “English” rather than > “code” :-) > > Good idea! I've added the PHPDoc as well as created a dedicated "Exceptions" section. 7. > > In the “Component retrieval” section: Please add even more examples of > what kind of percent-decoding will happen. For example, it's important > to know if `%26` is decoded to `&` in a query-string. Or if `%3D` is > decoded to `=`. This really is the same case as with `%2F` in a path. > The explanation > Thanks for calling these cases out, I've significantly reworked the relevant sections. First of all, I added much more details to the general overview about percent-encoding: https://wiki.php.net/rfc/url_parsing_api#percent-encoding_decoding as well as extended the https://wiki.php.net/rfc/url_parsing_api#component_retrieval section with more information about the two component representations, and added a general clarification related to reserved characters. Additionally, the https://wiki.php.net/rfc/url_parsing_api#component_modification section makes it clear how percent-encoding is performed when the withers are used. After thinking about the question a lot, finally the current encoding-decoding rules seem logical to me, but please double-check them. It's easy to misinterpret such long and complex specifications. Long story short: when parsing an URI or modifying a component, RFC 3986 fails hard if an invalid character is found, while WHATWG implementation automatically percent-encodes it while also triggering a soft-error. While retrieving the "normalized-decoded" representation of a URI component, percent-decoding is performed when possible: - in case of RFC3986: reserved and invalid characters are not percent-decoded (only unreserved ones are) - in case of WHATWG: invalid characters and characters with special meaning (that fall into the percent-encode set of the given component) are not percent-decoded The relevant sections will give a little more reasoning why I went with these rules. "the URI is normalized (when applicable), and then the reserved > characters in the context of the given component are percent-decoded. > This means that only those reserved characters are percent-decoded that > are not allowed in a component. This behavior is needed to be able to > unambiguously retrieve components." > > alone is not clear to me. “reserved characters that are not allowed in a > component”. I assume this means that `%2F` (/) in a path will not be > decoded, but `%3F` (?) will, because a bare `?` can't appear in a path? > I hope that this question is also clear after my clarifications + the reconsidered logic. > 8. > > In the “Component retrieval” section: You compare the behavior of > WhatWgUrl and Rfc3986Uri. It would be useful to add something like: > > $url->getRawScheme() // does not exist, because WhatWgUrl always > normalizes the scheme > Done. > > to better point out the differences between the two APIs with regard to > normalization (it's mentioned, but having it in the code blocks would > make it more visible). > Done. > > 9. > > In the “Component Modification” section, the RFC states that WhatWgUrl > will automatically encode `?` and `#` as necessary. Will the same happen > for Rfc3986? Will the encoding of `#` also happen for the query-string > component? The RFC only mentions the path component. The above referenced sections will give a clear answer for this question as well. TLDR: after your message, I realized that automatic percent-encoding also triggers a (soft) error case for WHATWG, so I changed my mind with regards to Uri\Rfc3986\Uri, so it won't do any automatic percent-encoding. It's unfortunate, because this behavior is not consistent with WHATWG, but it's more consistent with the parsing rules of its own specification, where there are only hard errors, and there's no such thing as "automatic correction". > I'm also wondering if there are cases where the withers would not > round-trip, i.e. where `$url->withPath($url->getPath())` would not > result in the original URL? > I am currently not aware of any such situation... I even wrote about this aspect fairly long, because I think "roundtripability" is a very important attribute. Thank you for raising awareness of this! > > 10. > > Can you add examples where the authority / host contains IPv6 literals? > It would be useful to specifically show whether or not the square > brackets are returned when using the getters. It would also be > interesting to see whether or not IPv6 addresses are normalized (e.g. > shortening `2001:db8:0:0:0:0:0:1` to `2001:db8::1`). > Good idea again! I've added an example containing an IPv6 host at the very end of the component retrieval section. And yes, they will be enclosed within a [] pair as per the spec. It also surprised me, but IP address normalization is only performed by WHATWG during recomposition! But nowhere else... > > 11. > > In “Component Recomposition” the RFC states "The > Uri\Rfc3986\Uri::toString() returns the unnormalized URI string". > > Does this mean that toString() for Rfc3986 will always return the > original input? > Yes, effectively that's the case, only WHATWG modifies the input according to my knowledge. In the past, I had the impression that RFC 3986 also did a few changes, but then I had to realize that it was not the case after I had dug deep into the code of uriparser. > > 12. > > It would be useful to know whether or not the classes implement > `__debugInfo()` / how they appear when `var_dump()`ing them. > I've added an example. That's all I managed to write for now, but I'll try to answer the rest of the messages and feedback as soon as possible. :) Regards, Máté