On 14/03/2025 20:45, Máté Kocsis wrote:
Hi Ignace,

      > All URI components - with the exception of the host - can be
    retrieved in two formats:

    I believe you mean - with the exception of the Port


Even though I specifically meant WHATWG's host, which is available in only one format, you are right: the port is never available in two formats. So I've
changed the wording accordingly.

    0 - It is unfortunate that there's no IDNA support for RFC3986. I
    understand the reasoning behind that decision, but I was wondering if it
    would be possible to opt in to its use when the ext-intl extension is present?


Good question, but I think that's probably not the main concern. My specific concern is that RFC 3987 is about the same length as RFC 3986; in a lot of cases it uses the exact
wording of the initial RFC but changes URI to IRI, and of course adds the
IDNA-specific parts. Maybe it's just me, but it's not easy to work out exactly what
has to be implemented on top of RFC 3986, and also how that can best be achieved.
By extending the class for RFC 3986? By creating a totally separate class that can transform itself into an RFC 3986 URI? These and quite a few other questions have
to be answered first, which I would like to postpone.


    1 - Does it mean that if/when Rfc3986/Uri gets RFC 3987 support, it
    will also get `Uri::toDisplayString` and `Uri::getHostForDisplay`?
    Maybe this should be stated in the Future Scope?


It's a question that I also asked myself. For now, I'd say that
Rfc3986/Uri shouldn't have these methods, since it doesn't support any such
capabilities. But Rfc3986\Iri should likely have these toString methods.


    4 - For consistency I would use toRawString and toString just like
    it is done for components.


I'm fine with this; I also think doing so would reasonably continue the convention
the getters follow.


    5 - Can the array returned from __debugInfo be exposed through a "normal"
    method like `toComponents` (the naming can be changed/improved) to ease
    migration from parse_url, or is this left to userland libraries?


I intend to add the __debugInfo() method purely to help debugging. Without this
method, even I had a hard time when trying to compare the expected vs actual
URIs in my tests.

But more importantly, sometimes the recomposed string is not enough to get a
good understanding of exactly what value each component has. For example,
one can naively assume that the "mailto:kocsism...@php.net" URI has a user(info) component of "kocsismate" and a hostname of "php.net" (I probably
also did so before reading the RFCs). The representation provided by
__debugInfo() can quickly highlight that "kocsism...@php.net" is in fact the path. One could try to call the individual getters to find the needed component, but having a method like __debugInfo() provides a much clearer picture of the anatomy of
the URI.
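
To illustrate the point about "mailto:" URIs outside of PHP, here is a minimal sketch using Python's standard `urllib.parse`, which follows the same generic syntax. The address `user@example.org` is a placeholder, and the `components` dictionary is only a hypothetical stand-in for the kind of overview __debugInfo() would give; it is not the proposed PHP API.

```python
from urllib.parse import urlsplit

# Placeholder address, used only for illustration.
parts = urlsplit("mailto:user@example.org")

# Without a "//" after the scheme there is no authority component,
# so the whole address ends up in the path.
components = {
    "scheme": parts.scheme,
    "userinfo": parts.username,
    "host": parts.hostname,
    "path": parts.path,
}

print(components)
```

Both the userinfo and the host come back empty here, while the path carries the full address.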

But otherwise I don't know how useful this method would be. Is there anything else
besides helping the migration?

Regards,
Máté


Thanks for the clarification.

I have other questions after further reading:

1) Regarding `Uri\UninitializedUriException`: if I look at the behaviour of `DateTimeImmutable` or of a userland object in the same scenario, an Error is thrown instead of an exception.

see:

- https://3v4l.org/d4VrY
- https://3v4l.org/Wn7En

Shouldn't the URI feature follow the same path for consistency? At the very least, it should throw an Error rather than an exception
when the object is uninitialized.

2) Regarding normalization: sorting the query string is not mentioned under query normalization. Does this mean that, with the current feature,

`http://example.com?foo=bar&foo=rab`
is different from
`http://example.com?foo=rab&foo=bar`
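
To make the question concrete, here is a minimal sketch in Python (standard library only, not the proposed PHP API). The helper `sorted_query` is hypothetical and shows what an opt-in sorting step could look like; as far as I can tell, RFC 3986 normalization itself does not reorder query parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def sorted_query(url: str) -> str:
    """Return the query string of `url` with its key/value pairs sorted."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return urlencode(sorted(pairs))


a = "http://example.com?foo=bar&foo=rab"
b = "http://example.com?foo=rab&foo=bar"

# Compared as written, the two query strings differ.
same_as_written = urlsplit(a).query == urlsplit(b).query

# After sorting the pairs, they compare equal.
same_after_sorting = sorted_query(a) == sorted_query(b)

print(same_as_written, same_after_sorting)
```

Sorting can change the meaning for applications where parameter order matters, which may be a reason to leave it to userland.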
