On 14/03/2025 20:45, Máté Kocsis wrote:
Hi Ignace,
> All URI components - with the exception of the host - can be
retrieved in two formats:
I believe you mean "with the exception of the port".
Even though I specifically meant WHATWG's host, which is available in
only one format, you are right: the port is never available in two
formats. So I've changed the wording accordingly.
0 - It is unfortunate that there is no IDNA support for RFC 3986. I
understand the reasoning behind that decision, but I was wondering
whether it would be possible to opt in to it when the ext-intl
extension is present?
Good question, I think it's probably not the main concern. My specific
concern is that RFC 3987 has around the same length as RFC 3986; in a
lot of cases it uses the exact wording of the initial RFC but changes
URI to IRI, and of course adds the IDNA-specific parts. Maybe it's just
me, but it's not easy to find out exactly what has to be implemented on
top of RFC 3986, and also how this can best be achieved. By extending
the class for RFC 3986? By creating a totally separate class that can
transform itself to an RFC 3986 URI? These and quite a few other
questions have to be answered first, which I would like to postpone.
1 - Does it mean that if/when Rfc3986/Uri gets RFC 3987 support, it
will also get `Uri::toDisplayString` and `Uri::getHostForDisplay`?
Maybe this should be stated in the Future Scope section?
It's a question that I also asked myself. For now, I'd say that
Rfc3986/Uri shouldn't have these methods, since it doesn't support any
such capabilities. But Rfc3986\Iri should likely have these toString
methods.
4 - For consistency I would use toRawString and toString, just like it
is done for components.
I'm fine with this. I also think doing so would reasonably continue the
convention the getters follow.
5 - Can the array returned by __debugInfo be exposed through a "normal"
method like `toComponents` (the naming can be changed/improved) to ease
migration from parse_url, or is this left to userland libraries?
I intend to add the __debugInfo() method purely to help debugging.
Without this method, even I had a hard time when trying to compare the
expected vs actual URIs in my tests.
But more importantly, sometimes the recomposed string is not enough to
get a good understanding of exactly what value each component has. For
example, one can naively assume that the "mailto:kocsism...@php.net"
URI has a user(info) component of "kocsismate" and a hostname of
"php.net" (I probably also did so before reading the RFCs). The
representation provided by __debugInfo() can quickly highlight that
"kocsism...@php.net" is in fact the path.
One could try to call the individual getters to find the needed
component, but having a method like __debugInfo() provides a much
clearer picture of the anatomy of the URI.
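As a rough illustration of the mailto point, here is a minimal sketch
that uses the existing parse_url() function rather than the proposed
API. The address is a placeholder chosen for the example, not the one
quoted above. It only shows that such a URI has a scheme and a path,
with no userinfo or host component.

```php
<?php
// Placeholder address, assumed for illustration only.
$uri = 'mailto:user@example.org';

// parse_url() returns an array holding only the components it found.
$parts = parse_url($uri);

var_dump($parts['scheme']);       // the scheme, "mailto"
var_dump(isset($parts['user']));  // false: no userinfo component
var_dump(isset($parts['host']));  // false: no host component
var_dump($parts['path']);         // the whole address ends up in the path
```

Dumping every component at once, as __debugInfo() is meant to do, makes
this kind of surprise visible without calling each getter separately.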
But otherwise I don't know how useful this method would be. Is there
anything else besides helping the migration?
Regards,
Máté
Thanks for the clarification.
I have other questions after further reading:
1) About `Uri\UninitializedUriException`: if I look at the behaviour of
`DateTimeImmutable` in the same scenario, or of a userland object, an
Error is thrown instead of an exception.
See:
see:
- https://3v4l.org/d4VrY
- https://3v4l.org/Wn7En
Shouldn't the URI feature follow the same path for consistency? Instead
of throwing an exception, it should at least throw an Error when the
object is uninitialized.
2) About normalization: in the case of query normalization, sorting the
query string is not mentioned. Does it mean that with the current
feature
`http://example.com?foo=bar&foo=rab`
is different from
`http://example.com?foo=rab&foo=bar`?
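
To make the question concrete, here is a small sketch in plain PHP. It
does not use the proposed API, and sortQueryPairs() is a hypothetical
helper written only for this example. It shows that the two query
strings differ when compared as-is and match only once their pairs are
sorted; whether the proposed normalization should do such sorting is
exactly the open question.

```php
<?php
// Hypothetical helper: sorts the "key=value" pairs of a query string.
function sortQueryPairs(string $query): string
{
    $pairs = explode('&', $query);
    sort($pairs, SORT_STRING);

    return implode('&', $pairs);
}

// Take everything after the first "?" as the query string.
$a = explode('?', 'http://example.com?foo=bar&foo=rab', 2)[1];
$b = explode('?', 'http://example.com?foo=rab&foo=bar', 2)[1];

var_dump($a === $b);                                  // compared as-is
var_dump(sortQueryPairs($a) === sortQueryPairs($b));  // compared after sorting
```

Note that sorting can change meaning for applications that treat the
order of repeated keys as significant, which may be a reason to leave
it out of normalization.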