Hi
On 2025-02-24 12:08, Nicolas Grekas wrote:
The situation I'm talking about is when one accepts an argument
described as
function (\Uri\WhatWg\Url $url)
If the Url class is final, this signature means only one possible
implementation can ever be passed: the native one. Composition cannot be
achieved because there's no type to compose.
Yes, that's the point: The behavior and the type are intimately tied
together. The Uri/Url classes are representing values, not services. You
wouldn't extend an int either. For DateTimeImmutable, inheritance being
legal causes a ton of needless bugs (especially around serialization
behavior).
Fine-tuning the behavior provided by the RFC is what we might be most
interested in, but we should not forget that we also ship a type. By making
For a given specification (RFC 3986 / WHATWG) there is exactly one
correct interpretation of a given URL. “Fine-tuning” means that you are
no longer following the specification.
the type non-final, we keep things open enough for userland to build on it.
This works:
    final class HttpUrl {
        private readonly \Uri\Rfc3986\Uri $uri;

        public function __construct(string $uri) {
            $this->uri = new \Uri\Rfc3986\Uri($uri);
            if ($this->uri->getScheme() !== 'http') {
                throw new ValueError('Scheme must be http');
            }
        }

        public function toRfc3986(): \Uri\Rfc3986\Uri {
            return $this->uri;
        }
    }
Userland can easily build convenience wrappers around the classes; they
just need to export them to the native classes, which will then
guarantee that the result is fully validated and actually a valid
URI/URL. Keep in mind that the ext/uri extension will always be
available, thus users can rely on the native implementation.
By making the classes non-final, there will be one base type to build upon
for userland.
(the alternative would be to define a native UrlInterface, but that'd
increase complexity for little to no gain IMHO - although that'd solve my
main concern).
Mate already explained why a native UriInterface was intentionally
removed from the RFC in https://news-web.php.net/php.internals/126425.
The RFC is also missing whether __debugInfo returns raw or non-raw
components. Then, I'm wondering if we need this per-component break for
debugging at all? It might be less confusing (on this encoding aspect) to
dump basically what __serialize() returns (under another key than __uri of
course).
That would also work for me.
It can make sense to normalize a hostname, but not the path. My usual
example against normalizing the path is that SAML signs the *encoded*
URI instead of the payload and changing the case in percent-encoded
characters is sufficient to break the signature.
I would be careful with this argument: signature validation should be done
on raw bytes. Requiring an object to preserve byte-level accuracy while the
very purpose of OOP is to provide abstractions might be conflicting. The
signing topic can be solved by keeping the raw signed payload around.
Yes, the SAML signature behavior is wrong, but I did not write the SAML
specification. I just pointed it out as a possible use case where
choosing the raw or normalized form depends on the component and where a
“get all components” function would be dangerous.
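
To illustrate the signature problem, here is a minimal sketch that does not
use ext/uri at all. It assumes a hypothetical helper
normalizePercentEncodingCase() that uppercases the hex digits of
percent-encoded octets (the case normalization described in RFC 3986,
section 6.2.2.1), and it uses an HMAC with a made-up key as a simplified
stand-in for the real SAML signature scheme. The URI and key are invented
for the example.

```php
<?php
// Hypothetical helper: uppercase the hex digits of every percent-encoded
// octet, as RFC 3986 case normalization would. Not part of ext/uri.
function normalizePercentEncodingCase(string $uri): string
{
    return preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        static fn (array $m): string => strtoupper($m[0]),
        $uri
    );
}

// Made-up key and URI, for illustration only.
$key = 'example-shared-secret';
$signed = 'https://idp.example/sso?RelayState=a%2fb';

// The normalized form is equivalent as a URI, but its bytes differ.
$normalized = normalizePercentEncodingCase($signed);

// HMAC stands in for the actual signature algorithm (an assumption).
$signatureOverRaw = hash_hmac('sha256', $signed, $key);
$signatureOverNormalized = hash_hmac('sha256', $normalized, $key);

var_dump($normalized === 'https://idp.example/sso?RelayState=a%2Fb'); // bool(true)
var_dump(hash_equals($signatureOverRaw, $signatureOverNormalized));   // bool(false)
```

Changing only the case of "%2f" is enough to invalidate a signature
computed over the encoded URI, which is why a caller in that situation would
need the raw form of that component rather than the normalized one.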
Best regards
Tim Düsterhus