Hi

On 2025-02-24 12:08, Nicolas Grekas wrote:
The situation I'm talking about is when one will accept an argument
described as
function (\Uri\WhatWg\Url $url)

If the Url class is final, this signature means only one possible
implementation can ever be passed: the native one. Composition cannot be
achieved because there's no type to compose.

Yes, that's the point: The behavior and the type are intimately tied together. The Uri/Url classes represent values, not services. You wouldn't extend an int either. For DateTimeImmutable, the fact that inheritance is legal causes a ton of needless bugs (especially around serialization behavior).

Fine-tuning the behavior provided by the RFC is what we might be most
interested in, but we should not forget that we also ship a type. By making

For a given specification (RFC 3986 / WHATWG) there is exactly one correct interpretation of a given URL. “Fine-tuning” means that you are no longer following the specification.

the type non-final, we keep things open enough for userland to build on it.

This works:

    final class HttpUrl {
        private readonly \Uri\Rfc3986\Uri $uri;
        public function __construct(string $uri) {
            $this->uri = new \Uri\Rfc3986\Uri($uri);
            if ($this->uri->getScheme() !== 'http') {
                throw new ValueError('Scheme must be http');
            }
        }
        public function toRfc3986(): \Uri\Rfc3986\Uri {
            return $this->uri;
        }
    }

Userland can easily build convenience wrappers around the classes; the wrappers just need to export to the native classes, which will then guarantee that the result is fully validated and actually a valid URI/URL. Keep in mind that ext/uri will always be available, so users can rely on the native implementation.

By making the classes non-final, there will be one base type to build upon
for userland.
(the alternative would be to define native UrlInterface, but that'd
increase complexity for little to no gain IMHO - although that'd solve my
main concern).

Mate already explained why a native UriInterface was intentionally removed from the RFC in https://news-web.php.net/php.internals/126425.

The RFC is also missing whether __debugInfo returns raw or non-raw
components. Then, I'm wondering if we need this per-component break for
debugging at all? It might be less confusing (on this encoding aspect) to dump basically what __serialize() returns (under another key than __uri of
course).

That would also work for me.
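
A minimal sketch of that idea, using a hypothetical userland stand-in (`ExampleUri` is not the native class). It assumes that `__serialize()` returns the whole URI as a single string under the `__uri` key; the debug key `uri` is likewise only an illustration:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration; not the class from the RFC.
final class ExampleUri
{
    public function __construct(private readonly string $raw) {}

    // Assumption: the serialized form is the full URI string under "__uri".
    public function __serialize(): array
    {
        return ['__uri' => $this->raw];
    }

    public function __unserialize(array $data): void
    {
        $this->raw = $data['__uri'];
    }

    // Debug output shows the same string under another key, so there is
    // no per-component raw vs. non-raw question to answer.
    public function __debugInfo(): array
    {
        return ['uri' => $this->__serialize()['__uri']];
    }
}

$uri = new ExampleUri('https://example.com/a%2fb?x=1');
var_dump($uri);
```

With this shape, var_dump() shows one string exactly as it was stored, and nobody has to guess whether a given component was decoded or normalized.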

It can make sense to normalize a hostname, but not the path. My usual
example against normalizing the path is that SAML signs the *encoded*
URI instead of the payload and changing the case in percent-encoded
characters is sufficient to break the signature


I would be careful with this argument: signature validation should be done on raw bytes. Requiring an object to preserve byte-level accuracy while the very purpose of OOP is to provide abstractions might be conflicting. The
signing topic can be solved by keeping the raw signed payload around.

Yes, the SAML signature behavior is wrong, but I did not write the SAML specification. I just pointed out that it is a possible use case where choosing the raw or normalized form depends on the component and where a “get all components” function would be dangerous.
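
To make the failure mode concrete, here is a small sketch in plain PHP. It does not use ext/uri. The HMAC is only a stand-in for the real SAML signature algorithm, and the key and URI are made-up example values. The normalization step uppercases the hex digits of percent-encoded octets, which RFC 3986 considers equivalent:

```php
<?php
declare(strict_types=1);

// Made-up values for illustration only.
$key = 'example-shared-secret';
$signedUri = 'https://example.com/acs?RelayState=a%2fb';

// The sender signs the encoded URI exactly as transmitted.
// (HMAC-SHA256 is a stand-in for the actual SAML signature scheme.)
$signature = hash_hmac('sha256', $signedUri, $key);

// Case normalization of percent-encoded octets: "%2f" becomes "%2F".
$normalizedUri = preg_replace_callback(
    '/%[0-9a-f]{2}/i',
    static fn (array $m): string => strtoupper($m[0]),
    $signedUri,
);

// Verifying against the raw string succeeds ...
$rawOk = hash_equals($signature, hash_hmac('sha256', $signedUri, $key));
// ... but verifying against the normalized string fails.
$normalizedOk = hash_equals($signature, hash_hmac('sha256', $normalizedUri, $key));

var_dump($rawOk, $normalizedOk);
```

The two URIs are equivalent under RFC 3986, yet only the raw one verifies. That is why the choice between raw and normalized has to be made per component by the caller.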

Best regards
Tim Düsterhus
