On 28/06/2024 22:06, Máté Kocsis wrote:
> Hi Everyone,
> I've been working on a new RFC for a while now, and time has come to
> present it to a wider audience.
> Last year, I learnt that PHP doesn't have built-in support for parsing
> URLs according to any well established standards (RFC 1738 or the WHATWG
> URL living standard), since the parse_url() function is optimized for
> performance instead of correctness.
> In order to improve compatibility with external tools consuming URLs
> (like browsers), my new RFC would add a WHATWG compliant URL parser
> functionality to the standard library. The API itself is not final by
> any means, the RFC only represents how I imagined it first.
> You can find the RFC at the following link:
> https://wiki.php.net/rfc/url_parsing_api
> Regards,
> Máté
As the maintainer of a PHP userland URI toolkit, I have a couple of
questions/remarks on the proposal. First, I look forward to finally
having a real URL parser AND validator in PHP core. Any effort in that
direction is always welcome news.
As far as I understand it, if this RFC were to pass as is, it would model
PHP URLs on the WHATWG specification. While this specification has been
getting a lot of traction lately, I believe it would restrict URL usage in
PHP instead of making developers' lives easier. While PHP started as a
"web" language, it is first and foremost a server-side, general-purpose
language. The WHATWG spec, on the other hand, is created by browser
vendors and is geared toward browsers (client side), and because of
browser history it restricts by design a lot of what PHP developers can
currently do with `parse_url`. In my view, the `Url` class in
PHP should allow dealing with any IANA-registered scheme, which is not
the case for the WHATWG specification.
Therefore, I would rather suggest that we ALSO properly support the
RFC 3986 and RFC 3987 specifications, give both specs a go (at the same
time!), and provide a clear way to instantiate your `Url` with one spec
or the other.
In short, my ideal situation would be to add at least two named
constructors to the parser, `UrlParser::fromRFC3986` and
`UrlParser::fromWHATWG`, or something similar (the names can be changed
or improved).
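For illustration only: RFC 3986 (Appendix B) gives a regular expression that splits any URI reference into its five generic components without knowing anything about the scheme, which is the property an RFC 3986 mode such as the `UrlParser::fromRFC3986` constructor suggested above would build on. The sketch below assumes PHP 7.4+ (for `PREG_UNMATCHED_AS_NULL`); the function name `rfc3986_split` is hypothetical, and the function only splits: it neither validates nor normalizes components and does not cover RFC 3987 IRIs.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the RFC): splits a URI reference into
 * scheme, authority, path, query and fragment using the regular expression
 * from RFC 3986, Appendix B. Works for any scheme; no validation is done.
 *
 * @return array{scheme: ?string, authority: ?string, path: string, query: ?string, fragment: ?string}
 */
function rfc3986_split(string $uri): array
{
    // Group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

    // With PREG_UNMATCHED_AS_NULL, a component that is absent is null,
    // while a component that is present but empty is "".
    preg_match($pattern, $uri, $m, PREG_UNMATCHED_AS_NULL);

    return [
        'scheme' => $m[2],
        'authority' => $m[4],
        'path' => $m[5], // the path group always participates (may be "")
        'query' => $m[7],
        'fragment' => $m[9],
    ];
}
```

With this approach, `urn:isbn:0451450523` yields the scheme `urn` and the path `isbn:0451450523`, and `http://example.com/?` yields an empty-string query rather than `null`.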
Although it is an old article, Daniel Stenberg's post
(https://daniel.haxx.se/blog/2017/01/30/one-url-standard-please/)
conveys, with a more in-depth analysis, my issues with the WHATWG spec
and with its use in PHP if it were to become the ONLY URL parser
available in PHP core.
The PSR-7 relation is also unfortunate from my POV: PSR-7's `UriInterface`
is, at its core, designed as an HTTP URI representation (so it shares
the same kind of issue as the WHATWG spec!), meaning that in the absence
of a scheme it falls back to HTTP scheme validation. This is why the
interface can forgo nullable components (an absent component is returned
as an empty string): the HTTP spec allows this, but other schemes do not.
For instance, the FTP scheme prohibits the query and fragment components,
which means they MUST be `null` in that case.
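A minimal sketch of why nullable components matter, assuming PHP 8.0+, where `parse_url()` returns `null` for an absent query or fragment and `""` for an empty one. Both function names are hypothetical, and `is_acceptable_ftp_url()` encodes only the FTP rule stated above, not a complete FTP URL validator. `psr7_style_query()` mimics a non-nullable getter such as PSR-7's `UriInterface::getQuery()`, which returns an empty string when no query is present.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check: an ftp URL passes only if it has neither a query nor
 * a fragment. "ftp://host/file?" is rejected even though its query is
 * empty, because the component is present. Assumes PHP 8.0+ parse_url().
 */
function is_acceptable_ftp_url(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || strtolower($scheme) !== 'ftp') {
        return false;
    }

    // null means "component absent"; "" means "present but empty".
    return parse_url($url, PHP_URL_QUERY) === null
        && parse_url($url, PHP_URL_FRAGMENT) === null;
}

/**
 * Hypothetical PSR-7-style getter: returns "" when the query is absent,
 * so absent and empty queries become indistinguishable.
 */
function psr7_style_query(string $url): string
{
    $query = parse_url($url, PHP_URL_QUERY);

    return is_string($query) ? $query : '';
}
```

Through the non-nullable getter, `ftp://ftp.example.com/pub/file.txt?` and `ftp://ftp.example.com/pub/file.txt` expose the same `""` query, so the rule cannot be enforced from that API alone.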
By removing the PSR-7 constraints, we could add:
- a `Url::(get|to)Components` method: it would mimic the value returned
by `parse_url` and, as such, ease migration from `parse_url`;
- `Url::getUsername` and `Url::getPassword` methods to access the
username and password components individually. You would still use
the `withUserInfo` method to update them, but developers would be able
to read both components directly from the `Url` object.
These additions would remove the need for:
- `UrlParser::parseUrlToArray`
- `UrlParser::parseUrlComponent`
- `UrlComponent` Enum
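To show how the proposed getters and a `toComponents()` method could fit together, here is a sketch of a hypothetical value object; `SketchUrl`, its constructor, and its `withUserInfo()` signature are illustrative assumptions, not the RFC's API. It requires PHP 8.0+ (constructor promotion, named arguments). Every component is nullable, the username and password are split from the userinfo at the first `:`, and `toComponents()` returns the same keys as `parse_url()`, omitting absent components as `parse_url()` does.

```php
<?php
declare(strict_types=1);

/** Hypothetical value object: every component can be absent (null). */
final class SketchUrl
{
    public function __construct(
        private ?string $scheme = null,
        private ?string $userInfo = null,
        private ?string $host = null,
        private ?int $port = null,
        private string $path = '',
        private ?string $query = null,
        private ?string $fragment = null,
    ) {}

    /** Username: the userinfo part before the first ":". */
    public function getUsername(): ?string
    {
        if ($this->userInfo === null) {
            return null;
        }

        return explode(':', $this->userInfo, 2)[0];
    }

    /** Password: the userinfo part after the first ":", or null if there is none. */
    public function getPassword(): ?string
    {
        if ($this->userInfo === null) {
            return null;
        }

        return explode(':', $this->userInfo, 2)[1] ?? null;
    }

    /** Updates both parts at once, returning a modified copy. */
    public function withUserInfo(?string $user, ?string $password = null): self
    {
        $clone = clone $this;
        $clone->userInfo = match (true) {
            $user === null => null,
            $password === null => $user,
            default => $user . ':' . $password,
        };

        return $clone;
    }

    /**
     * Same keys as parse_url(); absent components (and an empty path)
     * are omitted, mirroring parse_url()'s return value.
     *
     * @return array<string, int|string>
     */
    public function toComponents(): array
    {
        $components = [
            'scheme' => $this->scheme,
            'host' => $this->host,
            'port' => $this->port,
            'user' => $this->getUsername(),
            'pass' => $this->getPassword(),
            'path' => $this->path === '' ? null : $this->path,
            'query' => $this->query,
            'fragment' => $this->fragment,
        ];

        return array_filter($components, static fn ($value) => $value !== null);
    }
}
```

Because `toComponents()` keeps `parse_url()`'s array shape, code migrating from `parse_url()` could keep reading keys such as `$components['user']` unchanged, while the dedicated getters serve new code.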
Cheers,
Ignace