Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API

Rob Landers Sun, 07 Jul 2024 04:01:15 -0700


On Sun, Jul 7, 2024, at 12:40, Rob Landers wrote:
> On Sun, Jul 7, 2024, at 11:13, Máté Kocsis wrote:
>> Hi Ignace,
>> 
>>> As far as I understand it, if this RFC were to pass as is it will model
>>> PHP URLs to the WHATWG specification. While this specification is
>>> getting a lot of traction lately I believe it will restrict URL usage in
>>> PHP instead of making developer life easier. While PHP started as a
>>> "web" language it is first and foremost a server side general purpose
>>> language. The WHATWG spec on the other hand is created by browser
>>> vendors and is geared toward browsers (client side) and because of
>>> browsers' history it restricts by design a lot of what PHP developers can
>>> currently do using `parse_url`. In my view the `Url` class in
>>> PHP should allow dealing with any IANA registered scheme, which is not
>>> the case for the WHATWG specification.
>> 
>> Supporting IANA registered schemes is a valid request, and is definitely 
>> useful.
>> However, I think this feature is not strictly required to have in the 
>> current RFC.
>> Anyone who needs to support features that are not offered by the WHATWG
>> standard can still rely on parse_url(). And of course, we can (and should) 
>> add
>> support for other standards later. If we wanted to do all these in the same
>> RFC, then the scope of the RFC would become way too large IMO. That's why I
>> opt for incremental improvements.
> 
> It's also worth pointing out (as another reason not to do this) that
> IANA-registered schemes may or may not be valid on the current network. For
> example, TOR, Handshake, IPFS, Freenet, etc. all have their own DNS schemes
> and do not (usually) use IANA registered schemes, and many people create
> sites that cater to those networks.
> 
>> 
>> Besides, I fail to see why a WHATWG compliant parser wouldn't be useful in 
>> PHP:
>> yes, PHP is server side, but it still interacts with browsers very heavily. 
>> Among other
>> use-cases I cannot yet imagine, the major one is most likely validating
>> user-supplied URLs
>> for opening in the browser. As far as I see the situation, currently there 
>> is no acceptably
>> reliable possibility to decide whether a URL can be opened in browsers or 
>> not.
> 
> Looking at the spec for WHATWG, it looks like `example%2Ecom` will be parsed 
> as a valid URL, and transformed to `example.com`, while this doesn't 
> currently happen in parse_url():
> 
> https://3v4l.org/NtqQm
> 
> I don't know if that is an issue, but it might be if you are expecting the
> string to remain URL encoded.
> 
>> 
>>> - parse_url and parse_str predate RFC 3986
>>> - URLSearchParams was ratified before PSR-7 BUT the first implementation
>>> landed a year AFTER PSR-7 was released and already implemented.
>> 
>> Thank you for the historical context!
>> 
>> Based on your and others' feedback, it has now become clear to me that
>> parse_url()
>> is still useful and ext/url needs quite some additional capabilities until 
>> this function
>> really becomes superfluous. That's why it now seems to me that the behavior 
>> of
>> parse_url() could be leveraged in ext/url so that it would work with a
>> Url\Url class (e.g.
>> we could have a PhpUrlParser class extending Url\UrlParser, or a
>> Url\Url::fromPhpParser()
>> method, depending on which object model we choose. Of course the names are
>> TBD).
>> 
>>> For all these arguments I would keep the proposed `Url` free of all
>>> these concerns and lean toward a nullable string for the query string
>>> representation. And defer this debate to its own RFC regarding query
>>> string parsing handling in PHP.
>> 
>> My WIP implementation still uses nullable properties and return types. I 
>> only changed those
>> when I wrote the RFC. Since I see that PSR-7 compatibility is very low prio
>> for everyone
>> involved in the discussion, I think making these types nullable is
>> fine. It was not my
>> top prio either, but I had to start the object design somewhere, so I went with
>> this.
> 
> The spec contains elements and their types. It would be good to adhere to the 
> spec (simplifies documentation):
> 
>  1. scheme may be null or empty string
>  2. port may be null
>  3. path is never null, but may be empty string
>  4. query may be null
>  5. fragment may be null
>  6. user/password may be null (to differentiate between an empty password or 
> no password)
>  7. host may be null (for relative URLs)
> 
>> 
>> Again, thank you for your constructive criticism.
>> 
>> Regards,
>> Máté
> 
> — Rob


Here's a list of examples worth adding to the RFC:

//example.com?
ftp://[email protected]/path/to/ffile
https://user:@example.com
https://user:pass@example%2Ecom/?something=other&bool#heading

etc.
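
For comparison, here is a rough sketch of how three of these could be run through the existing parse_url(), with absent components shown as null so they can be told apart from empty strings. The describe_url() helper and the $components list are names made up for this example only; they are not part of the RFC or of any proposed API.

```php
<?php
// Component names that parse_url() may return as array keys.
$components = ['scheme', 'user', 'pass', 'host', 'port', 'path', 'query', 'fragment'];

/**
 * Hypothetical helper (not part of the RFC): runs parse_url() and maps
 * every known component to its value, or to null when the key is absent.
 * Returns null when parse_url() rejects the input outright.
 */
function describe_url(string $url): ?array
{
    global $components;

    $parts = parse_url($url);
    if ($parts === false) {
        return null;
    }

    $out = [];
    foreach ($components as $name) {
        // array_key_exists() keeps "" (present but empty) distinct from null (absent).
        $out[$name] = array_key_exists($name, $parts) ? $parts[$name] : null;
    }
    return $out;
}

// Three of the sample inputs from the list above.
$urls = [
    '//example.com?',
    'https://user:@example.com',
    'https://user:pass@example%2Ecom/?something=other&bool#heading',
];

foreach ($urls as $url) {
    echo $url, PHP_EOL;
    $described = describe_url($url);
    if ($described === null) {
        echo '  (parse_url() returned false)', PHP_EOL;
        continue;
    }
    foreach ($described as $name => $value) {
        echo '  ', $name, ': ', var_export($value, true), PHP_EOL;
    }
}
```

Listing the same inputs next to the output of the new parser would make the null-versus-empty differences easy to see in the RFC.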

— Rob
