On Sun, Jul 7, 2024, at 11:13, Máté Kocsis wrote:
> Hi Ignace,
>
>> As far as I understand it, if this RFC were to pass as is it will model
>> PHP URLs to the WHATWG specification. While this specification is
>> getting a lot of traction lately I believe it will restrict URL usage in
>> PHP instead of making developer life easier. While PHP started as a
>> "web" language it is first and foremost a server side general purpose
>> language. The WHATWG spec on the other hand is created by browser
>> vendors and is geared toward browsers (client side) and because of
>> browsers' history it restricts by design a lot of what PHP developers can
>> currently do using `parse_url`. In my view the `Url` class in
>> PHP should allow dealing with any IANA registered scheme, which is not
>> the case for the WHATWG specification.
>
> Supporting IANA registered schemes is a valid request, and is definitely
> useful. However, I think this feature is not strictly required to have in
> the current RFC. Anyone who needs to support features that are not offered
> by the WHATWG standard can still rely on parse_url(). And of course, we can
> (and should) add support for other standards later. If we wanted to do all
> these in the same RFC, then the scope of the RFC would become way too large
> IMO. That's why I opt for incremental improvements.
It's also worth pointing out (as another reason not to do this) that IANA-registered schemes may or may not be valid on the network in use. For example, Tor, Handshake, IPFS, Freenet, etc. all have their own DNS schemes and do not (usually) use IANA-registered schemes, and many people create sites that cater to those networks.

>
> Besides, I fail to see why a WHATWG compliant parser wouldn't be useful in
> PHP: yes, PHP is server side, but it still interacts with browsers very
> heavily. Among other use-cases I cannot yet imagine, the major one is most
> likely validating user-supplied URLs for opening in the browser. As far as I
> see the situation, currently there is no acceptably reliable possibility to
> decide whether a URL can be opened in browsers or not.

Looking at the WHATWG spec, it looks like `example%2Ecom` will be parsed as a valid URL and transformed to `example.com`, while this doesn't currently happen in parse_url():

https://3v4l.org/NtqQm

I don't know whether that is an issue, but it might be if you expect the string to remain URL-encoded.

>
>> - parse_url and parse_str predate RFC3986
>> - URLSearchParams was ratified before PSR-7 BUT the first implementation
>> landed a year AFTER PSR-7 was released and already implemented.
>
> Thank you for the historical context!
>
> Based on your and others' feedback, it has now become clear to me that
> parse_url() is still useful and ext/url needs quite some additional
> capabilities until this function really becomes superfluous. That's why it
> now seems to me that the behavior of parse_url() could be leveraged in
> ext/url so that it would work with a Url/Url class (e.g. we could have a
> PhpUrlParser class extending the Url/UrlParser, or a Url\Url::fromPhpParser()
> method, depending on which object model we choose. Of course the names are
> TBD).
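For comparison, the WHATWG parser that browsers use is exposed as the global `URL` in Node.js and browsers, so both the validation use-case and the `example%2Ecom` behaviour can be checked against it directly. A minimal TypeScript sketch, assuming a Node.js 18+ or browser runtime; `parseForBrowser` and the `ALLOWED_PROTOCOLS` allowlist are hypothetical, illustrative names rather than anything proposed in the RFC:

```typescript
// Assumption: only http(s) URLs count as "openable in the browser" here.
const ALLOWED_PROTOCOLS: Set<string> = new Set(["http:", "https:"]);

// Returns the parsed URL if the WHATWG parser accepts the input and the
// scheme is on the allowlist, otherwise null.
function parseForBrowser(input: string): URL | null {
  try {
    const url = new URL(input); // throws a TypeError on input the parser rejects
    return ALLOWED_PROTOCOLS.has(url.protocol) ? url : null;
  } catch {
    return null;
  }
}

// For special schemes (http, https, ...), percent-encoded octets in the
// host are decoded during host parsing; percent-encoding in the path is kept.
const u = parseForBrowser("http://example%2Ecom/a%2Eb");
console.log(u?.hostname); // "example.com"
console.log(u?.pathname); // "/a%2Eb"
```

Note that the returned host is already normalized, so code that expects to see the original, still-encoded host string would have to keep the raw input alongside the parsed object.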
>
>> For all these arguments I would keep the proposed `Url` free of all
>> these concerns and lean toward a nullable string for the query string
>> representation. And defer this debate to its own RFC regarding query
>> string parsing handling in PHP.
>
> My WIP implementation still uses nullable properties and return types. I
> only changed those when I wrote the RFC. Since I see that PSR-7
> compatibility is very low prio for everyone involved in the discussion, I
> think making these types nullable is fine. It wasn't my top prio either,
> but I had to start the object design somewhere, so I went with this.

The spec contains elements and their types. It would be good to adhere to the spec (it simplifies documentation):

1. scheme may be null or an empty string
2. port may be null
3. path is never null, but may be an empty string
4. query may be null
5. fragment may be null
6. user/password may be null (to differentiate between an empty password and no password)
7. host may be null (for relative URLs)

>
> Again, thank you for your constructive criticism.
>
> Regards,
> Máté

— Rob
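The nullability rules in the numbered list above can be sketched as a TypeScript type. `UrlComponents` and `fromWhatwg` are hypothetical names for illustration only, not part of any RFC. Converting from the JavaScript `URL` API also shows why the distinction matters: its getters (`search`, `hash`, `username`, ...) return an empty string both when a component is absent and when it is present but empty, so a converter built on them cannot tell `https://example.com/?` from `https://example.com/`.

```typescript
// Hypothetical component shape following the nullability rules listed above.
interface UrlComponents {
  scheme: string | null;   // may be null or ""
  user: string | null;     // null = absent, "" = present but empty
  password: string | null;
  host: string | null;     // null for relative URLs
  port: number | null;
  path: string;            // never null, may be ""
  query: string | null;    // null = no "?", "" = "?" with nothing after it
  fragment: string | null; // null = no "#", "" = "#" with nothing after it
}

// Sketch of a converter from the WHATWG `URL` object. Because the getters
// collapse "absent" and "empty" into "", this can only map "" to null and
// therefore loses the distinction the list above asks for.
function fromWhatwg(url: URL): UrlComponents {
  const orNull = (s: string): string | null => (s === "" ? null : s);
  return {
    scheme: url.protocol.slice(0, -1), // "https:" -> "https"
    user: orNull(url.username),
    password: orNull(url.password),
    host: orNull(url.hostname),
    // Default ports (e.g. 443 for https) are reported as "", so they become null.
    port: url.port === "" ? null : Number(url.port),
    path: url.pathname,
    query: url.search === "" ? null : url.search.slice(1), // drop leading "?"
    fragment: url.hash === "" ? null : url.hash.slice(1),  // drop leading "#"
  };
}
```

With this sketch, `https://example.com/?` and `https://example.com/` both come out with `query: null`, which is exactly the information a nullable `?string` property in PHP would preserve.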