On Sun, Jul 7, 2024, at 12:40, Rob Landers wrote:
> On Sun, Jul 7, 2024, at 11:13, Máté Kocsis wrote:
>> Hi Ignace,
>>
>>> As far as I understand it, if this RFC were to pass as is it will model
>>> PHP URLs to the WHATWG specification. While this specification is
>>> getting a lot of traction lately I believe it will restrict URL usage in
>>> PHP instead of making developer life easier. While PHP started as a
>>> "web" language it is first and foremost a server side general purpose
>>> language. The WHATWG spec on the other hand is created by browser
>>> vendors and is geared toward browsers (client side) and because of
>>> browser history it restricts by design a lot of what PHP developers can
>>> currently do using `parse_url`. In my view the `Url` class in
>>> PHP should allow dealing with any IANA registered scheme, which is not
>>> the case for the WHATWG specification.
>>
>> Supporting IANA registered schemes is a valid request, and is definitely
>> useful. However, I think this feature is not strictly required to have in
>> the current RFC. Anyone who needs to support features that are not offered
>> by the WHATWG standard can still rely on parse_url(). And of course, we can
>> (and should) add support for other standards later. If we wanted to do all
>> these in the same RFC, then the scope of the RFC would become way too large
>> IMO. That's why I opt for incremental improvements.
>
> It's also worth pointing out (as another reason not to do this) that IANA
> may or may not be valid in the current network. For example, TOR, Handshake,
> IPFS, Freenet, etc. all have their own DNS schemes and do not (usually) use
> IANA registered schemes, and many people create sites that cater to those
> networks.
>
>>
>> Besides, I fail to see why a WHATWG compliant parser wouldn't be useful in
>> PHP: yes, PHP is server side, but it still interacts with browsers very
>> heavily.
>> Among other use-cases I cannot yet imagine, the major one is most likely
>> validating user-supplied URLs for opening in the browser. As far as I see
>> the situation, currently there is no acceptably reliable possibility to
>> decide whether a URL can be opened in browsers or not.
>
> Looking at the spec for WHATWG, it looks like `example%2Ecom` will be parsed
> as a valid URL, and transformed to `example.com`, while this doesn't
> currently happen in parse_url():
>
> https://3v4l.org/NtqQm
>
> I don't know if that may be an issue, but it might be if you are expecting
> the string to remain URL encoded.
>
>>
>>> - parse_url and parse_str predate RFC3986
>>> - URLSearchParams was ratified before PSR-7 BUT the first implementation
>>> landed a year AFTER PSR-7 was released and already implemented.
>>
>> Thank you for the historical context!
>>
>> Based on your and others' feedback, it has now become clear to me that
>> parse_url() is still useful and ext/url needs quite some additional
>> capabilities until this function really becomes superfluous. That's why it
>> now seems to me that the behavior of parse_url() could be leveraged in
>> ext/url so that it would work with a Url\Url class (e.g. we could have a
>> PhpUrlParser class extending the Url\UrlParser, or a
>> Url\Url::fromPhpParser() method, depending on which object model we choose.
>> Of course the names are TBD).
>>
>>> For all these arguments I would keep the proposed `Url` free of all
>>> these concerns and lean toward a nullable string for the query string
>>> representation. And defer this debate to its own RFC regarding query
>>> string parsing handling in PHP.
>>
>> My WIP implementation still uses nullable properties and return types. I
>> only changed those when I wrote the RFC. Since I see that PSR-7
>> compatibility is very low prio for everyone involved in the discussion, I
>> think making these types nullable is fine.
>> It was not my top prio either, but I had to start the object design
>> somewhere, so I went with this.
>
> The spec contains elements and their types. It would be good to adhere to the
> spec (simplifies documentation):
>
> 1. scheme may be null or empty string
> 2. port may be null
> 3. path is never null, but may be empty string
> 4. query may be null
> 5. fragment may be null
> 6. user/password may be null (to differentiate between an empty password and
>    no password)
> 7. host may be null (for relative URLs)
>
>>
>> Again, thank you for your constructive criticism.
>>
>> Regards,
>> Máté
>
> — Rob
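
To make the nullability list above concrete, here is a minimal sketch of a value object that follows it. The class name `UrlComponents`, its property names, and the `hasPassword()` helper are hypothetical illustrations and not the RFC's API; the sketch assumes PHP 8.1+ for readonly promoted properties.

```php
<?php
declare(strict_types=1);

// Hypothetical value object, for illustration only (not the RFC's API).
// The types mirror the list above: everything is nullable except path.
final class UrlComponents
{
    public function __construct(
        public readonly ?string $scheme = null,   // may be null or ''
        public readonly ?string $user = null,     // null means "no user"
        public readonly ?string $password = null, // null means "no password", '' means "empty password"
        public readonly ?string $host = null,     // null for relative URLs
        public readonly ?int $port = null,
        public readonly string $path = '',        // never null, may be ''
        public readonly ?string $query = null,
        public readonly ?string $fragment = null,
    ) {
    }

    // True when a password component is present, even if it is empty.
    public function hasPassword(): bool
    {
        return $this->password !== null;
    }
}

// "https://user@example.com" versus "https://user:@example.com"
$noPassword = new UrlComponents(scheme: 'https', user: 'user', host: 'example.com');
$emptyPassword = new UrlComponents(scheme: 'https', user: 'user', password: '', host: 'example.com');

var_dump($noPassword->hasPassword());    // no password component at all
var_dump($emptyPassword->hasPassword()); // password component present but empty
```

Keeping `null` and `''` distinct is what lets the two URLs in the usage lines be told apart after parsing.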
Here's a list of examples worth adding to the RFC:

//example.com?
ftp://u...@example.com/path/to/ffile
https://user:@example.com
https://user:pass@example%2Ecom/?something=other&bool#heading

etc.

— Rob
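
For comparison, a small sketch of how the existing parse_url() reports the components of these inputs. The helper name `describeUrl()` is hypothetical, and the ftp example is left out because its userinfo is elided in the archived message. Running it shows, among other things, that the percent-encoded host is returned undecoded.

```php
<?php
declare(strict_types=1);

// Inputs copied from the list above (ftp example omitted, see the note above).
$examples = [
    '//example.com?',
    'https://user:@example.com',
    'https://user:pass@example%2Ecom/?something=other&bool#heading',
];

/**
 * Hypothetical helper: returns every component name mapped to the value
 * parse_url() reported, or null when the component was not reported.
 */
function describeUrl(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        // parse_url() returns false for seriously malformed URLs.
        return ['error' => 'could not parse'];
    }

    $keys = ['scheme', 'user', 'pass', 'host', 'port', 'path', 'query', 'fragment'];
    $out = [];
    foreach ($keys as $key) {
        $out[$key] = $parts[$key] ?? null;
    }

    return $out;
}

foreach ($examples as $url) {
    echo $url, "\n";
    var_export(describeUrl($url));
    echo "\n\n";
}
```

A WHATWG-based parser could be run over the same list to document where the two parsers differ.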