Hi all,

A couple of weeks back I took a look at bug #72811[1].  The bug is that
parse_url() doesn't accept IPv6 addresses without a scheme, as it does for
IPv4 addresses.  I attempted to patch the specific bug within the scope of
how parse_url() currently processes URIs.  After I opened a PR for the
fix, Yasuo and Christoph both chimed in that perhaps replacing the
implementation with an re2c-based parser would be better.  We found a
parser[2] that did almost everything necessary.  I took it and made it
adhere more strictly to RFC3986[3].
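
For anyone who hasn't read the bug report, the mismatch can be sketched
roughly as follows.  The two input strings are illustrative assumptions,
not copied from the report, and what gets printed depends on the PHP
version you run it on:

```php
<?php
// Illustrative inputs only (assumed, not taken from the bug report):
// the same host:port shape without a scheme, once with an IPv4 host
// and once with a bracketed IPv6 host.
$inputs = ['127.0.0.1:80', '[::1]:80'];

foreach ($inputs as $input) {
    echo $input, "\n";
    // parse_url() returns an array of components, or false on failure.
    var_dump(parse_url($input));
}
```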

I have updated my original PR[4] and created an RFC[5] that aims to replace
the parser behind parse_url() with one that adheres more strictly to
RFC3986.  This will introduce a BC break, as explained in the RFC, that at
the very least warrants some discussion.  On the PR we had kicked around
the idea of deprecating parse_url() and creating a new function with the
more-compliant parser, but opted against it.

I'm looking for discussion on whether a total replacement is the preferred
way to go about this, and whether we should be making parse_url() stricter
about the standard at all, since today it deviates from RFC3986 in many
ways that nonetheless provide semi-reasonable parsing patterns.

--
Dave

[1] - https://bugs.php.net/bug.php?id=72811
[2] - https://github.com/staskobzar/url_parser_re2c
[3] - https://tools.ietf.org/html/rfc3986
[4] - https://github.com/php/php-src/pull/2079
[5] - https://wiki.php.net/rfc/replace_parse_url
