Hi Tim & all,

> On Mar 21, 2025, at 06:22, Tim Düsterhus <t...@bastelstu.be> wrote:
>
> On 2025-03-18 18:48, Paul M. Jones wrote:
>> $iriPath = '/heads/' . rawurlencode($val) . '/tails/';
>> assert($iriPath === '/heads/fü bar/tails/'); // false
>
> From my reading of RFC 3987 that result is incorrect. The space is neither
> listed as `iunreserved`, nor as `sub-delims`, thus isn't a valid `ipchar`.
> Thus the space needs to be encoded as %20 for IRIs as well. The same mistake
> applies to the reference userland implementation below.
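For illustration only, a less naive encoder along the lines Tim describes might look like the sketch below. It assumes UTF-8 input and that the value is a single path segment; `iri_encode_segment()` and `IRI_UCSCHAR` are hypothetical names, not existing PHP functions or constants. ASCII is encoded exactly as `rawurlencode()` does it, while non-ASCII code points are left literal only when they fall in the `ucschar` ranges of RFC 3987 (section 2.2).

```php
<?php

// Hypothetical helper, not part of PHP: encode a raw value for use as a
// single IRI path segment per RFC 3987.

// ucschar ranges from RFC 3987, section 2.2 (PCRE \x{...} with /u).
const IRI_UCSCHAR = '/^[\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}'
    . '\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}'
    . '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}'
    . '\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}'
    . '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}'
    . '\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}]$/u';

function iri_encode_segment(string $val): string
{
    // Split the input into code points; with the /u modifier preg_split()
    // returns false if $val is not valid UTF-8.
    $chars = preg_split('//u', $val, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }

    $out = '';
    foreach ($chars as $ch) {
        if (preg_match(IRI_UCSCHAR, $ch) === 1) {
            // Non-ASCII characters allowed by ucschar may appear literally.
            $out .= $ch;
        } else {
            // Everything else goes through rawurlencode(): ASCII unreserved
            // characters stay as-is; anything else (space, "/", C1 controls,
            // private-use code points, ...) becomes %XX per UTF-8 byte.
            $out .= rawurlencode($ch);
        }
    }
    return $out;
}

$val = 'fü bar';
$iriPath = '/heads/' . iri_encode_segment($val) . '/tails/';
var_dump($iriPath === '/heads/fü%20bar/tails/'); // bool(true)
```

Leaving sub-delims, ":" and "@" literal would also yield a valid `ipchar` sequence; encoding them is the more conservative choice and matches what `rawurlencode()` already does for ASCII.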
Agreed; the naive implementation would need to be less naive and pay closer
attention to the ABNF for `ucschar` and `ipchar` in the spec.

Along those lines, I think there might need to be two additional changes or
additions to help with encoding RFC 3987 and WHATWG-URL component values:

- `http_build_query()` would need PHP_QUERY_3987 and PHP_QUERY_WHATWG flags
  and corresponding logic (or entirely new functions); and

- `parse_str()` would need a corresponding `mb_parse_str()`.

--
pmj