Hi Nick,

> On Aug 16, 2024, at 6:11 PM, Nick Lockheart <li...@ageofdream.com> wrote:
>
> I wanted to reply generally to this and not to any person in
> particular, as I'm the one who started the thread.
>
> I used the rather broad title "Should All String Functions Become
> Multi-Byte Safe" because there are many smaller related topics, but my
> intention was to discuss multi-byte in general, and see if there was
> some consensus on action items that could have a more limited scope/RFC
> for that task.
>
> My overall intent and goal was to make PHP safer against multi-byte
> attacks by providing developers with tools that could become best
> practices for dealing with user input stings, the same way we had
> mysql_real_escape_string, and then PDO prepared statements for SQL.
>
> There's a lot of potential pitfalls for dealing with Unicode input, and
> there are some best practices per the Unicode Consortium that I'm not
> sure how to implement in PHP, and it seems that since everyone needs
> them, they might be better as a shared library in core.
>
> For example, there should be a function that removes unassigned code
> points.
>
> There should also be a function that removes "scripts" (as defined by
> Unicode).
>
> We should have an easy way to remove private use code points (unless
> you're running a Star Trek fan site and really do need Klingon).
>
> And the default replacement character for `mb_scrub` shouldn't be `?`.
>
> Each of these and other ideas could be part of an RFC, or we could
> brainstorm a Unicode built-in class that handles lots of the common use
> cases.
>
> Having a team-built and audited Unicode class would benefit almost
> everyone using PHP.

My suggestion, take it or leave it, is to create a GitHub repo for your own RFCs and start writing your RFC there "in the open." Add the code for your implementation to the repo, add a discussion forum so that interested parties can participate, and send an invite on this list to those who really want to discuss, comment on the RFC, and even offer PRs.

Then, when everyone participating at your repo thinks the RFC is fully baked, bring it back to the list here to discuss.

Doing it that way, unlike just discussing on the list, gives comments made in the forum a place to be captured and converted into RFC text and implementation that everyone can see. Really motivated people can even submit PRs to your RFC, which spreads the load of writing a good RFC.

#jmtcw #fwiw

-Mike