Hi

On 8/11/24 17:50, Nick Lockheart wrote:
It seems like there's still a lot of string functions that assume that a character is a single byte, and these may actually work as expected when dealing with Latin characters, but may fail unexpectedly if a sequence is more than one byte.
PHP's strings are byte-strings containing arbitrary sequences of bytes. Unless you specifically select functions that interpret the byte-strings as something else, you get a byte-string interpretation. There is nothing unexpected about that.
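To make the distinction concrete, here is a minimal sketch. It assumes the mbstring extension is loaded; the sample string and variable names are only illustrative. The byte-oriented functions count and cut bytes, while the `mb_*` functions interpret the same bytes as UTF-8 only because they are told to.

```php
<?php
// Illustrative sample: "naïve" written with explicit UTF-8 byte escapes,
// so the result does not depend on the encoding of this source file.
// "ï" is the two-byte sequence C3 AF.
$s = "na\xC3\xAFve";

// Byte-string interpretation: counts bytes.
$byteLength = strlen($s);

// UTF-8 interpretation, selected explicitly: counts code points.
$charLength = mb_strlen($s, 'UTF-8');

// Taking the first three bytes cuts the two-byte "ï" in half.
$firstThreeBytes = substr($s, 0, 3);

// Taking the first three code points keeps "ï" intact.
$firstThreeChars = mb_substr($s, 0, 3, 'UTF-8');

var_dump($byteLength, $charLength);
var_dump(bin2hex($firstThreeBytes), bin2hex($firstThreeChars));
```

Neither result is wrong; the two calls simply answer different questions about the same sequence of bytes.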
Are there any use cases for PHP where **single-byte** characters are the norm?
Dealing with binary formats.
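As a rough illustration, consider a made-up 6-byte record header (the layout and field names are hypothetical, not taken from any real format). Here every offset and length is a byte count, which is exactly what `strlen()` and `substr()` provide.

```php
<?php
// Hypothetical header layout: 16-bit big-endian type, then 32-bit big-endian length.
$header = pack('nN', 0x0102, 300);

// The header is 2 + 4 bytes long, regardless of what the bytes "look like" as text.
$byteCount = strlen($header);

// Decode both fields back into integers.
$fields = unpack('ntype/Nlength', $header);

// Slice out the raw bytes of the length field by byte offset.
$lengthBytes = substr($header, 2, 4);

var_dump($byteCount, $fields, bin2hex($lengthBytes));
```

A character-aware `substr()` would make this kind of slicing unreliable, because arbitrary binary data is usually not valid UTF-8.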
It seems that if everything on the Internet is multi-byte encoded now, then all of the PHP string functions should be multi-byte safe.
The premise is false. Everything on the Internet is a byte-string (also called an "octet-string").
--------

You might be interested in https://externals.io/message/119149#119149.

Best regards
Tim Düsterhus