[PHP-DEV][Discussion] Should All String Functions Become Multi-Byte Safe?

Nick Lockheart Sun, 11 Aug 2024 08:51:44 -0700


HTML5 was adopted in 2014, about ten years ago. HTML5 documents are
only valid when encoded as UTF-8, a multi-byte character encoding.


It seems like there are still a lot of string functions that assume
a character is a single byte. These may work as expected when dealing
with ASCII characters, but may fail unexpectedly when a character is
encoded as more than one byte.
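
To make the concern concrete, here is a minimal sketch of the mismatch,
assuming PHP 7 or later with the mbstring extension loaded. The sample
word and variable names are only illustrative. The byte-oriented
functions count and cut bytes, while their mb_* counterparts work on
characters:

```php
<?php
// "naïve" written with an escape so the example does not depend on
// the encoding of the source file. U+00EF ("ï") takes 2 bytes in UTF-8.
$word = "na\u{EF}ve";

$byteLength = strlen($word);             // counts bytes: 6
$charLength = mb_strlen($word, 'UTF-8'); // counts characters: 5

// substr() cuts at a byte offset, so it splits "ï" in half here.
$byteSlice = substr($word, 0, 3);
// mb_substr() cuts at a character offset and keeps "ï" intact.
$charSlice = mb_substr($word, 0, 3, 'UTF-8');

var_dump($byteLength, $charLength);
var_dump(mb_check_encoding($byteSlice, 'UTF-8')); // false: broken sequence
var_dump($charSlice === "na\u{EF}");              // true
```

The byte-based slice is no longer valid UTF-8, which is the kind of
unexpected failure described above.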

Are there any use cases for PHP where **single-byte** characters are
the norm?

It seems that if everything on the Internet is multi-byte encoded now,
then all of the PHP string functions should be multi-byte safe.


The WHATWG Encoding Standard:

https://encoding.spec.whatwg.org/

Also, according to Mozilla, "[The meta charset] attribute declares the
document's character encoding. If the attribute is present, its value
must be an ASCII case-insensitive match for the string "utf-8", because
UTF-8 is the only valid encoding for HTML5 documents."

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#charset
