HTML5 became a W3C Recommendation in October 2014, over ten years ago. UTF-8, a variable-width (multi-byte) encoding, is the only character encoding considered valid for HTML5 documents.
It seems like there are still a lot of PHP string functions that assume a character is a single byte. These may work as expected on plain ASCII text (unaccented Latin letters), but can fail unexpectedly when a character is encoded as more than one byte.

Are there any use cases for PHP where **single-byte** characters are the norm? It seems that if everything on the Internet is multi-byte encoded now, then all of the PHP string functions should be multi-byte safe.

The WHATWG Encoding Standard: https://encoding.spec.whatwg.org/

Also, according to Mozilla, "[The meta charset] attribute declares the document's character encoding. If the attribute is present, its value must be an ASCII case-insensitive match for the string "utf-8", because UTF-8 is the only valid encoding for HTML5 documents." https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#charset
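
To make the concern about byte-oriented functions concrete, here is a minimal sketch of the kind of mismatch I mean. The sample string and variable names are just illustrative assumptions, and it assumes PHP 7 or later with the `mbstring` extension enabled. `strlen()` and `substr()` operate on bytes, while `mb_strlen()` and `mb_substr()` operate on characters.

```php
<?php
// Illustrative sample string (an assumption, not from any real application).
// \u{...} escapes (PHP 7+) make the UTF-8 bytes independent of how this file is saved.
$s = "na\u{00EF}ve caf\u{00E9}"; // "naïve café": 10 characters, two of them 2 bytes each

$bytes = strlen($s);             // counts bytes
$chars = mb_strlen($s, 'UTF-8'); // counts characters (code points)

// Byte-based slicing cuts the 2-byte "ï" in half and leaves invalid UTF-8.
$cut   = substr($s, 0, 3);
// Character-based slicing keeps the "ï" intact.
$mbCut = mb_substr($s, 0, 3, 'UTF-8');

echo $bytes, " bytes, ", $chars, " characters\n";
var_dump(mb_check_encoding($cut, 'UTF-8'));   // is the byte slice still valid UTF-8?
var_dump(mb_check_encoding($mbCut, 'UTF-8')); // is the character slice valid UTF-8?
```

With pure ASCII input the two families of functions agree, which is probably why the byte-oriented versions appear to work until non-ASCII text shows up.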