Hi Nick,

As a developer who often deals with binary data (like bencode, IPv6 addresses, and my own hacks for multibyte arithmetic), I would prefer that the functions and syntax that let me work with bytes keep working with bytes, not characters or code points. The closest solution would be separate binary and text string types, but then we have PHP 6 all over again. Maybe this time it could work in some form, who knows.
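
To make the distinction concrete, here is a minimal sketch of what I mean. It only uses core functions (inet_pton, strlen, substr, bin2hex); the variable names and the sample address and string are made up for illustration. The same byte-oriented calls that are exactly right for packed binary data will cut a UTF-8 character in half when applied to text.

```php
<?php
// Binary data: an IPv6 address packed into 16 raw bytes.
$packed  = inet_pton('2001:db8::1');
$byteLen = strlen($packed);                 // strlen() counts bytes
$prefix  = bin2hex(substr($packed, 0, 2));  // substr() slices by byte offset

// Text: "héllo" encoded as UTF-8, where "é" occupies two bytes.
$text    = "h\u{00E9}llo";
$textLen = strlen($text);                   // byte count, not character count
$cut     = bin2hex(substr($text, 0, 2));    // ends in the middle of "é"

printf("%d %s %d %s\n", $byteLen, $prefix, $textLen, $cut);
// prints "16 2001 6 68c3"
```

For the packed address, byte semantics are what I want: 16 bytes, and the first two are 0x20 0x01. For the text, the same calls report 6 bytes for 5 characters and leave a dangling 0xC3 lead byte. Changing these functions to count characters would fix the second case and break the first.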

On 8/11/24 18:50, Nick Lockheart wrote:

HTML 5 was adopted in 2014, over ten years ago. HTML 5 only supports
the UTF-8 multi-byte character encoding.

It seems like there's still a lot of string functions that assume that
a character is a single byte, and these may actually work as expected
when dealing with Latin characters, but may fail unexpectedly if a
sequence is more than one byte.

Are there any use cases for PHP where **single-byte** characters are
the norm?

It seems that if everything on the Internet is multi-byte encoded now,
then all of the PHP string functions should be multi-byte safe.


The WHATWG Encoding Standard:

https://encoding.spec.whatwg.org/

Also, according to Mozilla, "[The meta charset] attribute declares the
document's character encoding. If the attribute is present, its value
must be an ASCII case-insensitive match for the string "utf-8", because
UTF-8 is the only valid encoding for HTML5 documents."

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#charset
