On 12/15/2022 7:34 AM, Derick Rethans wrote:
> https://wiki.php.net/rfc/unicode_text_processing

A few quick thoughts:

> The constructor will also convert the given text to Unicode Canonical Form.

By this do you mean Normalization Form C (NFC)? "Unicode Canonical Form" isn't a phrase I'm familiar with.

Assuming so, are modified texts (e.g. via join, replaceText, reverse) re-normalized?
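
To make the concern concrete: concatenating two strings that are each in NFC does not necessarily yield an NFC string. Below is a minimal sketch using the existing intl `Normalizer` class on plain strings. It does not use the proposed `Text` class, whose behavior on this point is exactly what I'm asking about.

```php
<?php
// Requires ext/intl. Plain strings only; the RFC's Text class is not used here.
$left  = "e";         // U+0065, already in NFC
$right = "\u{0301}";  // U+0301 COMBINING ACUTE ACCENT; on its own it is also in NFC

$leftIsNfc  = Normalizer::isNormalized($left, Normalizer::FORM_C);   // true
$rightIsNfc = Normalizer::isNormalized($right, Normalizer::FORM_C);  // true

// Joining the two produces a base letter followed by a combining mark,
// which NFC would compose into the single code point U+00E9.
$joined      = $left . $right;
$joinedIsNfc = Normalizer::isNormalized($joined, Normalizer::FORM_C); // false

$renormalized = Normalizer::normalize($joined, Normalizer::FORM_C);
var_dump($renormalized === "\u{00E9}"); // bool(true)
```

So unless operations like join re-normalize their result, a `Text` could end up holding non-NFC data even though every input was normalized at construction.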

---

> The constructor will also strip out a BOM (Byte-Order-Mark) character, if
> present.

The BOM character (U+FEFF) is also known as ZWNBSP (Zero Width No-Break Space). Will only a leading instance be stripped? If so, how can someone search for it (or a substring beginning with it) given that:

> If an argument to any of the methods is listed as string|Text, passing in a
> string value will have the same semantics as replacing the passed value with
> new Text($string).

and all the search methods take `string|Text $search`.
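
A rough sketch of the problem, again on plain strings. `stripLeadingBom()` is a hypothetical stand-in for what I assume the constructor does (removing only a leading U+FEFF); it is not part of the RFC or of PHP.

```php
<?php
// Hypothetical helper mimicking the BOM stripping the RFC describes for
// new Text($string). Assumption: only a leading U+FEFF is removed.
function stripLeadingBom(string $s): string
{
    $bom = "\u{FEFF}"; // UTF-8 bytes EF BB BF
    return strncmp($s, $bom, strlen($bom)) === 0
        ? substr($s, strlen($bom))
        : $s;
}

$haystack = "foo\u{FEFF}bar"; // ZWNBSP in the middle, so it is not stripped
$needle   = "\u{FEFF}bar";    // what the caller actually wants to find

// Byte offsets, since these are plain strings.
$intended = strpos($haystack, $needle);                  // 3: matches at the ZWNBSP
$actual   = strpos($haystack, stripLeadingBom($needle)); // 6: matches "bar" only

var_dump($intended, $actual); // int(3) int(6)
```

Under that assumption the search silently matches something other than what was asked for, and a needle consisting solely of U+FEFF collapses to the empty string.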

---

Why is this being introduced directly into PHP core rather than first as an extension, where it's easier to shake out the interface and behavior?

