On 12/15/2022 7:34 AM, Derick Rethans wrote:
> https://wiki.php.net/rfc/unicode_text_processing
A few quick thoughts:
> The constructor will also convert the given text to Unicode Canonical Form.
By this do you mean Normalization Form C (NFC)? "Unicode Canonical Form"
isn't a phrase I'm familiar with.
Assuming so, are modified texts (e.g. via join, replaceText, reverse)
re-normalized?
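
To illustrate why re-normalization matters, here is a minimal sketch in Python, using its standard `unicodedata` module as a stand-in because the proposed Text class is not available to test against. The helper `is_nfc` and the variable names are my own, not anything from the RFC. It shows that joining two strings that are each in NFC can produce a result that is not:

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is already in Normalization Form C."""
    return unicodedata.normalize("NFC", s) == s


base = "e"        # LATIN SMALL LETTER E, already NFC
mark = "\u0301"   # COMBINING ACUTE ACCENT, NFC on its own

# Plain concatenation, as a join that does not re-normalize would do.
joined = base + mark

# NFC composes the pair into the single code point U+00E9.
renormalized = unicodedata.normalize("NFC", joined)

print(is_nfc(base), is_nfc(mark))   # both inputs are NFC
print(is_nfc(joined))               # the concatenation is not
print(len(joined), len(renormalized))
```

So unless every modifying method normalizes its result, a Text could stop being in the form the constructor guarantees.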
---
> The constructor will also strip out a BOM (Byte-Order-Mark) character, if
> present.
The BOM code point (U+FEFF) is also known as ZWNBSP (Zero Width No-Break
Space). Will only a leading instance be stripped? If so, how can someone
search for it (or a substring beginning with it) given that:
> If an argument to any of the methods is listed as string|Text, passing in a
> string value will have the same semantics as replacing the passed value with
> new Text($string).
and all the search methods take `string|Text $search`.
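
A rough model of the concern, again in Python rather than against the real class. `make_text` and `contains` are hypothetical stand-ins for `new Text($string)` and a search method, and the sketch assumes that only a leading U+FEFF is stripped, which is exactly what I am asking about:

```python
import unicodedata

BOM = "\ufeff"  # U+FEFF, BYTE ORDER MARK / ZERO WIDTH NO-BREAK SPACE


def make_text(s: str) -> str:
    """Hypothetical stand-in for `new Text($string)`.

    Assumption: strips one leading U+FEFF, then normalizes to NFC.
    """
    if s.startswith(BOM):
        s = s[len(BOM):]
    return unicodedata.normalize("NFC", s)


def contains(haystack: str, needle: str) -> bool:
    """Hypothetical search: both arguments pass through make_text first."""
    return make_text(needle) in make_text(haystack)


haystack = "abc" + BOM + "def"   # interior U+FEFF survives construction
needle = BOM + "def"             # leading U+FEFF is stripped from the needle

print(BOM in make_text(haystack))   # the character is still in the text
print(repr(make_text(needle)))      # the needle has silently lost it
print(contains("abcdef", needle))   # matches text with no U+FEFF at all
print(repr(make_text(BOM)))         # searching for U+FEFF alone: empty needle
```

Under those assumptions the text can contain the character, but no string needle can ever start with it.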
---
Why is this being introduced directly into PHP core rather than first as an
extension, where it's easier to shake out the interface and behavior?