[PHP-DEV] Re: [RFC] Unicode Text Processing

Paul Crovella Thu, 15 Dec 2022 09:15:26 -0800

On 12/15/2022 7:34 AM, Derick Rethans wrote:

https://wiki.php.net/rfc/unicode_text_processing


A few quick thoughts:

The constructor will also convert the given text to Unicode Canonical Form.

By this do you mean Normalization Form C (NFC)? "Unicode Canonical Form" isn't a phrase I'm familiar with.

Assuming so, are modified texts (e.g. via join, replaceText, reverse) re-normalized?
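
To illustrate why re-normalization matters, here is a minimal sketch in Python (not the RFC's API; it uses only the standard `unicodedata` module, and `is_nfc` is a helper name invented for this example). It shows that NFC is not closed under concatenation: two strings that are each in NFC can join into a string that is not.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is already in Normalization Form C."""
    return unicodedata.normalize("NFC", s) == s


left = "e"         # U+0065 LATIN SMALL LETTER E
right = "\u0301"   # U+0301 COMBINING ACUTE ACCENT (a lone mark is valid NFC)

joined = left + right                       # plain concatenation, no re-normalization
renormalized = unicodedata.normalize("NFC", joined)

print(is_nfc(left), is_nfc(right))          # both inputs are in NFC
print(is_nfc(joined))                       # the joined result is not
print(renormalized == "\u00e9")             # NFC composes the pair into U+00E9
```

If the proposed class behaves like this sketch, a join (or any other modifying operation) would need an explicit normalization pass to keep the invariant that the constructor establishes.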

---

The constructor will also strip out a BOM (Byte-Order-Mark) character, if 
present.

This is also known as ZWNBSP (Zero Width No-Break Space). Will only a leading instance be stripped? If so, how can someone search for it (or a substring beginning with it) given that:

If an argument to any of the methods is listed as string|Text, passing in a 
string value will have the same semantics as replacing the passed value with 
new Text($string).


and all the search methods take `string|Text $search`.
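
The problem can be sketched in Python under two stated assumptions: that only a leading U+FEFF is stripped, and that string arguments really are wrapped as if by `new Text($string)`. The names `to_text` and `find` are hypothetical stand-ins for the RFC's constructor and a search method, not its actual interface.

```python
import unicodedata

BOM = "\ufeff"  # U+FEFF: byte-order mark, also ZERO WIDTH NO-BREAK SPACE


def to_text(s: str) -> str:
    """Hypothetical stand-in for `new Text($string)`.

    Assumption: strips one leading U+FEFF, then normalizes to NFC.
    """
    if s.startswith(BOM):
        s = s[len(BOM):]
    return unicodedata.normalize("NFC", s)


def find(haystack: str, needle: str) -> int:
    """Hypothetical search: the needle gets the same treatment as the haystack."""
    return to_text(haystack).find(to_text(needle))


haystack = "foo" + BOM + "bar"    # U+FEFF in the interior survives construction

raw_index = haystack.find(BOM + "bar")       # plain string search sees U+FEFF at index 3
wrapped_index = find(haystack, BOM + "bar")  # needle silently becomes "bar", found at index 4
bom_only_index = find(haystack, BOM)         # needle becomes "", which matches at index 0

print(raw_index, wrapped_index, bom_only_index)
```

Under these assumptions the leading U+FEFF in the search argument is dropped before the search runs, so a match that begins with it cannot be requested through a plain string.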

---

Why is this being introduced directly into PHP core rather than first as an extension, where it's easier to shake out the interface and behavior?


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
