Re: [PHP-DEV] Unicode support

2014-10-16 Thread Nicolas Grekas
Hello, I think that Rowan is right: PHP users need to manipulate grapheme clusters first (and code points in some rare situations). The fact that most of us live in a world where NFC composes all our characters only hides this reality. A typical use case is a template engine: nearly all string man

Re: [PHP-DEV] Unicode support

2014-10-15 Thread Aleksey Tulinov
On 15/10/14 15:58, Rowan Collins wrote: Rowan, What is confusing me is that I think you're seeing it as a major implementation defect. To avoid arguable implementations, I've made a short example in Java: System.out.println(new StringBuffer("noël").reverse().toString()); It does produce string
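
For readers following the "noël" example, here is a minimal Java sketch of the difference being discussed. It assumes the word is written in decomposed form ("e" followed by U+0308 COMBINING DIAERESIS), and it uses java.text.BreakIterator as a stand-in for grapheme cluster segmentation; the class and method names (GraphemeReverse, reverseGraphemes) are hypothetical and do not come from the thread.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GraphemeReverse {
    // Reverse a string one user-perceived character at a time, so that a
    // combining mark stays attached to its base letter.
    static String reverseGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        List<String> clusters = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            clusters.add(s.substring(start, end));
        }
        Collections.reverse(clusters);
        return String.join("", clusters);
    }

    public static void main(String[] args) {
        // Assumption: decomposed spelling, 'e' + U+0308.
        String decomposed = "noe\u0308l";

        // StringBuffer.reverse() works on code points, so the diaeresis
        // ends up combining with 'l' instead of 'e'.
        System.out.println(new StringBuffer(decomposed).reverse());

        // The cluster-aware version keeps 'e' and the diaeresis together.
        System.out.println(reverseGraphemes(decomposed));
    }
}
```

With a precomposed "ë" (U+00EB) both approaches give the same result, which is why the problem is easy to miss when all input happens to be in NFC.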

Re: [PHP-DEV] Unicode support

2014-10-15 Thread Rowan Collins
Aleksey Tulinov wrote (on 15/10/2014): On 15/10/14 10:04, Rowan Collins wrote: Rowan, As I said at the top of my first post, the important thing is to capture what those requirements actually are. Just as you'd choose what array functions were needed if you were adding "array support" to a lan

Re: [PHP-DEV] Unicode support

2014-10-15 Thread Aleksey Tulinov
On 15/10/14 10:04, Rowan Collins wrote: Rowan, As I said at the top of my first post, the important thing is to capture what those requirements actually are. Just as you'd choose what array functions were needed if you were adding "array support" to a language. I'm sorry for not making mysel

Re: [PHP-DEV] Unicode support

2014-10-15 Thread Rowan Collins
> Good point. That's what i meant by border-line case. Could you possibly point me to a specific example of such false positive? I'm interested in well-formed UTF-8 string. I believe "noël" test is ill-formed UTF-8 and doesn't conform to shortest-form requirement. You're confusing two co

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Aleksey Tulinov
On 15/10/14 00:04, Rowan Collins wrote: Rowan, Back to combining characters, I dig the idea of introducing graphemes, but I think a French person would write the word "noël" using a precomposed character. I'm using the French keyboard at https://translate.google.com/#fr/. "ë" is Shift + "^", then "e", it p

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Aleksey Tulinov
On 14/10/14 23:48, Johannes Schlüter wrote: On Tue, 2014-10-14 at 23:18 +0300, Aleksey Tulinov wrote: Very good point. I'll give another example: is there a substring "s" in string "Maße"? If it's a case-sensitive search, then there is no such substring, but if it's a case-insensitive search, then

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Lester Caine
On 14/10/14 10:04, Aleksey Tulinov wrote: > 1. Is there a need for more Unicode support in PHP? > 2. What is currently missing in that regard? > 3. Is this a good place to ask such questions? I need to ask ... Is this discussion only about improving support for UTF8 content in PHP? What is the cu

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Rowan Collins
On 14/10/2014 20:51, Andrea Faulds wrote: If you want length in characters, you probably need to implement your own algorithm, as it really depends on your specific use case. I disagree, Unicode has very well-defined algorithms for these things, and the average PHP developer (or even PHP fram

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Rowan Collins
On 14/10/2014 21:18, Aleksey Tulinov wrote: Back to combining characters, I dig the idea of introducing graphemes, but I think a French person would write the word "noël" using a precomposed character. I'm using the French keyboard at https://translate.google.com/#fr/. "ë" is Shift + "^", then "e", it pro

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Johannes Schlüter
On Tue, 2014-10-14 at 23:18 +0300, Aleksey Tulinov wrote: > Very good point. I'll give another example: is there a substring "s" in > string "Maße"? If it's a case-sensitive search, then there is no such > substring, but if it's a case-insensitive search, then "ß" folds into "ss" > and substring "s"
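
The "Maße" question can be tried out in Java. This is only a rough sketch: Java's standard library has no dedicated case-folding call, so the example upper-cases both strings with Locale.ROOT as an approximation (String.toUpperCase maps "ß" to "SS"). That substitution is an assumption of this sketch, not the Unicode case-folding algorithm itself, and the names CaseFoldSearch and containsIgnoreCase are hypothetical.

```java
import java.util.Locale;

public class CaseFoldSearch {
    // Approximate a case-insensitive substring search by upper-casing both
    // sides. Full case mappings apply, so "ß" becomes "SS".
    static boolean containsIgnoreCase(String haystack, String needle) {
        return haystack.toUpperCase(Locale.ROOT)
                       .contains(needle.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String word = "Maße";

        // Case-sensitive: there is no "s" among the characters M, a, ß, e.
        System.out.println(word.contains("s"));

        // Case-insensitive via full case mapping: "MASSE" contains "S".
        System.out.println(containsIgnoreCase(word, "s"));
    }
}
```

Note that a per-character comparison such as String.equalsIgnoreCase would not expand "ß", so the answer depends on which notion of case-insensitivity an API implements.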

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Aleksey Tulinov
On 14/10/14 21:01, Rowan Collins wrote: Rowan, As I've mentioned before, a lot of the time what people actually want to deal with is "grapheme clusters" - the kind of thing that you'd think of as a character if you were writing by hand. Most people, if asked the length of the string "noël", wo
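
To make the "length of noël" question concrete, the following Java sketch compares three ways of measuring a decomposed spelling of the word. It assumes "e" + U+0308 as input and again treats java.text.BreakIterator as an approximation of grapheme cluster boundaries; StringLengths and graphemeCount are hypothetical names used only for illustration.

```java
import java.text.BreakIterator;
import java.text.Normalizer;

public class StringLengths {
    // Count user-perceived characters by walking character boundaries.
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: decomposed spelling, 'e' + U+0308.
        String decomposed = "noe\u0308l";

        // UTF-16 code units and code points both count the combining mark.
        System.out.println(decomposed.length());
        System.out.println(decomposed.codePointCount(0, decomposed.length()));

        // Boundary-based counting treats 'e' + diaeresis as one unit.
        System.out.println(graphemeCount(decomposed));

        // NFC composes the pair into U+00EB, which hides the difference.
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(composed.length());
    }
}
```

Normalizing to NFC makes the code point count match the intuitive answer here, but only because a precomposed "ë" exists; sequences with no precomposed form still need boundary-based counting.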

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Andrea Faulds
On 14 Oct 2014, at 19:01, Rowan Collins wrote: > >> If you want to see a pragmatic, actually working, work-in-progress attempt >> at better PHP unicode support, see this: https://github.com/krakjoe/ustring > > It looks like a good prototype, but glancing at the documentation, I'm not > clear

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Rowan Collins
On 14/10/2014 14:50, Andrea Faulds wrote: 2. What is currently missing in that regard? Unicode string support. I know that was probably deliberately flippant, but I think there is a genuine question to be asked here. A lot of people talk about "Unicode support" like they talk about "XPath su

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Aleksey Tulinov
On 14/10/14 16:50, Andrea Faulds wrote: If you want to see a pragmatic, actually working, work-in-progress attempt at better PHP unicode support, see this: https://github.com/krakjoe/ustring It would add a UString class to PHP for Unicode strings. This would make Unicode text manipulation muc

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Chris Wright
On 14 October 2014 16:09, Aleksey Tulinov wrote: > On 14/10/14 14:00, Chris Wright wrote: > > Chris, > >>> Latter is referring to difficulties like "excess memory usage" and >>> "rewrite >>> the language". I'm developing an open-source Unicode implementation >>> library >>> (nunicode), and it does

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Aleksey Tulinov
On 14/10/14 14:00, Chris Wright wrote: Chris, Latter is referring to difficulties like "excess memory usage" and "rewrite the language". I'm developing an open-source Unicode implementation library (nunicode), and it doesn't consume any heap at all, it also works on native binary strings, as PH

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Andrea Faulds
On 14 Oct 2014, at 10:04, Aleksey Tulinov wrote: > I would appreciate if someone would point me to a good read or explain > collective opinion on this topic. I'm basically interested in the following > questions: > > 1. Is there a need for more Unicode support in PHP? Yes. > 2. What is curr

Re: [PHP-DEV] Unicode support

2014-10-14 Thread Chris Wright
On 14 October 2014 10:04, Aleksey Tulinov wrote: > Hey, > > I can't find any recent discussion in this mailing list on this topic, I > think that the closest one is > http://grokbase.com/t/php/php-internals/143b6aevsp/unicode-strings. I was > also reading articles like this one: > http://www.infoworld.co

[PHP-DEV] Unicode support

2014-10-14 Thread Aleksey Tulinov
Hey, I can't find any recent discussion in this mailing list on this topic, I think that the closest one is http://grokbase.com/t/php/php-internals/143b6aevsp/unicode-strings. I was also reading articles like this one: http://www.infoworld.com/article/2618358/application-development/php-5-4-emerges-

[PHP-DEV] Unicode support for *printf()

2006-12-11 Thread Antony Dovgal
Hello all. Attached is the patch which adds Unicode support to the *printf() family of functions. We (Andrei and I) made several assumptions that are worth mentioning: sprintf() and vsprintf(): - use runtime_encoding when dealing with Unicode data. printf() and vprintf(): - the result data is conver