Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Daniel Convissor
On Thu, Jun 22, 2006 at 09:15:23PM -0700, Sara Golemon wrote: > utf16 of php's internal encoding. Big or Little Endian? Thanks, --Dan

Re: [PHP-DEV] TextIterator changes

2006-06-23 Thread Andrei Zmievski
Sean (on IRC) convinced me that something called *Iterator had better implement Iterator interface (which TextIterator currently does not). So changing method signatures is out of the question. Towards that, the current functions will stay as they are, but I'll have to add current_offset() (for

Re: [PHP-DEV] Re: TextIterator changes

2006-06-23 Thread Michael Wallner
Andrei Zmievski wrote: TextIterator does not implement Iterator interface, only Traversable. It just happens to have functions of the same name as Iterator. Ah, okay. That leaves the picky OO strictness I personally don't like. I think we should give PHP some freedom back in this area. Regard

Re: [PHP-DEV] Re: TextIterator changes

2006-06-23 Thread Andrei Zmievski
TextIterator does not implement Iterator interface, only Traversable. It just happens to have functions of the same name as Iterator. -Andrei On Jun 23, 2006, at 2:16 PM, Michael Wallner wrote: Andrei Zmievski wrote: I am working on implementing BreakIterator API [1]. I considered two approa

[PHP-DEV] Re: TextIterator changes

2006-06-23 Thread Michael Wallner
Andrei Zmievski wrote: I am working on implementing BreakIterator API [1]. I considered two approaches: making a separate class or merging the API into the existing TextIterator. Having a separate class would be a bit cleaner, but I can see people wanting to use it in foreach(), and since TextI

[PHP-DEV] TextIterator changes

2006-06-23 Thread Andrei Zmievski
I am working on implementing BreakIterator API [1]. I considered two approaches: making a separate class or merging the API into the existing TextIterator. Having a separate class would be a bit cleaner, but I can see people wanting to use it in foreach(), and since TextIterator already provide

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
Especially since the UTF-16 internal representation may be little- or big-endian, depending on the platform. -Andrei On Jun 23, 2006, at 11:31 AM, Andi Gutmans wrote: Nah I didn't mean to get back to that discussion. I was thinking more of a binary dump of info (e.g. session-like stuff) or s
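
Editor's illustration of the endianness point above. This is a minimal sketch in Python, not PHP, and it does not use the unicode_encode() function discussed in this thread; the helper name utf16_bytes is hypothetical. It only shows that the same character produces different byte sequences depending on byte order, which is why a raw dump of a platform-dependent UTF-16 buffer would not be portable.

```python
import sys


def utf16_bytes(text: str, byteorder: str) -> bytes:
    """Encode text as UTF-16 without a BOM.

    byteorder is 'big' or 'little', matching the values of sys.byteorder.
    (Hypothetical helper, for illustration only.)
    """
    codec = "utf-16-be" if byteorder == "big" else "utf-16-le"
    return text.encode(codec)


sample = "A"  # U+0041

big = utf16_bytes(sample, "big")             # bytes 00 41
little = utf16_bytes(sample, "little")       # bytes 41 00
native = utf16_bytes(sample, sys.byteorder)  # whichever this machine uses

print(big.hex(), little.hex(), native.hex())
```

Naming the byte order explicitly when encoding, as in the UTF-16BE suggestion elsewhere in this thread, sidesteps the platform dependence.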

RE: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andi Gutmans
Nah I didn't mean to get back to that discussion. I was thinking more of a binary dump of info (e.g. session-like stuff) or shooting it over the network. But I agree with Andrei, it's really not a problem to just use one of those methods.

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Sara Golemon
The only way they can get at the internal UTF-16 representation is via unicode_encode($uni, 'UTF-16') which will return a binary UTF-16 string. In that case, strlen() will work just as well. Hmm, I was thinking we might have some binary write function which would do that automagically. I think

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
Really? I think it's very rare that someone'd want to get at the internals of a Unicode string. -Andrei On Jun 22, 2006, at 11:44 PM, Andi Gutmans wrote: Hmm, I was thinking we might have some binary write function which would do that automagically. I think it'd be worth it. -Origi

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Andrei Zmievski
There already is such a function: strlen(unicode_encode($string, "UTF-16BE")); I think wanting to have access to internal representation of Unicode strings is an extremely rare operation in any case. -Andrei On Jun 23, 2006, at 12:16 AM, Ron Korving wrote: Maybe it'd be useful if there wa
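
Editor's illustration of the "encode, then measure" idea above. This is a rough Python analogue, not the PHP call from the message; the helper name utf16_byte_length is hypothetical. It assumes the goal is the number of bytes a string occupies once encoded as UTF-16BE, and shows that this differs from the character count, especially for characters outside the Basic Multilingual Plane, which need a surrogate pair.

```python
def utf16_byte_length(text: str) -> int:
    """Return the number of bytes text occupies when encoded as UTF-16BE.

    (Hypothetical helper; mirrors the idea of taking the byte length of
    an explicitly encoded binary string.)
    """
    return len(text.encode("utf-16-be"))


bmp_text = "héllo"            # 5 code points, all in the BMP
astral_text = "a\U0001D11E"   # 'a' plus U+1D11E, which needs a surrogate pair

# Each BMP code point takes 2 bytes; the supplementary one takes 4.
print(len(bmp_text), utf16_byte_length(bmp_text))
print(len(astral_text), utf16_byte_length(astral_text))
```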

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Ron Korving
Maybe it'd be useful if there was a function to "cast" a UTF string into a binary string without changing anything on the inside. That way one could do strlen(str_to_binary($string)). That would also be useful for binary storing and reading (with binary_to_str). Ron "Andrei Zmievski" <[EMAIL

Re: [PHP-DEV] Re: strlen() under unicode.semantics

2006-06-23 Thread Johannes Schlueter
Hi, in my opinion that name is bad since most of the time the string won't be stored using the internal encoding but stored using some implicitly converted encoding like the encoding of the stream being used or the one from the database. So the size needed to store the string would most likely be