Sorry, "utf-8" should have read "code point", or UTF-16 character (two surrogates if needed). I thought I had removed all references to UTF-8 before I sent it; I must have overlooked one. I was just trying to say that by "character" I mean the abstract character unit that Unicode defines, not a character as the user perceives it (a grapheme) or anything else. So, yes, one code point.
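To make the distinction concrete, here is a rough sketch in Python, used only because its strings index by code point. The helper names are made up for this note and it is not meant as the PHP implementation; it just shows how one string has different lengths depending on the unit, and what trimming from the right one code point at a time does to a combining character.

```python
# Illustration only: utf16_units, utf8_bytes and trim_right are hypothetical helpers.

def utf16_units(s: str) -> int:
    """Number of UTF-16 code units (a surrogate pair counts as two)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 encoding."""
    return len(s.encode("utf-8"))


def trim_right(s: str, limit: int) -> str:
    """Drop code points from the right until at most `limit` remain."""
    return s[:limit]  # Python str slicing works on code points


# "e" + U+0301 COMBINING ACUTE ACCENT + U+1D11E MUSICAL SYMBOL G CLEF
sample = "e\u0301\U0001D11E"

print(len(sample))          # code points: 3
print(utf16_units(sample))  # UTF-16 code units: 4 (the clef needs two surrogates)
print(utf8_bytes(sample))   # UTF-8 bytes: 1 + 2 + 4 = 7

# Trimming counts the combining accent as a character of its own,
# so it can be cut away from its base letter.
print(trim_right(sample, 2) == "e\u0301")  # True
print(trim_right(sample, 1) == "e")        # True
```

A grapheme-aware trim would instead treat "e" plus the accent as one unit, which is exactly what I am not proposing here.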
Padding left and right should behave the same as in the current function. If it is symmetric, keep it symmetric.

Sorry to be terse. It's difficult to write while on the road.
(And apparently I have difficulties when not on the road... ;-) )

Tex Texin
Internationalization Architect, Yahoo! Inc.

> -----Original Message-----
> From: Rolland Santimano [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 01, 2005 11:12 PM
> To: Andrei Zmievski
> Cc: [EMAIL PROTECTED]; internals@lists.php.net
> Subject: Re: str_pad clarification - Re: [PHP-DEV] PHP
> Unicode strings impl proposal
>
>
> --- Andrei Zmievski <[EMAIL PROTECTED]> wrote:
>
> > >> 4) The string can be truncated to the user's requested character
> > >> length. The string will be trimmed from the right one unicode utf-8
> > >> character (not grapheme, not byte) at a time until the length
> > >> limit is met. (So a combining character is one character for this
> > >> purpose.)
> > >
> > > Shouldn't characters/codepoints be trimmed at both ends, rather than
> > > just at the right end ?
> >
> > Why would you trim it from the left?
>
> OK, my Q wasn't really clear.
>
> Assuming pad string == "abcdefg", the end result would be something
> like: abcdefgXXXXabcdefg. But if the result string is being
> trimmed because of length constraints, I understand that Tex
> says the end result could be something like: abcdefgXXXXabc.
> The current non-Unicode impl would return something like:
> abcdeXXXXabcde - shouldn't this "symmetry" be retained in the
> Unicode impl too ?
>
> Unless Tex is talking about just trimming the pad string,
> rather than the result string, in which case, its alright.
>
> Another Q regarding Tex's proposal: why deal with UTF-8
> codepoints when trimming, when the inputs will be UTF-16 ?
>
> --
> Rolland
>
>

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
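
P.S. For reference, the symmetric behaviour Rolland describes (abcdeXXXXabcde) can be sketched roughly as follows. This is Python used purely as illustration, with lengths counted in code points. The name pad_both is made up, and sending an odd leftover character to the right-hand side is my assumption, not something settled in this thread.

```python
# Illustration only: pad_both is a hypothetical helper, not the PHP function.

def pad_both(s: str, length: int, pad: str = " ") -> str:
    """Pad `s` on both sides until it is `length` code points long.

    Each side is filled with the pad string repeated and then cut off
    on the right, so both sides are trimmed the same way.
    """
    if not pad:
        raise ValueError("pad string must not be empty")
    total = length - len(s)
    if total <= 0:
        return s  # already long enough; the input itself is not truncated

    left = total // 2       # assumption: an odd leftover goes to the right side
    right = total - left

    def fill(n: int) -> str:
        reps = -(-n // len(pad))   # ceiling division
        return (pad * reps)[:n]    # trim the pad by code point, from the right

    return fill(left) + s + fill(right)


print(pad_both("XXXX", 18, "abcdefg"))  # abcdefgXXXXabcdefg
print(pad_both("XXXX", 14, "abcdefg"))  # abcdeXXXXabcde
```

Because the trim applies to each side's padding rather than to the finished string, the result stays symmetric whenever the total padding is even.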