OK, well, then the code needs to use internationalized functions for uppercasing and lowercasing strings. Operating on the first character of a string without its surrounding context is incorrect, and so is operating on a string without a locale.
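To make the casing point concrete, here is a minimal Python sketch. Python is used only for illustration, and `naive_ucfirst` is a hypothetical stand-in for a byte-oriented implementation, not PHP's actual code. It shows a multi-byte first character being skipped, a case mapping that changes the string's length, a context-dependent mapping (Greek final sigma), titlecase differing from uppercase, and the locale gap that Python's own str methods leave open.

```python
def naive_ucfirst(data):
    """Hypothetical byte-level ucfirst: uppercase only the first byte.

    bytes.upper() maps ASCII letters only, so when the first character is a
    multi-byte UTF-8 sequence nothing changes.
    """
    return data[:1].upper() + data[1:]


# 1. A multi-byte first character is skipped at the byte level.
print(naive_ucfirst("élan".encode("utf-8")).decode("utf-8"))  # élan

# 2. Case mapping can change length: German sharp s uppercases to "SS".
print("straße".upper())  # STRASSE

# 3. Context matters: capital sigma lowercases to final sigma (U+03C2)
#    at the end of a word, but to ordinary sigma (U+03C3) on its own.
print("Σ".lower(), "ΟΔΟΣ".lower())  # σ οδος

# 4. Uppercasing "the first character" should really use titlecase,
#    which for some letters differs from uppercase.
print("ǆ".upper(), "ǆ".title())  # Ǆ ǅ

# 5. Python's str methods apply Unicode default casing and ignore locale,
#    so Turkish dotted capital I needs a locale-aware library such as ICU.
print("i".upper())  # I (Turkish would expect İ)
```

None of cases 2 to 5 can be handled by mapping one byte, or even one character, in isolation; they need full Unicode case mapping, and case 5 additionally needs the locale.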
The string operations should use ICU. Also, I believe ICU uses Boyer-Moore (or it did the last time I looked). There are some other issues as well, but I will have to look at the code again. I wasn't thinking of UTF-16, so you might also look at surrogates. Are there guidelines for PHP coding, and for proper support of UTF-16?

> -----Original Message-----
> From: Johannes Schlüter [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 11, 2008 5:32 AM
> To: Texin, Tex
> Cc: Scott MacVicar; Nuno Lopes; internals@lists.php.net; Michal Dziemianko
> Subject: RE: [PHP-DEV] Algorithm Optimizations - string search
>
> Hi,
>
> On Wed, 2008-06-11 at 01:01 -0700, Texin, Tex wrote:
> > When I looked at the code, I assumed that it wasn't intended for
> > international use. I'll have to go back and look to give you
> > details, but it doesn't work for international use or unicode.
> > It would be ok for 8859-1.
>
> That's the default case in PHP < 6, in current PHP versions
> all string operations use on "binary" strings, so all
> references to offset work on byte not character base. That's
> one of the main reasons for PHP 6.
>
> johannes
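On the surrogate and offset points, a minimal Python sketch follows. `to_utf16_units` and `utf16_code_points` are hypothetical helpers written for illustration, not PHP or ICU functions. The decoder combines a high surrogate (0xD800-0xDBFF) with a following low surrogate (0xDC00-0xDFFF) into one code point and rejects unpaired surrogates; the last line shows that the offset of the same character differs depending on whether you count UTF-8 bytes, UTF-16 code units, or code points.

```python
def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_code_points(units):
    """Decode UTF-16 code units into code points, combining surrogate pairs.

    Unpaired surrogates raise ValueError in this sketch.
    """
    points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high (lead) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            raise ValueError("unpaired high surrogate at index %d" % i)
        if 0xDC00 <= unit <= 0xDFFF:  # low (trail) surrogate with no lead
            raise ValueError("unpaired low surrogate at index %d" % i)
        points.append(unit)
        i += 1
    return points


text = "a\U0001D11Eb"  # "a", U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), "b"
units = to_utf16_units(text)
print([hex(u) for u in units])  # ['0x61', '0xd834', '0xdd1e', '0x62']
print([hex(cp) for cp in utf16_code_points(units)])  # ['0x61', '0x1d11e', '0x62']

# Length and offset depend on the unit being counted.
print(len(text.encode("utf-8")), len(units), len(text))  # 6 4 3
print(text.encode("utf-8").find(b"b"), units.index(0x62), text.find("b"))  # 5 3 2
```

Raising on an unpaired surrogate is one design choice; substituting U+FFFD is another common one. Either way, any search or substring routine that works on UTF-16 must avoid splitting a pair.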