--- Andrei Zmievski <[EMAIL PROTECTED]> wrote:
>
> One of the ICU guys had the following suggestion, when I asked
> about case insensitive version of u_strstr() a while back:
>
> > 1. Go one step further and make your string search
> > language-sensitive, using ICU's string search API (which is based
> > on collation). See
> > http://icu.sourceforge.net/userguide/searchString.html
> >
> > 2. Use ICU regular expressions. It currently does not handle case
> > foldings well that map a single character (like ß) to multiple
> > (like ss).
> >
> > 3. Look at the implementation behind functions like
> > u_strcasecmp() and try to adapt it to a string search. The
> > implementation case-folds both strings incrementally. For a
> > search, you would want to case-fold the pattern beforehand, but
> > not the text in which you are searching.
> >
> > 4. You might try the following: Take the first character in the
> > pattern and get the set of all characters that have the same case
> > folding (see the UnicodeSet/USet API). Then search in the string
> > for the occurrence of any one of the set items (which include
> > strings!). Then do a case-insensitive comparison, allowing a match
> > that does not end with the end of the text.
> >
> > The problematic cases are of course those ß->ss and similar. The
> > collation-based string search API has settings for whether you
> > want to find "sta" in "Flußtal" and such (the pattern matches the
> > second half of a text character).
> >
> > Usually, users seem to be happy to do 1. or 2. Long-term, we
> > would like to beef up the regex implementation to handle more
> > complicated case foldings and also canonically equivalent (i.e.,
> > normalization) variants.
>
> I am leaning towards 3 or 4. Not sure which one would be faster,
> but we definitely would want to write a generic function that does
> case-insensitive search and re-use it in stristr(), stri_replace(),
> and others.
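
A rough sketch of how suggestion 3 might look, for discussion only. It is
written in Python rather than C just to keep it short and self-contained.
Assumptions: str.casefold() stands in for ICU's case folding (for example
u_strFoldCase()), and the name casefold_find is made up for illustration.
The pattern is folded once up front, the text is folded one character at a
time while comparing, and a match has to end on a text character boundary,
so this version would not find "sta" in "Flußtal".

```python
def casefold_find(text, pattern, start=0):
    """Return (begin, end) indices into `text` of the first
    case-insensitive match of `pattern`, or None if there is none."""
    folded_pattern = pattern.casefold()  # fold the pattern once, beforehand
    if not folded_pattern:
        return (start, start)  # empty pattern matches immediately

    for begin in range(start, len(text)):
        matched = 0  # how many folded-pattern code points matched so far
        for end in range(begin, len(text)):
            # Fold the text incrementally, one character at a time.
            piece = text[end].casefold()
            if not folded_pattern.startswith(piece, matched):
                break  # mismatch, or the fold overruns the pattern
            matched += len(piece)
            if matched == len(folded_pattern):
                # Offsets refer to the original, unfolded text.
                return (begin, end + 1)
    return None
```

Because the offsets refer to the original text, a caller such as a
stri_replace()-style function could splice the replacement in directly,
even when the matched text and the pattern differ in length (as with
"Straße" matched by "STRASSE").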

OK, I'll look at [3] in particular to try to define what the generic
function might look like. But I won't be able to work on it this week.

Thanks for the tips,
Rolland

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php