--- Andrei Zmievski <[EMAIL PROTECTED]> wrote:
> 
> One of the ICU guys had the following suggestion, when I asked
> about case insensitive version of u_strstr() a while back:
> 
> > 1. Go one step further and make your string search
> > language-sensitive, using ICU's string search API (which is based
> > on collation). See http://icu.sourceforge.net/userguide/
> > searchString.html
> >
> > 2. Use ICU regular expressions. It currently does not handle case
> > foldings well that map a single character (like ß) to multiple
> > (like ss).
> >
> > 3. Look at the implementation behind functions like
> > u_strcasecmp() and try to adapt it to a string search. The
> > implementation case-folds both strings incrementally. For a
> > search, you would want to case-fold the pattern beforehand, but
> > not the text in which you are searching.
> >
> > 4. You might try the following: Take the first character in the
> > pattern and get the set of all characters that have the same case
> > folding (see the UnicodeSet/USet API). Then search in the string
> > for the occurrence of any one of the set items (which include
> > strings!). Then do a case-insensitive comparison, allowing a match
> > that does not end with the end of the text.
> >
> > The problematic cases are of course those ß->ss and similar. The
> > collation-based string search API has settings for whether you
> > want to find "sta" in "Flußtal" and such (the pattern matches the
> > second half of a text character).
> >
> > Usually, users seem to be happy to do 1. or 2. Long-term, we
> > would like to beef up the regex implementation to handle more
> > complicated case foldings and also canonically equivalent (i.e.,
> > normalization) variants.
> 
> I am leaning towards 3 or 4. Not sure which one would be faster,
> but we definitely would want to write a generic function that does 
> case-insensitive search and re-use it in stristr(), stri_replace(),
> and others.

OK, I'll look at [3] in particular to try to define what the generic
function might look like. But I won't be able to work on it this week.
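
As a rough starting point for discussion, here is a minimal sketch of the
idea in [3]: fold the pattern once up front, then fold the text one
character at a time while comparing. It is written in Python only because
str.casefold() gives full Unicode case folding out of the box; a C version
would presumably be built on ICU's case-folding functions instead. The name
casefold_find() and its (start, end) return convention are hypothetical,
not an existing API.

```python
def casefold_find(text, pattern, start=0):
    """Return (begin, end) offsets into the ORIGINAL text of the first
    case-insensitive match of pattern, or None if there is no match.

    Hypothetical helper for illustration only.
    """
    folded_pattern = pattern.casefold()  # fold the pattern once, beforehand
    if not folded_pattern:
        return (start, start)

    for begin in range(start, len(text)):
        matched = 0   # how many folded-pattern characters are matched so far
        end = begin   # index of the next unread text character
        while end < len(text) and matched < len(folded_pattern):
            piece = text[end].casefold()  # fold the text incrementally
            # One text character may fold to several (e.g. U+00DF -> "ss"),
            # so compare the whole folded piece against the pattern.
            if not folded_pattern.startswith(piece, matched):
                break
            matched += len(piece)
            end += 1
        if matched == len(folded_pattern):
            return (begin, end)
    return None
```

Returning offsets into the original text (rather than the folded text) is
what a replace-style function would need, since the matched span can be
shorter than the pattern when one text character folds to several. Note
that this sketch only matches on whole text characters, so it would not
find "sta" inside a word where the "s" comes from the second half of a
folded sharp s; handling that case is left open here.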

Thanks for the tips,
Rolland

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
