On 23.04.2013 14:51, Stefan Sperling wrote:
> On Tue, Apr 23, 2013 at 02:27:08PM +0200, Branko Čibej wrote:
>> You're missing the point. tolower() works on individual characters, not
>> whole strings; so it in general /cannot/ do correct locale-specific
> Do you really mean characters, or bytes?
> It sounds like you mean bytes. tolower() works on individual bytes.

It *does not matter* whether it's bytes or characters; either way it
cannot do correct locale-specific lowercasing.

[...]

>> Trying to retrofit anything less
>> smart onto apr_fnmatch will not work correctly.
> That depends on whether an fnmatch implementation is willing to live
> with the limitations of the locale mechanism (one opaque charset
> supported, any charset not in the current locale can give errors).

For the case we're considering, we don't care about conversion from
UTF-8, since we require log messages to be in UTF-8 anyway.

> It seems that some people do think fnmatch() should do it this way:
> http://opensource.apple.com/source/Libc/Libc-583/gen/FreeBSD/fnmatch.c
> (Caution: This implementation has the out-of-bounds recursion bug
> which made Bill rewrite fnmatch for APR...)
>
> Subversion already assumes it can convert strings from UTF-8 to the
> locale's character set for output. We could also assume that we can
> convert log messages from UTF-8 to the current locale charset, and
> write something that performs case-insensitive matching with wchar_t.
> However, that's clearly out of scope for 1.8 as well :)

Let me say again: comparing single characters is not correct case
folding. German is a good example of why that doesn't work: it does not
just have the ß == SS equivalence; for case-insensitive search, I'd also
expect ö == OE/oe == Ö etc. to be equivalent.

If you consider all this, the easiest approach by far might be to simply
add a Lucene index of all log messages to the server; then you can add
any number of bells and whistles, including language-specific stemming.
I'd consider that a better solution than any homegrown full-text search
facility; these are never easy.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com