On 23.04.2013 08:53, Stefan Sperling wrote:
> On Mon, Apr 22, 2013 at 01:13:43PM +0200, Branko Čibej wrote:
>> On 22.04.2013 12:59, Bert Huijben wrote:
>>> The assertion shows a design problem which we should handle for future
>>> compatibility and you suggest just adding some bandages to patch/hide the
>>> test failure?
>>>
>>> The current code is broken and the suggestion you do is like the solution
>>> mostly vetoed by most of the responders in this thread: assuming there is
>>> only us-english, by using a function that has platform specific behavior.
>>>
>>> (tolower() is locale and platform character encoding dependent. You should
>>> never just pass individual UTF-8 bytes to it)
> OK Bert, I can see how, for example, a tolower() implementation which runs
> in a latin1 locale could convert parts of a UTF-8 string which contains
> bytes that are part of a multibyte character, if such bytes happen to have
> the same value as some upper case letter from the latin1 symbol range
> [128-255].
You're missing the point. tolower() works on individual characters, not whole strings; so in general it /cannot/ do correct locale-specific lowercasing. What you're looking for is something like strcoll(), except that it should be case-insensitive.

The long and short of it is that you need a complete, locale-aware implementation of the Unicode standard to do proper case-insensitive comparisons; ICU is one such example. Trying to retrofit anything less smart onto apr_fnmatch will not work correctly.

(N.B., the svn_utf__glob function on the wc-collate-path branch is explicitly case-sensitive; it only deals with UTF normalization differences.)

We can of course document the restriction that 'log --search' will give unexpected results when log messages are anything but ASCII. I'd even accept that as a marginal band-aid provided we promise to (at least partially) fix it in 1.9.

Now, we can't really make that promise, can we. :)

-- 
Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com