On 23.04.2013 08:53, Stefan Sperling wrote:
> On Mon, Apr 22, 2013 at 01:13:43PM +0200, Branko Čibej wrote:
>> On 22.04.2013 12:59, Bert Huijben wrote:
>>> The assertion shows a design problem which we should handle for future 
>>> compatibility and you suggest just adding some bandages to patch/hide the 
>>> test failure?
>>>
>>> The current code is broken and the suggestion you do is like the solution 
>>> mostly vetoed by most of the responders in this thread: assuming there is 
>>> only us-english, by using a function that has platform specific behavior.
>>>
>>> (tolower() is locale and platform character encoding dependent. You should 
>>> never just pass individual UTF-8 bytes to it)
> OK Bert, I can see how, for example, a tolower() implementation which runs
> in a latin1 locale could convert parts of a UTF-8 string which contains
> bytes that are part of a multibyte character, if such bytes happen to have
> the same value as some upper case letter from the latin1 symbol range 
> [128-255].

You're missing the point. tolower() works on individual characters, not
whole strings, so in general it /cannot/ do correct locale-specific
lowercasing. What you're looking for is something like strcoll(), except
that it should be case-insensitive.
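
To make that concrete, here is a minimal sketch of one well-known case:
under Unicode full case folding, U+00DF (ß) folds to the two letters
"ss", so "STRASSE" and "straße" should match, yet a one-in, one-out
mapping in the style of tolower() can never produce that. All function
names below are hypothetical, the input is assumed to be UTF-8, and the
"full" fold knows only ASCII plus this single rule; it is an
illustration, not a proposal.

```c
#include <string.h>

/* Hypothetical helper: ASCII-only lowercasing of a single byte.
   Bytes >= 0x80 (parts of multibyte UTF-8 sequences) pass through. */
static unsigned char ascii_lower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* One-to-one folding: the best a tolower()-style interface can do.
   dst must have room for strlen(src) + 1 bytes. */
static void fold_per_byte(const char *src, char *dst)
{
  while (*src)
    *dst++ = (char)ascii_lower((unsigned char)*src++);
  *dst = '\0';
}

/* Toy "full" folding that knows exactly one extra rule:
   U+00DF (UTF-8: 0xC3 0x9F) folds to "ss". One character becomes two,
   although the byte count happens to stay the same in UTF-8.
   Real case folding needs the complete Unicode tables.
   dst must have room for strlen(src) + 1 bytes. */
static void fold_full(const char *src, char *dst)
{
  const unsigned char *s = (const unsigned char *)src;

  while (*s)
    {
      if (s[0] == 0xC3 && s[1] == 0x9F)
        {
          *dst++ = 's';
          *dst++ = 's';
          s += 2;
        }
      else
        *dst++ = (char)ascii_lower(*s++);
    }
  *dst = '\0';
}

/* Returns 1 if a and b are bytewise equal after applying fold.
   Inputs longer than 63 bytes are rejected (returns 0). */
static int equal_folded(void (*fold)(const char *, char *),
                        const char *a, const char *b)
{
  char fa[64], fb[64];

  if (strlen(a) >= sizeof fa || strlen(b) >= sizeof fb)
    return 0;
  fold(a, fa);
  fold(b, fb);
  return strcmp(fa, fb) == 0;
}
```

With these helpers, equal_folded(fold_per_byte, ...) reports "STRASSE"
and "straße" as different, while equal_folded(fold_full, ...) reports
them as equal.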

The long and short of it is that you need a complete locale-aware
implementation of the Unicode standard to do proper case-insensitive
comparisons; ICU is one such example. Trying to retrofit anything less
smart onto apr_fnmatch will not work correctly.

(N.B., the svn_utf__glob function on the wc-collate-path branch is
explicitly case-sensitive; it only deals with UTF normalization
differences.)
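
For anyone unsure what a "normalization difference" looks like, here is
a small sketch (this is not the svn_utf__glob code, and the names are
made up for illustration): the same visible "é" can be stored as one
precomposed code point or as "e" plus a combining accent. The two are
canonically equivalent, but a plain byte comparison sees different
strings.

```c
#include <string.h>

/* "é" as one precomposed code point, U+00E9 (composed form). */
static const char e_acute_composed[] = "\xC3\xA9";

/* "é" as "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form). */
static const char e_acute_decomposed[] = "e\xCC\x81";

/* Plain byte comparison; it knows nothing about Unicode normalization. */
static int bytes_equal(const char *a, const char *b)
{
  return strcmp(a, b) == 0;
}
```

Here bytes_equal(e_acute_composed, e_acute_decomposed) returns 0, even
though both strings render as the same character.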


We can of course document the restriction that 'log --search' will give
unexpected results when log messages are anything but ASCII. I'd even
accept that as a marginal band-aid provided we promise to (at least
partially) fix it in 1.9. Now, we can't really make that promise, can we? :)
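
As a rough sketch of what such an ASCII-only restriction means in
practice (the function names are hypothetical and this is not the
actual 'log --search' code): a comparison that folds only A-Z leaves
every byte of a multibyte UTF-8 sequence alone, so it is safe, but
"Été" and "été" do not match.

```c
/* Hypothetical helper: lowercase one byte if it is ASCII A-Z,
   otherwise return it unchanged. */
static unsigned char lower_ascii_byte(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical ASCII-only case-insensitive equality on UTF-8 strings.
   Returns 1 if equal when only A-Z/a-z are folded, 0 otherwise. */
static int ascii_caseless_equal(const char *a, const char *b)
{
  const unsigned char *pa = (const unsigned char *)a;
  const unsigned char *pb = (const unsigned char *)b;

  while (*pa && *pb)
    {
      if (lower_ascii_byte(*pa) != lower_ascii_byte(*pb))
        return 0;
      pa++;
      pb++;
    }
  /* Equal only if both strings ended at the same position. */
  return *pa == *pb;
}
```

So "Fix Bug" matches "fix bug", but "\xC3\x89t\xC3\xA9" (Été) does not
match "\xC3\xA9t\xC3\xA9" (été), which is exactly the kind of result
the documentation would have to warn about.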

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
