Hi, Brane and Vincent,

From: Branko Čibej [mailto:br...@apache.org]
> >> A fairly plausible cause for getting the wrong representation is
> >> changing the locale for the duration of a script invocation. Another
> >> plausible way is to create files based on the contents of some
> >> script, which are not encoded as expected by the current locale.
> > However Subversion doesn't handle that (BTW it would be much better to
> > remember the expected locale by storing it in the .svn directory
> > rather than giving obscure error messages: if it did, Subversion would
> > know that the user was using an incorrect locale without any
> > ambiguity).
> 
> And if the user changes the locale for valid reasons, the Subversion
> working copy would break in a different way.

I guess we would need some "change locale" operation, which would at least 
update the saved locale in the .svn directory.

(Updating the actual on-disk filenames could be left to the tools the user uses 
to also update his other filenames...)

> > Currently you can't avoid the problem: if the user has used UTF-8 then
> > runs Subversion under ISO-8859-1 locales, the "misconfiguration"
> > is not detected, and "svn up" can corrupt a working copy as
> > shown in the past. Subversion should remember the locale that was used
> > initially to avoid such a problem.
> 
> Well? This issue isn't limited to Subversion; most applications will fail
> at some point once you start playing games with the locale and/or filename
> encoding. That's why both Windows and OS X mandate one of the Unicode
> representations for filenames.

Python actually adopted a workaround to this problem called "surrogate 
escaping".
https://www.python.org/dev/peps/pep-0383/

This mechanism is applied to filenames and similar "byte strings" that are
exchanged with the outside world. It is meant only for passing the contents
of the 8-bit string from one OS interface to another, with little
interpretation or processing in between.

Basically, each invalid byte (one that cannot be decoded to the internal
Unicode representation) is mapped to a lone surrogate code point, which is
converted back to the original byte on the output side.
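
A minimal sketch of that round trip in Python 3, using the "surrogateescape"
error handler from PEP 383. The byte string is just an example I made up (a
Latin-1 encoded name read under a UTF-8 locale):

```python
# A Latin-1 encoded filename; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps the invalid byte 0xE9 to the
# lone surrogate U+DCE9 instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns U+DCE9 back into the byte 0xE9,
# so the original byte string is recovered unchanged.
restored = name.encode("utf-8", errors="surrogateescape")

assert name == "caf\udce9.txt"
assert restored == raw
```

The string in the middle is not meaningful text, it only carries the bytes
through unharmed.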

A solution like this could help SVN deal with miscoded filenames, and would
allow e.g. an "svn rm" or "svn mv" on them.

When adopting such a solution, it should be strictly restricted to local 
filenames (the RA layers should refuse them), and I guess we could get away 
with not even allowing them to enter the local working copy database.
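
To illustrate the kind of check I mean, here is a rough Python sketch. The
function names are hypothetical and have nothing to do with actual Subversion
code; it only assumes that escaped bytes show up as code points in the range
U+DC80..U+DCFF, as PEP 383 defines:

```python
def contains_escaped_bytes(name: str) -> bool:
    """Return True if name carries bytes smuggled in via surrogateescape."""
    # PEP 383 maps undecodable bytes 0x80..0xFF to U+DC80..U+DCFF.
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)


def check_name_for_repository(name: str) -> str:
    """Hypothetical gatekeeper: refuse names that are not clean Unicode."""
    if contains_escaped_bytes(name):
        raise ValueError("filename contains undecodable bytes")
    return name
```

Purely local operations would skip this check, anything that is about to
leave the machine would go through it.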

For screen output, we could translate them to escape sequences like \x1A, so 
"svn status" could work...
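
Such a display translation could look roughly like this in Python (again
only a sketch, display_name is a made-up helper, and it assumes the name was
decoded with the "surrogateescape" handler):

```python
def display_name(name: str) -> str:
    """Render surrogate-escaped bytes as \\xNN so the name is printable."""
    parts = []
    for ch in name:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Recover the original byte value and show it as an escape.
            parts.append("\\x%02X" % (code - 0xDC00))
        else:
            parts.append(ch)
    return "".join(parts)


escaped = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")
shown = display_name(escaped)
assert shown == "caf\\xE9.txt"
```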

I'm not sure whether it's worth the work to support basically broken
environments. On the other hand, the Python guys did go that way.

> You might as well say that Unix (Linux) is broken and should be fixed (with
> which I'd heartily agree, but that's water under the bridge).

All recent Linux installations I have seen use UTF-8 as their encoding
(independent of the language / country settings actually in use). And I don't
see any valid reason to use anything else nowadays, except for keeping
compatibility with existing installations...


Best regards

Markus Schaber

