On 7/6/2013 4:01 AM, Cameron Simpson wrote:
On 06Jun2013 11:46, Νίκος Γκρ33κ
<nikos.gr...@gmail.com> wrote:
| On Thursday, 6 June 2013 3:44:52 PM UTC+3, Steven D'Aprano wrote:
| > py> s = '999-Eυχή-του-Ιησού'
| > py> bytes_as_utf8 = s.encode('utf-8')
| > py> t = bytes_as_utf8.decode('iso-8859-7', errors='replace')
| > py> print(t)
| > 999-EΟΟΞ�-ΟΞΏΟ-ΞΞ·ΟΞΏΟ
|
| Does errors='replace' mean "don't break in case of an error"?
Yes. The result will be correct for correct iso-8859-7 and slightly mangled
for something that would not decode smoothly.
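
A minimal sketch of that difference, assuming Python 3 and reusing the string from Steven's session (the variable names here are only illustrative). Strict decoding, which is the default, raises on a byte that ISO-8859-7 does not define, while errors='replace' substitutes U+FFFD and carries on:

```python
s = '999-Eυχή-του-Ιησού'
utf8_bytes = s.encode('utf-8')

# Strict decoding (the default) stops at the first byte that
# ISO-8859-7 does not define.
try:
    utf8_bytes.decode('iso-8859-7')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' puts U+FFFD in place of such bytes and keeps going,
# so the call succeeds but the text is mangled.
replaced = utf8_bytes.decode('iso-8859-7', errors='replace')

# Decoding with the same codec that did the encoding restores the string.
round_trip = utf8_bytes.decode('utf-8')

print(strict_failed)
print('\ufffd' in replaced)
print(round_trip == s)
```

So "correct" here only means the call does not raise; the text is right only when the decoding codec matches the one used to encode.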
How can it be correct? We encoded our string as utf-8 and then
tried to decode it as greek-iso. How can this possibly be correct?
| You took the unicode string 's' and encoded it into a utf-8 bytestring.
| Then how is it possible to ask for the utf-8 bytestring to be decoded
| back to a unicode string using a different charset than the
| one used for encoding, and have this actually print the filename in
| greek-iso?
It is easily possible, as shown above. Does it make sense? Normally
not, but Steven is demonstrating how your "mv" exercises have
behaved: a rename using utf-8, then a _display_ using iso-8859-7.
Same as above: I don't understand it at all, since different
charsets (encodings) are used in the encode and decode steps.
|
| a) WHAT does it mean when a linux system is set to use utf-8?
It means the locale settings _for the current process_ are set for
UTF-8. The "locale" command will show you the current state.
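
From inside a Python 3 process, the same per-process state can be inspected roughly like this. It is only a sketch: the values printed depend entirely on the environment (LANG, LC_ALL, LC_CTYPE) the process was started with.

```python
import locale
import sys

# Adopt the locale named by the environment for this process only.
try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    # The environment names a locale that is not installed;
    # the process stays in the default "C" locale.
    pass

# Encoding the locale suggests for text data.
preferred = locale.getpreferredencoding(False)

# Encoding Python uses to convert filenames between str and bytes.
fs_encoding = sys.getfilesystemencoding()

print(preferred, fs_encoding)
```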
Does that mean that, when a Linux application needs to save a filename to
the Linux filesystem, the app checks the filesystem's 'locale', so as to
encode the filename using the utf-8 charset?
And likewise, when a Linux application wants to decode a filename, does it
also check the filesystem's 'locale' setting to know which charset it must
use to decode the filename correctly back to the original string?
So the locale is used by the filesystem itself and by Linux apps to know how
to read (decode) and write (encode) filenames from/to the system's hard disk?
| c) WHAT happens when the two of them try to work together?
If everything matches, it is all good. If the locales do not match,
the mismatch will result in an undesired bytes<->characters
encode/decode step somewhere, and something will display incorrectly
or be entered as input incorrectly.
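
The matching and mismatching cases can be sketched in Python 3 as follows. The helper as_seen is hypothetical, written only for this illustration; it stands in for "one program writes the name under its locale, another reads it under its own":

```python
def as_seen(name, writer_encoding, reader_encoding):
    """Encode name as the writer would, then decode it as the reader would."""
    raw = name.encode(writer_encoding)  # the bytes that reach the disk or the wire
    return raw.decode(reader_encoding, errors='replace')

name = 'Ευχή'

# Both sides agree: the name survives intact.
matched = as_seen(name, 'utf-8', 'utf-8')

# Written as utf-8, displayed as iso-8859-7: each Greek letter
# becomes two unrelated characters.
utf8_read_as_greek = as_seen(name, 'utf-8', 'iso-8859-7')

# Written as iso-8859-7, read as utf-8: the bytes are not valid
# utf-8, so replacement characters appear.
greek_read_as_utf8 = as_seen(name, 'iso-8859-7', 'utf-8')

print(matched == name)
print(utf8_read_as_greek == name)
print(greek_read_as_utf8 == name)
```

The bytes themselves never change in any of the three cases; only the interpretation applied to them does.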
I can't quite grasp the idea:
local end: Win8, locale = greek-iso
remote end: CentOS 6.4, locale = utf-8
FileZilla by default uploads filenames using a charset I do not know
PuTTY by default uses greek-iso to display filenames
WHAT can someone expect to happen when all of the above work together?
A mess, of course, but I want to hear in detail each step of the mess as it
emerges.
--
Webhost <http://superhost.gr>&& Weblog <http://psariastonafro.wordpress.com>