On Friday, 7 June 2013 11:53:04 AM UTC+3, Cameron Simpson wrote:
> On 07Jun2013 09:56, Νίκος Γκρ33κ <nikos.gr...@gmail.com> wrote:
> | On 7/6/2013 4:01 AM, Cameron Simpson wrote:
> | >On 06Jun2013 11:46, Νίκος Γκρ33κ <nikos.gr...@gmail.com> wrote:
> | >| On Thursday, 6 June 2013 3:44:52 PM UTC+3, Steven D'Aprano wrote:
> | >| > py> s = '999-Eυχή-του-Ιησού'
> | >| > py> bytes_as_utf8 = s.encode('utf-8')
> | >| > py> t = bytes_as_utf8.decode('iso-8859-7', errors='replace')
> | >| > py> print(t)
> | >| > 999-EΟΟΞ�-ΟΞΏΟ-ΞΞ·ΟΞΏΟ
> | >|
> | >| Does errors='replace' mean don't break in case of error?
> | >
> | >Yes. The result will be correct for correct iso-8859-7 and slightly mangled
> | >for something that would not decode smoothly.
> |
> | How can it be correct? We have encoded our string in utf-8 and then
> | we tried to decode it as greek-iso? How can this possibly be
> | correct?

> If it is a valid iso-8859-7 sequence (which might cover everything, 
> since I expect it is an 8-bit 1:1 mapping from bytes values to a 
> set of codepoints, just like iso-8859-1) then it may decode to the 
> "wrong" characters, but the reverse process (characters encoded as
> bytes) should produce the original bytes.  With a mapping like this, 
> errors='replace' may mean nothing; there will be no errors because
> the only Unicode characters in play are all from iso-8859-7 to start
> with. Of course another string may not be safe. 

> Visually, the names will be garbage. And if you go:
>   mv '999-EΟΟΞ�-ΟΞΏΟ-ΞΞ·ΟΞΏΟ.mp3' '999-Eυχή-του-Ιησού.mp3'
> while using the iso-8859-7 locale, the wrong thing will occur
> (assuming it even works, though I think it should because all these
> characters are represented in iso-8859-7, yes?)
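A small sketch of the round trip described in the quote above, assuming Python 3's built-in codecs. It also suggests that iso-8859-7 is not quite a full 1:1 byte mapping in Python: byte 0xAE is unassigned there, which is where the replacement character in the mangled name comes from. The sample strings are taken from the filename in the quoted session.

```python
# UTF-8 bytes whose values all happen to be assigned in iso-8859-7
raw = 'του'.encode('utf-8')
wrong = raw.decode('iso-8859-7')        # "wrong" characters, but no error
round_trip = wrong.encode('iso-8859-7')  # encoding them again restores the bytes

# 'ή' is b'\xce\xae' in UTF-8, and 0xAE is unassigned in iso-8859-7
try:
    'ή'.encode('utf-8').decode('iso-8859-7')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors='replace' substitutes U+FFFD instead of raising
replaced = 'ή'.encode('utf-8').decode('iso-8859-7', errors='replace')

print(round_trip == raw, strict_ok, ascii(replaced))
```

Once a byte has been replaced by U+FFFD, the original byte is lost, so the reverse encode can no longer reproduce the original bytes for that name.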

I understood all the rest; only the quoted passages above are still unclear to me.
I can't seem to understand them.

Do you mean that utf-8, latin-iso, greek-iso and ASCII all share the same first 
128 codepoints (0-127)?

For example, does the char 'a' have the value 97 in all of those character sets?
Is that what you mean?
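A quick way to check this, as a minimal sketch using only Python 3's built-in codecs: encode the same ASCII character with each codec and compare the resulting bytes. Note that 'a' is 97; 65 is the uppercase 'A'.

```python
# Encode the same ASCII character with each codec and compare the bytes
codecs_to_try = ['ascii', 'iso-8859-1', 'iso-8859-7', 'utf-8']
encoded = {name: 'a'.encode(name) for name in codecs_to_try}

for name, raw in encoded.items():
    # raw[0] is the integer value of the single byte produced
    print(name, raw, raw[0])
```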

s = 'a'  (This is unicode, right? Why is a string assigned to a variable 
always of type unicode, instead of automatically becoming utf-8, 
which includes all available world-wide characters? Is Unicode something 
different from a character set?)
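One way to look at the distinction, as a rough sketch in Python 3: a str holds abstract code points (numbers), and an encoding such as utf-8 is only one of several ways to turn those numbers into bytes. The variable names here are illustrative and not part of files.py.

```python
text = 'α'                      # a str: one Unicode code point
code_point = ord(text)          # the number Unicode assigns to it (U+03B1)

# The same code point becomes different bytes under different encodings
as_utf8 = text.encode('utf-8')
as_greek = text.encode('iso-8859-7')
as_utf16 = text.encode('utf-16-le')

print(type(text).__name__, hex(code_point))
print(type(as_utf8).__name__, as_utf8, as_greek, as_utf16)
```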

utf8_byte = s.encode('utf-8')

Now if we decode this back as utf8 we will get the char 'a' again.
I believe the same thing will happen with the latin, greek and ascii codecs. Correct?

greek_a = utf8_byte.decode('iso-8859-7')
latin_a = utf8_byte.decode('iso-8859-1')
ascii_a = utf8_byte.decode('ascii')
utf8_a = utf8_byte.decode('utf-8')

Is this correct? 
Will all of those decodes work even though the bytestring was encoded as utf8?

Are the characters that will not decode correctly those whose codepoints 
are greater than 127?

For example, if s = 'α' (the Greek character equivalent to the English 'a')

Is this what you mean?
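A minimal sketch of both cases, assuming Python 3's built-in codecs: the byte for 'a' decodes identically everywhere, while the two UTF-8 bytes for 'α' decode to different (wrong) characters or fail outright.

```python
ascii_bytes = 'a'.encode('utf-8')   # a single byte, value 97
greek_bytes = 'α'.encode('utf-8')   # two bytes: 0xCE 0xB1

# The ASCII byte decodes to 'a' under every one of these codecs
for codec in ('utf-8', 'iso-8859-7', 'iso-8859-1', 'ascii'):
    print(codec, repr(ascii_bytes.decode(codec)))

# The non-ASCII bytes only decode correctly with the codec that produced them
as_utf8 = greek_bytes.decode('utf-8')
as_greek = greek_bytes.decode('iso-8859-7')   # two unrelated characters
as_latin = greek_bytes.decode('iso-8859-1')   # two other unrelated characters
try:
    greek_bytes.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    # ascii has no meaning for byte values above 127
    ascii_failed = True

print(repr(as_utf8), repr(as_greek), repr(as_latin), ascii_failed)
```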
--------------------------------

Now back to my almost ready files.py script please:


#========================================================
# Collect filenames of the path dir as bytes
greek_filenames = os.listdir( b'/home/nikos/public_html/data/apps/' )

for filename in greek_filenames:
        # Compute 'path/to/filename' in bytes
        greek_path = b'/home/nikos/public_html/data/apps/' + b'filename'
        try:
                filepath = greek_path.decode('iso-8859-7')
                
                # Rename current filename from greek bytes --> utf-8 bytes
                os.rename( greek_path, filepath.encode('utf-8') )
        except UnicodeDecodeError:
                # Since it's not a greek bytestring, it must be a proper utf8 bytestring
                filepath = greek_path.decode('utf-8')


#========================================================
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in filenames:
        try:
                # Check the presence of a file against the database and insert it if it doesn't exist
                cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
                data = cur.fetchone()
                
                if not data:
                        # First time for file; primary key is automatic, hit is defaulted
                        cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
        except pymysql.ProgrammingError as e:
                print( repr(e) )


#========================================================
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )
filepaths = set()

# Build a set of 'path/to/filename' based on the objects of path dir
for filename in filenames:
        filepaths.add( filename )

# Delete spurious 
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
for rec in data:
        # Each row comes back as a tuple (assuming the default cursor); the url is its first column
        if rec[0] not in filepaths:
                cur.execute('''DELETE FROM files WHERE url = %s''', rec )
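The clean-up step above can be tried in isolation. This is only a sketch: it uses the standard-library sqlite3 module as a stand-in for pymysql (so the placeholder is `?` rather than `%s`), and the table contents and file names are made up for illustration.

```python
import sqlite3

# In-memory database standing in for the real MySQL "files" table
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE files (url TEXT PRIMARY KEY)')
cur.executemany('INSERT INTO files (url) VALUES (?)',
                [('a.txt',), ('gone.txt',)])

# Hypothetical directory listing: 'gone.txt' no longer exists on disk
filepaths = {'a.txt', 'b.txt'}

# Delete database rows whose file is no longer present
cur.execute('SELECT url FROM files')
for (url,) in cur.fetchall():      # each row is a 1-tuple, so unpack it
    if url not in filepaths:
        cur.execute('DELETE FROM files WHERE url = ?', (url,))

cur.execute('SELECT url FROM files ORDER BY url')
remaining = [row[0] for row in cur.fetchall()]
print(remaining)
```

Comparing the unpacked url rather than the whole row matters here, because a row tuple is never equal to a plain string in the set.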

=======================

ni...@superhost.gr [~/www/cgi-bin]# [Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173] Error in sys.excepthook:
[Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173] ValueError: underlying buffer has been detached
[Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173]
[Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173] Original exception was:
[Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173] Traceback (most recent call last):
[Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 71, in <module>
[Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173]     os.rename( greek_path, filepath.encode('utf-8') )
[Fri Jun 07 14:53:17 2013] [error] [client 79.103.41.173] FileNotFoundError: [Errno 2] \\u0394\\u03b5\\u03bd \\u03c5\\u03c0\\u03ac\\u03c1\\u03c7\\u03b5\\u03b9 \\u03c4\\u03ad\\u03c4\\u03bf\\u03b9\\u03bf \\u03b1\\u03c1\\u03c7\\u03b5\\u03af\\u03bf \\u03ae \\u03ba\\u03b1\\u03c4\\u03ac\\u03bb\\u03bf\\u03b3\\u03bf\\u03c2: '/home/nikos/public_html/data/apps/filename'


?????
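The path at the end of the traceback ends in the literal name 'filename', which is what the bytes literal b'filename' in the first loop would produce on every iteration. A minimal sketch of the difference, using a temporary directory and a made-up file name as stand-ins for the real data directory:

```python
import os
import tempfile

literal_exists = []
real_exists = []

# Temporary directory standing in for /home/nikos/public_html/data/apps/
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'example.txt'), 'w').close()
    base = os.fsencode(tmp)                # directory path as bytes

    for filename in os.listdir(base):      # bytes in, bytes out
        # The literal b'filename' ignores the loop variable entirely
        literal_path = os.path.join(base, b'filename')
        # Joining with the variable gives the path of the actual file
        real_path = os.path.join(base, filename)

        literal_exists.append(os.path.exists(literal_path))
        real_exists.append(os.path.exists(real_path))

print(literal_exists, real_exists)
```

If that is indeed the cause, os.rename() fails because no file called 'filename' exists in the directory, regardless of any encoding question.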
-- 
http://mail.python.org/mailman/listinfo/python-list