Re: Unicode filenames

DL Neil via Python-list Fri, 06 Dec 2019 11:44:28 -0800

On 7/12/19 7:17 AM, Bob van der Poel wrote:

I have some files which came off the net with, I'm assuming, unicode
characters in the names. I have a very short program which takes the
filename and puts into an emacs buffer, and then lets me add information to
that new file (it's a poor man's DB).


Next, I can look up text in the file and open the saved filename.
Everything works great until I hit those darn unicode filenames.

Just to confuse me even more, the error seems to be coming from a bit of
tkinter code:
  if sresults.has_key(textAtCursor):
         bookname = os.path.expanduser(sresults[textAtCursor].strip())

which generates

   UnicodeWarning: Unicode equal comparison failed to convert both arguments
to Unicode - interpreting them as being unequal  if
sresults.has_key(textAtCursor):

I really don't understand the business about "both arguments". Not sure how
to proceed here. Hoping for a guideline!

(I'm guessing that) the "both arguments" relates to expanduser() becausethis is the first time that the fileNM has been identified to Python asanything more than a string of characters.

[a fileNM will be a string of characters, but a string of characters isnot necessarily a (legal) fileNM!]

Further suggesting, that if you are using Python3 (cf 2), your analysismay be the wrong-way-around. Python3 treats strings as Unicode. However,there is, and certainly in the past, was, no requirement for OpSys andIOCS to encode in Unicode.

The problem (for me) came from MSFT's (for example) many variations ofISO-8859-n and that there are no clues as to which of these was used innaming the file, and thus many possibly 'translations' into Unicode.

You can start to address the issue by using Python's bytes (instead ofstrings), however that cold reality still intrudes.

Do you know the provenance of these files, eg they are in French andfrom an MS-Win machine? If so, you may be able to use decode() andencode(), but...

Still looking for trouble? Knowing a fileNM was in Spanish/Portuguese Iwas able to take the fileNM's individual Unicode characters/surrogatesand subtract an applicable constant, so that accented letters fell'back' into the correct Unicode range. (this is extremely risky, andcould quite easily make matters worse/more confusing).

I warn you that pursuing this matter involves disappearing down into avery deep 'rabbit hole', but YMMV!


WebRefs:
https://docs.python.org/3/howto/unicode.html
https://www.dictionary.com/e/slang/rabbit-hole/
--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list

Re: Unicode filenames

Reply via email to