Re: Help needed with python unicode cgi-bin script

weheh Tue, 11 Dec 2007 09:52:46 -0800

Hi John:
Thanks for responding.

>Look at your file using
 >   print repr(open('c:/test/spanish.txt','rb').read())


>If you see 'a\xf1o' then use charset="windows-1252"
I did this. No change; I still see 'a\xf1o'.

>else if you see 'a\xc3\xb1o' then use charset="utf-8" else ????

>Based on your responses to Martin, it appears that your file is
>actually windows-1252 but you are telling browsers that it is utf-8.
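
To make the two cases in the checks above concrete, here is a minimal sketch. It is written for Python 3 (the thread itself uses Python 2 syntax), and the byte literals are assumptions standing in for the contents of spanish.txt:

```python
# The same word as raw bytes in the two encodings discussed above.
cp1252_bytes = b'a\xf1o'      # what repr() shows for a windows-1252 (ANSI) file
utf8_bytes = b'a\xc3\xb1o'    # what repr() would show for a UTF-8 file

# Decoding with the encoding that matches the bytes gives the same text.
text = cp1252_bytes.decode('windows-1252')
same_text = utf8_bytes.decode('utf-8')

# Re-encoding the decoded text as UTF-8 yields the two-byte form of the n-tilde.
reencoded = text.encode('utf-8')
```

If the page is served with charset="utf-8", the bytes sent would need to look like `reencoded`, not like `cp1252_bytes`.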

>Another check: if the file is utf-8, then doing
 >   open('c:/test/spanish.txt','rb').read().decode('utf8')
>should be OK; if it's not valid utf8, it will complain.
No, this causes a decode error:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-4: invalid data
      args = ('utf8', 'a\xf1o', 1, 5, 'invalid data')
      encoding = 'utf8'
      end = 5
      object = 'a\xf1o'
      reason = 'invalid data'
      start = 1
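
The decode check above can be turned into a small helper. This is only a sketch in Python 3 syntax; the function name `sniff_encoding` is hypothetical, and the fallback to windows-1252 is an assumption that fits this file, not a general rule:

```python
def sniff_encoding(raw):
    """Guess between UTF-8 and windows-1252 for a byte string.

    Assumption: anything that is not valid UTF-8 is treated as
    windows-1252, which matches the file in this thread but is
    only a heuristic.
    """
    try:
        raw.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        # Same kind of error as in the traceback above.
        return 'windows-1252'


guess_ansi = sniff_encoding(b'a\xf1o')
guess_utf8 = sniff_encoding(b'a\xc3\xb1o')
```

Pure ASCII input is valid UTF-8, so the helper reports 'utf-8' for it; that is harmless because ASCII bytes mean the same thing in both encodings.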


>Yet another check: open the file with Notepad. Do File/SaveAs, and
>look at the Encoding box -- ANSI or UTF-8?
Notepad says it's ANSI.

Thanks. What now? Also, this is a general problem for me, whether I read 
from a file, an HTML text field, or an HTML textarea. So I'm looking for a 
general solution. If it would help to debug by reading from a textarea or 
text field instead, let me know.
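
One general pattern that might apply here, sketched in Python 3 syntax: decode incoming bytes to text using the encoding they are really in, then encode the output in whatever charset the Content-Type header declares. The function name `build_cgi_response` and its default arguments are hypothetical. For form fields, the sketch assumes the browser submits data in the encoding of the page that contained the form, so the source encoding would be that page's charset.

```python
def build_cgi_response(raw, source_encoding='windows-1252',
                       out_encoding='utf-8'):
    """Return header plus body as bytes for a CGI script.

    raw             -- bytes read from a file or a form field
    source_encoding -- encoding the bytes are actually in (assumed known)
    out_encoding    -- charset declared to the browser
    """
    text = raw.decode(source_encoding)      # bytes -> text at the input edge
    body = text.encode(out_encoding)        # text -> bytes at the output edge
    header = 'Content-Type: text/html; charset=%s\r\n\r\n' % out_encoding
    return header.encode('ascii') + body


response = build_cgi_response(b'a\xf1o')
```

The point of the design is that the declared charset and the bytes in the body are produced from the same `out_encoding` value, so they cannot drift apart.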


