Hi John:

Thanks for responding.

>Look at your file using
>    print repr(open('c:/test/spanish.txt','rb').read())
>If you see 'a\xf1o' then use charset="windows-1252"

I did this ... no change ... still see 'a\xf1o'

>else if you see 'a\xc3\xb1o' then use charset="utf-8" else ????
>Based on your responses to Martin, it appears that your file is
>actually windows-1252 but you are telling browsers that it is utf-8.
>Another check: if the file is utf-8, then doing
>    open('c:/test/spanish.txt','rb').read().decode('utf8')
>should be OK; if it's not valid utf8, it will complain.

No, this causes a decode error:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-4: invalid data
      args = ('utf8', 'a\, 1, 5, 'invalid data')
      encoding = 'utf8'
      end = 5
      object = 'a\xf1o'
      reason = 'invalid data'
      start = 1

>Yet another check: open the file with Notepad. Do File/SaveAs, and
>look at the Encoding box -- ANSI or UTF-8?

Notepad says it's ANSI.

Thanks. What now?

Also, this is a general problem for me, whether I read from a file, from
an HTML text field, or from an HTML textarea. So I'm looking for a general
solution. If it helps to debug by reading from a textarea or text field,
let me know.

--
http://mail.python.org/mailman/listinfo/python-list
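
For reference, the checks quoted above can be combined into one small sketch. It assumes the only two candidates are UTF-8 and windows-1252 (what Notepad calls ANSI on a Western European Windows install). The helper name sniff_encoding is hypothetical, and the sample bytes are the 'a\xf1o' shown in the traceback rather than the contents of a real file:

```python
def sniff_encoding(data):
    """Guess whether raw bytes are UTF-8 or windows-1252 (hypothetical helper)."""
    try:
        data.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        # Not valid UTF-8. Assumption: the only other candidate is the
        # Windows "ANSI" code page, windows-1252.
        return 'windows-1252'

# The bytes reported in the traceback; for a real file this would be
# open('c:/test/spanish.txt', 'rb').read()
raw = b'a\xf1o'

encoding = sniff_encoding(raw)        # which charset to declare for these bytes
text = raw.decode(encoding)           # Unicode text, independent of the file encoding
utf8_bytes = text.encode('utf-8')     # re-encode if the page must be served as utf-8
```

Either the declared charset has to match the bytes as they are stored, or the bytes have to be re-encoded to match the declared charset; changing only the charset declaration leaves the file contents unchanged, so repr() of the file will keep showing 'a\xf1o'.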