I am impressed by how helpful you have been on this, Jonathon. The mht file does say that it is windows-1252 encoded.
It turns out that s.decode('cp1252').encode('utf-8') is working correctly. I mistakenly thought it was not because I got this error:

UnicodeEncodeError: 'charmap' codec can't encode character u'\u2018' in position 193: character maps to <undefined>

This came from a print statement, not from the conversion. You get this error when trying to print the left single quotation mark, even though it is correctly decoded to unicode. So this is why I was having such problems. Presumably print encodes its output to the console's DOS character set, and this character is not in that set.

So using print to check what is going on in Python is not a good idea when working with unicode. Obvious with hindsight, not so obvious without it.

Thanks again, Jonathon, for all your support on this.

Peter
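
For anyone hitting the same error, here is a minimal sketch of what was going on. It is written in Python 3 syntax (the session above was Python 2), and the sample bytes, the cp437 console code page, and the console_safe helper are all assumptions made up for illustration; the real console code page may differ.

```python
# Sample windows-1252 bytes (hypothetical): 0x91 and 0x92 are the
# left and right single quotation marks in that encoding.
raw = b"\x91quoted\x92"

# The conversion itself works: decode from cp1252, re-encode as UTF-8.
text = raw.decode("cp1252")
utf8 = text.encode("utf-8")

# Assumption: the console uses a DOS code page such as cp437, which has
# no left single quotation mark. Encoding to it fails the same way the
# print statement did.
try:
    text.encode("cp437")
    console_error = None
except UnicodeEncodeError as exc:
    console_error = exc


def console_safe(value, encoding="cp437"):
    """Hypothetical helper: replace characters the console encoding
    cannot represent with backslash escapes instead of raising."""
    return value.encode(encoding, errors="backslashreplace").decode(encoding)


# Either of these is safe to print on a limited console.
print(console_safe(text))
print(ascii(text))  # in Python 2, repr(text) plays a similar role
```

The point is that the data was fine all along; only the last step, encoding for display, could not represent the character. Inspecting the escaped form avoids that step.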