On 6 oct, 06:39, Greg <gregor.hochsch...@googlemail.com> wrote:
> Brilliant! It worked. Thanks!
>
> Here is the final code for those who are struggling with similar
> problems:
>
> ## open and decode file
> # In this case, the encoding comes from the charset argument in a meta tag
> # e.g. <meta charset="iso-8859-2">
> fileObj = open(filePath,"r").read()
> fileContent = fileObj.decode("iso-8859-2")
> fileSoup = BeautifulSoup(fileContent)
>
> ## Do some BeautifulSoup magic and preserve unicode, presume result is saved in 'text' ##
>
> ## write extracted text to file
> f = open(outFilePath, 'w')
> f.write(text.encode('utf-8'))
> f.close()
>
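For anyone doing the same on Python 3, the decode-once, encode-once flow quoted above might look roughly like this. It is a minimal sketch, not Greg's code: BeautifulSoup is left out, `extract_text`, `transcode` and `convert_file` are hypothetical names, and iso-8859-2 is assumed as the source encoding, as in the quoted example. Binary mode matters here: in Python 3, `open(path, "r").read()` already returns a `str`, which has no `.decode()` method.

```python
def extract_text(content: str) -> str:
    # Hypothetical placeholder for the BeautifulSoup step; returns its input.
    return content


def transcode(raw: bytes, source_encoding: str = "iso-8859-2") -> bytes:
    # Decode once on the way in: bytes -> str (Unicode).
    content = raw.decode(source_encoding)
    # Work only with Unicode text in the middle.
    text = extract_text(content)
    # Encode once on the way out: str -> UTF-8 bytes.
    return text.encode("utf-8")


def convert_file(in_path: str, out_path: str,
                 source_encoding: str = "iso-8859-2") -> None:
    # Binary modes keep Python from applying any implicit text encoding.
    with open(in_path, "rb") as f:
        raw = f.read()
    with open(out_path, "wb") as f:
        f.write(transcode(raw, source_encoding))
```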
or, with io.open (Python2/Python3), which decodes on read and encodes on write, so no explicit .decode()/.encode() calls are needed:

>>> import io
>>> with io.open('abc.txt', 'r', encoding='iso-8859-2') as f:
...     r = f.read()
...
>>> r
u'a\nb\nc\n'
>>> with io.open('def.txt', 'w', encoding='utf-8-sig') as f:
...     t = f.write(r)
...
>>> f.closed
True

jmf

-- 
http://mail.python.org/mailman/listinfo/python-list
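On Python 3 the built-in `open()` is the same function as `io.open()`, so the io.open session can be wrapped into a small reusable helper. This is a minimal sketch under assumptions: `transcode_file` is a hypothetical name, and its default encodings simply mirror the session (iso-8859-2 in, utf-8-sig out).

```python
import os
import tempfile


def transcode_file(src: str, dst: str,
                   src_encoding: str = "iso-8859-2",
                   dst_encoding: str = "utf-8-sig") -> str:
    # Text mode decodes with src_encoding on read...
    with open(src, "r", encoding=src_encoding) as f:
        text = f.read()
    # ...and encodes with dst_encoding on write.
    with open(dst, "w", encoding=dst_encoding) as f:
        f.write(text)
    # Return the decoded (Unicode) text for inspection.
    return text


if __name__ == "__main__":
    # Small demo in a throwaway directory (hypothetical file names).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "abc.txt")
        dst = os.path.join(tmp, "def.txt")
        with open(src, "wb") as f:
            f.write("Łódź".encode("iso-8859-2"))
        transcode_file(src, dst)
        with open(dst, "rb") as f:
            print(f.read()[:3])  # the UTF-8 byte-order mark
```

The utf-8-sig codec writes the byte-order mark EF BB BF at the start of the file. Reading it back with `encoding='utf-8-sig'` strips the mark, whereas plain `'utf-8'` would leave a U+FEFF character at the front of the string.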