dirknbr <dirknbr <at> gmail.com> writes:

> I have kind of developped this but obviously it's not nice, any better
> ideas?
>
> try:
>     text=texts[i]
>     text=text.encode('latin-1')
>     text=text.encode('utf-8')
> except:
>     text=' '
As Steven has pointed out, if the .encode('latin-1') works, its result is thrown away. This would be very fortunate.

It appears that your goal was to encode the text in latin1 if possible, otherwise in UTF-8, with no indication of which encoding was used. Your second posting confirmed that you were doing this in a loop, which means your output file could end up with records in mixed encodings.

Did you consider what a programmer writing code to READ your output file would need to do, e.g. attempt to decode each record as UTF-8 with a fallback to latin1? Did you consider what would happen if a stream of mixed-encoding text were sent to a display device?

As already advised, the short answer, which avoids all of that hassle: just encode in UTF-8.

-- 
http://mail.python.org/mailman/listinfo/python-list
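To make the reader-side problem concrete, here is a minimal sketch in Python 3 (the thread predates it, so str/bytes semantics differ from the original Python 2 code). The helpers `encode_mixed`, `decode_guessing`, `roundtrip_mixed` and `roundtrip_utf8` are hypothetical names for illustration only. `encode_mixed` assumes the apparent intent of the quoted code (latin-1 if possible, else UTF-8, no marker), and `decode_guessing` is the "try UTF-8, fall back to latin1" reader. Because latin-1 can decode any byte sequence, a wrong guess produces silently corrupted text rather than an error:

```python
# Sketch (Python 3): mixed latin-1/UTF-8 records vs. plain UTF-8.
# All helper names here are hypothetical, for illustration only.

def encode_mixed(text):
    """Assumed intent of the quoted code: latin-1 when possible,
    otherwise UTF-8, with no record of which encoding was used."""
    try:
        return text.encode('latin-1')
    except UnicodeEncodeError:
        return text.encode('utf-8')


def decode_guessing(record):
    """What a reader of a mixed-encoding file is reduced to:
    try UTF-8, fall back to latin-1 (which never raises)."""
    try:
        return record.decode('utf-8')
    except UnicodeDecodeError:
        return record.decode('latin-1')


def roundtrip_mixed(texts):
    # Write each record with the mixed scheme, then read it back by guessing.
    return [decode_guessing(encode_mixed(t)) for t in texts]


def roundtrip_utf8(texts):
    # Write and read every record as UTF-8: no guessing needed.
    return [t.encode('utf-8').decode('utf-8') for t in texts]


if __name__ == '__main__':
    texts = ['caf\u00e9', '\u20ac5', '\u00c3\u00a9']  # 'café', '€5', 'Ã©'
    print(roundtrip_mixed(texts))  # ['café', '€5', 'é']  (third record corrupted)
    print(roundtrip_utf8(texts))   # ['café', '€5', 'Ã©']
```

The third record fails because 'Ã©' encoded as latin-1 is b'\xc3\xa9', which happens to be valid UTF-8 for 'é', so the guessing reader never falls back. Encoding everything as UTF-8 removes the guesswork entirely.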