On Thu, 10 Mar 2016 01:54 am, Chris Angelico wrote:

> I have a source of occasional text files that basically just dumps
> stuff on me without any metadata, and I have to figure out (a) what
> the encoding is, and (b) what language the text is in.

https://pypi.python.org/pypi/chardet

> then I have two levels of heuristics to try to guess a
> most-likely encoding

I'm curious, what do you do?

(I stress that trying to guess the character set or encoding from the
text itself is a second-last-ditch tactic, for when you really don't
know and can't find out what the encoding is. The final, last-ditch
tactic is to just say "bugger it, I'll pretend it's Latin-1" and get a
mess of mojibake, but at least any ASCII characters will decode
alright, and as an English speaker, that's all that's important to me :-)

--
Steven
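
P.S. For what it's worth, here is a rough sketch of that fallback chain, not a tested recipe. The names decode_guess and min_confidence are made up for illustration, and the "try UTF-8 first" step is an assumption of this sketch rather than anything discussed above. chardet.detect() is the real entry point of the chardet package; the sketch skips it if chardet isn't installed.

```python
try:
    import chardet  # third-party: https://pypi.python.org/pypi/chardet
except ImportError:
    chardet = None  # the sketch still works, it just skips the guessing step


def decode_guess(data, min_confidence=0.5):
    """Return (text, encoding_used) for bytes of unknown encoding.

    Hypothetical helper: the name and the 0.5 threshold are illustrative.
    """
    # Assumption: strict UTF-8 rarely accepts text in another encoding,
    # so a clean decode is taken at face value.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Second-last ditch: ask chardet for a statistical guess.
    if chardet is not None:
        guess = chardet.detect(data)
        encoding = guess.get("encoding")
        confidence = guess.get("confidence") or 0.0
        if encoding and confidence >= min_confidence:
            try:
                return data.decode(encoding), encoding
            except (UnicodeDecodeError, LookupError):
                pass  # bad guess, or a codec Python doesn't know

    # Last ditch: Latin-1 maps every byte to a code point, so this never
    # fails. Non-ASCII text may come out as mojibake, but ASCII survives.
    return data.decode("latin-1"), "latin-1"
```

Usage would be something like text, enc = decode_guess(open(path, "rb").read()). Returning the encoding alongside the text makes it easy to log which files fell all the way through to Latin-1.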