Diez B. Roggisch wrote:
> So cp1250 doesn't have all codepoints defined - but the others have.
> Sure, this helps you to eliminate 1 of the three choices the OP wanted
> to choose between - but how many texts you have that have a 129 in them?

For the iso8859 ones, you should assume that the characters in range(128, 160) really aren't used. If you get one of these, and the text is not UTF-8, it is a Windows code page.

UTF-8 can be recognized pretty reliably: even though it allows almost all byte values to appear, it is very constrained in what sequences of bytes it allows. E.g. you can't have a single byte >127 in UTF-8; you need at least two of them in a row, and they need to meet further constraints.

Regards,
Martin
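
The heuristic above could be sketched roughly as follows. This is only an illustration, not a tested detector: the function name guess_encoding is made up here, and the candidate set (utf-8, cp1250, iso8859-2) is an assumption about the three choices the OP had in mind.

```python
def guess_encoding(data: bytes) -> str:
    """Guess between utf-8, cp1250 and iso8859-2 (assumed candidate set)."""
    # UTF-8 is very strict about byte sequences, so a successful strict
    # decode is strong evidence. Pure ASCII also ends up here, which is
    # harmless because ASCII decodes identically in all three encodings.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Bytes 128..159 are control characters in the iso8859 family and
    # are assumed not to occur in real text, so their presence points
    # to a Windows code page.
    if any(128 <= b < 160 for b in data):
        return "cp1250"

    return "iso8859-2"
```

For example, guess_encoding(b"caf\xe9") falls through to "iso8859-2": the lone byte 0xE9 is not valid UTF-8, and there is no byte in range(128, 160). Text that uses only characters above 159 is encoded almost identically in cp1250 and iso8859-2 for many letters, so the last branch is a default rather than a proof.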