Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

Peter J. Holzer Tue, 29 May 2018 03:12:42 -0700

On 2018-05-29 19:46:24 +1000, Chris Angelico wrote:
> On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer <[email protected]> wrote:
> > So if the text is German it will contain more words with
> > umlauts and each byte which is part of a correctly spelled German word
> > when interpreted according to ISO-8859-1 increases the probability that
> > decoding with ISO-8859-1 will produce the correct result. There remains
> > a tiny probability that all those matches are mere coincidence, but I
> > wrote "almost always", not "always", so I can live with an error rate of
> > 0.000001% (or something like that).
> 
> That's basically what the chardet module does, and its error rate is
> far FAR higher than that. If you think it's easy to detect encodings,
> I'm sure the chardet maintainers will be happy to accept pull
> requests!


We were talking about humans, not programs.
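
For illustration only, the evidence-counting argument quoted above can be mimicked very crudely in Python. This is a toy sketch, not how chardet works: the word list, the candidate encodings, and the names `score` and `guess_encoding` are all made up for this example, and a real human reader brings far more context than a six-word vocabulary.

```python
import re

# Toy vocabulary (an assumption for this sketch, not a real dictionary).
GERMAN_WORDS = {"grüße", "aus", "münchen", "für", "über", "schön"}


def score(data: bytes, encoding: str) -> int:
    """Count decoded words found in the toy vocabulary; -1 if decoding fails."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return -1
    words = re.findall(r"\w+", text.lower())
    return sum(1 for word in words if word in GERMAN_WORDS)


def guess_encoding(data: bytes, candidates=("utf-8", "iso-8859-1")) -> str:
    """Return the candidate encoding whose decoding matches the most known words."""
    return max(candidates, key=lambda enc: score(data, enc))


latin1_bytes = "Grüße aus München".encode("iso-8859-1")
utf8_bytes = "Grüße aus München".encode("utf-8")
print(guess_encoding(latin1_bytes))
print(guess_encoding(utf8_bytes))
```

Every correctly spelled word with an umlaut adds one point for the encoding that produced it, which is the "each match increases the probability" idea in miniature.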

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | [email protected]         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

-- 
https://mail.python.org/mailman/listinfo/python-list
