On 2018-05-29 19:46:24 +1000, Chris Angelico wrote:
> On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer <hjp-pyt...@hjp.at> wrote:
> > So if the text is German it will contain more words with
> > umlauts and each byte which is part of a correctly spelled German word
> > when interpreted according to ISO-8859-1 increases the probability that
> > decoding with ISO-8859-1 will produce the correct result. There remains
> > a tiny probability that all those matches are mere coincidence, but I
> > wrote "almost always", not "always", so I can live with an error rate of
> > 0.000001% (or something like that).
>
> That's basically what the chardet module does, and its error rate is
> far FAR higher than that. If you think it's easy to detect encodings,
> I'm sure the chardet maintainers will be happy to accept pull
> requests!
We were talking about humans, not programs.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
-- https://mail.python.org/mailman/listinfo/python-list