On Fri, 06 Sep 2013 02:11:56 -0700, wxjmfauth wrote:

> Short comment about the "detection" tools from a previous discussion.
>
> The tools supposed to detect the coding scheme are all working with a
> simple logical mathematical rule:
>
> p ==> q <==> non q ==> non p .

Incorrect. chardet does a statistical analysis of the bytes, and tries to
guess what language they are likely to come from. The algorithm is
described here:

https://github.com/erikrose/chardet/blob/master/docs/how-it-works.html

(although that's rather inconvenient to read), and here:

http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html

chardet is a Python port of the Mozilla charset guesser, so they use the
same algorithm.

> Shortly -- and consequence -- they do not detect a coding scheme they
> only detect "a" possible coding scheme.

That at least is correct.

> The Flexible String Representation has conceptually to face the same
> problem.

No it doesn't.

--
Steven
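
P.S. A minimal sketch of why a detector can only ever report "a" possible encoding: the same bytes often decode without error under several codecs. This uses only the standard library, and `plausible_encodings` is a hypothetical helper written for illustration, not part of chardet. It does no statistics at all; it only shows the ambiguity that a statistical guesser such as chardet then has to rank.

```python
# Hypothetical helper (not chardet's API): report which candidate
# encodings can decode the given bytes without raising an error.
def plausible_encodings(data, candidates=("ascii", "utf-8", "latin-1",
                                          "cp1252", "koi8-r")):
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        ok.append(name)
    return ok


sample = "café".encode("latin-1")  # b'caf\xe9'

# Several candidates accept the bytes, but they disagree about the text.
for name in plausible_encodings(sample):
    print(name, ascii(sample.decode(name)))
```

More than one candidate survives, and they do not all produce the same string, so "decodes without error" cannot identify the encoding by itself. Picking the most likely survivor is where the statistical analysis comes in.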