Re: Movie (MPAA) ratings and Python?

Ned Batchelder Wed, 11 Dec 2013 17:03:49 -0800

On 12/11/13 6:39 PM, Dan Stromberg wrote:


On Wed, Dec 11, 2013 at 3:24 PM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:

    On Wed, 11 Dec 2013 15:07:35 -0800, Dan Stromberg wrote:

     >  $ chardet mpaa-ratings-reasons.list
     > mpaa-ratings-reasons.list: windows-1255 (confidence: 0.97)
     >
     > I'm aware that chardet is playing guessing games, though one would
     > hope it would guess well most of the time, and give a reasonable
     > confidence rating.

    What reason do you have for thinking that Windows-1255 isn't a
    reasonable guess? If the bulk of the text is Latin-1 except perhaps
    for one or two Hebrew characters (or what chardet thinks are Hebrew
    characters), it may actually be a reasonable guess.


I get a traceback if I try to read the file as Windows-1255.  I don't
get a traceback if I read it as ISO-8859-1.

    If it is a poor guess, perhaps you ought to report it to the chardet
    maintainers as a good example of a poor guess.

I was considering that, and may do so.

I've also been wondering if ISO-8859-1 is just an octet-oriented codec,
so it'll read about anything.  There are clearly non-7-bit-ASCII
characters in the file that look like line noise in an mrxvt.
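
Whether a single-byte codec really accepts every octet can be checked
directly rather than assumed. The following is a minimal sketch, assuming
Python 3 and its bundled codecs; the helper name `undecodable_bytes` is
made up for illustration and is not part of any library.

```python
def undecodable_bytes(encoding):
    """Return the byte values (0-255) that `encoding` refuses to decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


# ISO-8859-1 assigns a character to every octet, so nothing is rejected.
latin1_gaps = undecodable_bytes("iso-8859-1")

# Python's windows-1255 table leaves some positions undefined; decoding
# one of those octets raises UnicodeDecodeError.
cp1255_gaps = undecodable_bytes("windows-1255")

print("iso-8859-1 rejects:", len(latin1_gaps), "byte values")
print("windows-1255 rejects:", [hex(v) for v in cp1255_gaps])
```

If the file contains any octet from the second list, a strict decode as
Windows-1255 would fail where ISO-8859-1 succeeds.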

Both ISO-8859-1 and Windows-1255 are octet-oriented; I don't see why one
would raise an exception when the other didn't. Unless the exception
isn't on the decode, but instead on your attempt to output the result.
Can you show the full traceback you're seeing?
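
One way to tell the two cases apart without the full traceback is to run
the decode and the output encoding as separate steps. This is only a
sketch, assuming Python 3; `failing_stage` and the sample bytes are
invented for illustration and are not taken from the actual file.

```python
def failing_stage(data, input_encoding, output_encoding):
    """Report which step fails: 'decode', 'output', or None if both work."""
    try:
        text = data.decode(input_encoding)
    except UnicodeDecodeError:
        return "decode"
    try:
        # Stands in for writing to a terminal or file with this encoding.
        text.encode(output_encoding)
    except UnicodeEncodeError:
        return "output"
    return None


# Hypothetical sample: one accented Latin-1 letter plus the octet 0x81.
sample = b"caf\xe9 \x81"

print(failing_stage(sample, "windows-1255", "utf-8"))
print(failing_stage(sample, "iso-8859-1", "ascii"))
print(failing_stage(sample, "iso-8859-1", "utf-8"))
```

The exception class in a real traceback gives the same information:
UnicodeDecodeError points at the read, UnicodeEncodeError at the output.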


--
Ned Batchelder, http://nedbatchelder.com

--
https://mail.python.org/mailman/listinfo/python-list
