On Tue, Dec 10, 2013 at 1:07 PM, Petite Abeille <petite.abei...@gmail.com> wrote:

> On Dec 10, 2013, at 6:25 AM, Dan Stromberg <drsali...@gmail.com> wrote:
>
> > The IMDB flat text file probably came the closest, but it appears to
> > have encoding issues; it's apparently nearly windows-1255, but not quite.
>
> It's ISO-8859-1.
>

Thanks - that reads well from CPython 3.3. Now the question becomes: Why
did chardet tell me it was windows-1255? :)

> Both certificates.list.gz and mpaa-ratings-reasons.list.gz are rather
> straightforward to parse.
>

Sure, with an appropriate encoding.

> For the US, you will get something along these lines out of
> certificates.list.gz:
>
> USA:(Banned)
> USA:12
> USA:AO
> USA:Approved
> USA:C
> USA:E
> USA:E10+
> USA:G
> USA:GP
> USA:K-A
> USA:M
> USA:M/PG
> USA:NC-17
> USA:Not Rated
> USA:Open
> USA:PG
> USA:PG-13
> USA:Passed
> USA:R
> USA:T
> USA:TV-14
> USA:TV-G
> USA:TV-MA
> USA:TV-PG
> USA:TV-Y
> USA:TV-Y7
> USA:Unrated
> USA:X
>
> And as mentioned, imdbpy handles all this out-of-the-box if you don’t feel
> like doing it yourself.

But I believe imdbpy is 2.7 only.

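For anyone doing it by hand under Python 3, something like the following might be a starting point. It is only a rough sketch: it assumes each data line carries the certificate as a tab-delimited field of the form "USA:PG-13", which I have not verified against the whole file, and the names us_certificates and read_certificates are made up for illustration.

```python
import gzip
import re

# Assumption: a certificate shows up as a tab-delimited field, e.g. "...\tUSA:PG-13".
_US_CERT = re.compile(r'(?:^|\t)(USA:[^\t\n]+)')


def us_certificates(lines):
    """Return the sorted, distinct USA:... fields found in an iterable of lines."""
    found = set()
    for line in lines:
        match = _US_CERT.search(line)
        if match:
            found.add(match.group(1).strip())
    return sorted(found)


def read_certificates(path):
    """Open the gzipped list as ISO-8859-1 text and collect the US certificates."""
    # gzip.open in text mode ('rt') accepts an encoding in Python 3.3+.
    with gzip.open(path, 'rt', encoding='iso-8859-1') as handle:
        return us_certificates(handle)
```

Calling read_certificates('certificates.list.gz') should then give a list comparable to the one quoted above, provided the field-layout assumption holds.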
-- https://mail.python.org/mailman/listinfo/python-list