On Dec 10, 2013, at 6:25 AM, Dan Stromberg <drsali...@gmail.com> wrote:
> The IMDB flat text file probably came the closest, but it appears to have
> encoding issues; it's apparently nearly windows-1255, but not quite.

It's ISO-8859-1.

Both certificates.list.gz and mpaa-ratings-reasons.list.gz are rather straightforward to parse. For the US, you will get something along these lines out of certificates.list.gz:

USA:(Banned)
USA:12
USA:AO
USA:Approved
USA:C
USA:E
USA:E10+
USA:G
USA:GP
USA:K-A
USA:M
USA:M/PG
USA:NC-17
USA:Not Rated
USA:Open
USA:PG
USA:PG-13
USA:Passed
USA:R
USA:T
USA:TV-14
USA:TV-G
USA:TV-MA
USA:TV-PG
USA:TV-Y
USA:TV-Y7
USA:Unrated
USA:X

And as mentioned, imdbpy handles all this out-of-the-box if you don’t feel like doing it yourself.
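If you do want to do it yourself, here is a minimal sketch of how the distinct US values could be collected. It assumes each data line of certificates.list.gz is a title followed by tab-separated fields, one of which has the form COUNTRY:rating; I have not verified that layout against the current file, so treat it as an assumption. The names certificates_for and read_certificates and the sample lines are made up for illustration.

```python
import gzip


def certificates_for(lines, country="USA"):
    """Return the sorted, distinct 'COUNTRY:rating' fields for one country.

    Assumes tab-separated lines where one field looks like 'USA:PG-13'.
    Lines without such a field (headers, blanks) are simply skipped.
    """
    prefix = country + ":"
    found = set()
    for line in lines:
        for field in line.rstrip("\n").split("\t"):
            if field.startswith(prefix):
                found.add(field)
    return sorted(found)


def read_certificates(path, country="USA"):
    """Open the gzipped list as ISO-8859-1 text and collect the values."""
    with gzip.open(path, "rt", encoding="iso-8859-1") as fh:
        return certificates_for(fh, country)


# Made-up sample lines in the assumed layout, not real IMDb data.
sample = [
    "Some Title (1999)\t\t\tUSA:PG-13\n",
    "Some Title (1999)\t\t\tUK:15\n",
    "Other Title (2001)\t\tUSA:R\t(uncut version)\n",
    "Other Title (2001)\t\tUSA:R\n",
]

print(certificates_for(sample))
```

Against the real file you would call something like read_certificates("certificates.list.gz"). Decoding as ISO-8859-1 never raises on arbitrary bytes, which sidesteps the encoding errors mentioned in the quoted message.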