On Wed, Nov 11, 2020 at 5:36 AM Eli the Bearded <*@eli.users.panix.com> wrote:
> Read first N lines of a file. If all parse as valid UTF-8, consider it text.
> That's probably the rough method file(1) and Perl's -T use. (In
> particular allow no nulls. Maybe allow ISO-8859-1.)
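For illustration, the quoted heuristic might be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not how file(1) or Perl's -T actually work: N defaults to an arbitrary 20 lines, and the helper names `read_head` and `is_probably_text` are hypothetical. The explicit NUL test is needed because U+0000 is itself valid UTF-8.

```python
from itertools import islice


def read_head(path: str, max_lines: int = 20) -> bytes:
    """Return the first max_lines lines of a file as raw bytes.

    max_lines=20 is an arbitrary assumption. A binary file containing no
    newline bytes is one long "line", so this may read the whole file.
    """
    with open(path, "rb") as f:
        return b"".join(islice(f, max_lines))


def is_probably_text(head: bytes) -> bool:
    """Treat the data as text if it has no NUL bytes and decodes as UTF-8."""
    # NUL (U+0000) is valid UTF-8, so it has to be rejected explicitly.
    if b"\x00" in head:
        return False
    try:
        head.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Example usage (the path is a placeholder):
# print(is_probably_text(read_head("some_file")))
```

Cutting at line boundaries is safe for the UTF-8 test: the byte 0x0A never occurs inside a multibyte UTF-8 sequence, so the head can't end in the middle of a character unless the file itself does.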
ISO-8859-1 is basically "allow any byte values" (every one of the 256 byte values decodes to some character), so all you'd be doing is checking that there are no NUL bytes. I'd definitely recommend mandating UTF-8, as that's a very good way of recognizing valid text, but if you can't do that, then the simple NUL check is all you really need. And let's be honest here, there aren't THAT many binary files that manage to contain zero NULs, so you won't get many false hits :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
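A quick way to check the point about ISO-8859-1 is in Python itself: its "latin-1" codec decodes every byte value, so a "valid ISO-8859-1" test can never reject anything and only the NUL check remains. The sketch below is illustrative only. The names `nul_free` and `valid_utf8` are hypothetical, and the 8-byte PNG file signature is just a convenient example of binary data with no NUL in it.

```python
# Python's "latin-1" codec maps byte n to code point n for all 256 values,
# so decoding as ISO-8859-1 can never fail.
every_byte = bytes(range(256))
assert every_byte.decode("latin-1") == "".join(map(chr, range(256)))


def nul_free(data: bytes) -> bool:
    """Fallback test when ISO-8859-1 is accepted: only a NUL disqualifies."""
    return b"\x00" not in data


def valid_utf8(data: bytes) -> bool:
    """Stricter test: the bytes must decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The PNG signature contains no NUL, so the NUL check alone passes it,
# while the UTF-8 check rejects it (0x89 cannot start a UTF-8 sequence).
png_signature = b"\x89PNG\r\n\x1a\n"
print(nul_free(png_signature), valid_utf8(png_signature))  # True False
```

The trade-off runs the other way too: Latin-1-encoded text such as `b"caf\xe9\n"` passes the NUL check but fails the UTF-8 one.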