10.11.20 22:40, Dennis Lee Bieber wrote:
> Testing for extension in a list of exclusions would be much faster than
> scanning the contents of a file, and the few that do get through would have
> to be scanned anyway.
Then the simplest method should work: read the first 512 bytes and check if they contain b'\0'. The chance that a random sequence of 512 bytes does not contain NUL is (1-1/256)**512 = 0.13. So this will filter out 87% of binary files. Likely more, because binary files usually have some structure and reserve a fixed size for integers. Most integers are much smaller than the maximal value, so their higher bits and bytes are zeroes. You can also decrease the probability of false results by increasing the size of the tested data or by testing for a few other byte values (b'\1', b'\2', etc.).

Anything more sophisticated is just a waste of your time.
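A minimal sketch of that check, in case it helps. The function name looks_binary and the sample_size parameter are mine, chosen only for illustration; the last lines just redo the arithmetic for uniformly random bytes.

```python
def looks_binary(path, sample_size=512):
    """Return True if the first sample_size bytes of the file contain NUL."""
    with open(path, 'rb') as f:
        return b'\0' in f.read(sample_size)


# Chance that sample_size uniformly random bytes contain no NUL at all,
# i.e. that a purely random "binary" file slips through the check.
p_miss = (1 - 1/256) ** 512
print(round(p_miss, 2))  # 0.13
```

Raising sample_size shrinks p_miss quickly: with 1024 bytes it is the square of the value above, at the cost of reading a little more from each file.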