rbt <[EMAIL PROTECTED]> wrote:

> Grant Edwards wrote:
> > On 2005-01-26, rbt <[EMAIL PROTECTED]> wrote:
> >
> >>Is there an easy way to exclude binary files (I'm working on
> >>Windows XP) from the file list returned by os.walk()?
> >
> > Sure, assuming you can provide a rigorous definition of 'binary
> > files'. :)
>
> non-ascii

The only way to tell for sure whether a file contains only ASCII characters is to read the whole file and check.

You _are_, however, using a very strange definition of "binary". A file of text in German, French or Italian, for example, is likely to be one you'll define as "binary" as soon as it contains a vowel with an accent or a diaeresis. On the other hand, you want to consider "non-binary" a file chock full of hardly-ever-used control characters, just because the American Standard Code for Information Interchange happened to standardize them once upon a time?

Most people's intuitive sense of what "binary" means would rebel against both of these choices, I think. By your definition, a file whose contents are, say, the string 'El perro de aguas español.\n' is "binary" (the n-with-tilde in "español" disqualifies it from being ASCII), while another whose contents are 32 bytes of 8 zero bits each (ASCII 'NUL' characters) is "non-binary".

In any case, since you need to open and read all the files to check them for "being binary", either by your definition or by whatever heuristics you might prefer, you would not really be excluding them from os.walk, but rather filtering os.walk's results by these criteria.

Alex

-- 
http://mail.python.org/mailman/listinfo/python-list
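For readers following the thread today, here is a minimal sketch (modern Python 3, not code from the post) of the approach Alex describes: let os.walk visit everything, read each file, and keep only those that pass a test. The helper names `is_ascii_file`, `looks_binary` and `filtered_walk` are hypothetical. `is_ascii_file` implements rbt's "non-ascii" definition (it relies on `bytes.isascii()`, available since Python 3.7). `looks_binary` is one commonly used heuristic, assumed here for illustration: it flags a file if a NUL byte appears in its first few kilobytes. It is not a rule proposed in the thread.

```python
import os
import tempfile


def is_ascii_file(path, chunk_size=64 * 1024):
    """Return True if every byte of the file is in the ASCII range (0-127).

    There is no shortcut: the whole file must be read, although we can
    stop at the first non-ASCII byte.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True            # reached EOF without a non-ASCII byte
            if not chunk.isascii():    # bytes.isascii() needs Python 3.7+
                return False


def looks_binary(path, sample_size=8192):
    """Hypothetical heuristic: call a file binary if its first
    sample_size bytes contain a NUL byte."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(sample_size)


def filtered_walk(top, keep):
    """Yield file paths under top for which keep(path) is true.

    This does not change what os.walk visits; it only filters its results.
    Files that cannot be opened (OSError) are skipped.
    """
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                ok = keep(path)
            except OSError:
                continue
            if ok:
                yield path


if __name__ == "__main__":
    # Recreate the two files from the post, plus an ordinary text file.
    with tempfile.TemporaryDirectory() as top:
        samples = {
            "spanish.txt": "El perro de aguas español.\n".encode("latin-1"),
            "nuls.dat": b"\x00" * 32,
            "plain.txt": b"hello, world\n",
        }
        for name, data in samples.items():
            with open(os.path.join(top, name), "wb") as f:
                f.write(data)

        by_ascii = sorted(os.path.basename(p)
                          for p in filtered_walk(top, is_ascii_file))
        by_nul = sorted(os.path.basename(p)
                        for p in filtered_walk(top, lambda p: not looks_binary(p)))
        print(by_ascii)  # ['nuls.dat', 'plain.txt']
        print(by_nul)    # ['plain.txt', 'spanish.txt']
```

The demo reproduces Alex's point: under the ASCII test, the Latin-1 Spanish sentence is rejected while the 32 NUL bytes pass. The NUL-byte heuristic gives the opposite, probably more intuitive, answer for those two files. Either way, every file still has to be opened and read.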