On Jun 16, 2:17 pm, Dave Angel <da...@ieee.org> wrote:
> Jorge wrote:
> > Hi there,
> > I'm making a application that reads 3 party generated ASCII files, but some
> > times
> > the files are corrupted totally or partiality and I need to know if it's a
> > ASCII file with *nix line terminators.
> > In linux I can run the file command but the applications should run in
> > windows.
> >
> > Any help will be great.
> >
> > Thank you in advance.
>
> So, which is the assignment:
> 1) determine if a file has non-ASCII characters
> 2) determine whether the line-endings are crlf or just lf
>
> In the former case, look at translating the file contents to Unicode,
> specifying ASCII as source. If it fails, you have non-ASCII
> In the latter case, investigate the 'u' attribute of the mode parameter
> in the open() function.
>
> You also need to ask yourself whether you're doing a validation of the
> file, or doing a "best guess" like the file command.

From your requirements, you're already assuming something that _should_ be
ASCII, so it's easiest to check for ASCIIness at the binary level:

  Open the file as binary
  Loop over the bytes
    exit with error upon reading a byte outside the printable range
    (32-126 decimal), other than the allowed control characters
    (\n and \t, but not \r, since you want UNIX-style linefeeds)
  exit with success if the loop ended cleanly

This supposes you're dealing strictly with ASCII, and not a full 8-bit
codepage, of course.
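
The byte-level check described above could be sketched roughly as follows. This is only a minimal illustration assuming Python 3, where iterating over a bytes object yields integers; the names find_bad_byte, is_unix_ascii, ALLOWED_CONTROL and PRINTABLE, and the 64 KiB chunk size, are my own choices for the example and not anything from the original poster's application.

```python
# Control bytes accepted besides the printable range: linefeed and tab only.
# Carriage return (13) is deliberately left out, so CRLF files are rejected.
ALLOWED_CONTROL = frozenset(b"\n\t")
PRINTABLE = range(32, 127)  # 32..126 inclusive


def find_bad_byte(path, chunk_size=64 * 1024):
    """Return (offset, byte_value) of the first disallowed byte, or None."""
    offset = 0
    with open(path, "rb") as f:  # binary mode: no newline translation
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached end of file without a bad byte
            for i, b in enumerate(chunk):
                if b not in PRINTABLE and b not in ALLOWED_CONTROL:
                    return offset + i, b
            offset += len(chunk)


def is_unix_ascii(path):
    """True if the file holds only printable ASCII, tabs and linefeeds."""
    return find_bad_byte(path) is None
```

Returning the offset of the first offending byte, rather than just True or False, makes it easier to report where a file went bad. Since the file is opened in binary mode, the result should be the same on Windows as on Linux.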