On Sat, Nov 14, 2009 at 6:51 PM, Philip Semanchuk <phi...@semanchuk.com> wrote:
> Hi Luca,
> You have to define what you mean by "text" file. It might seem obvious, but
> it's not.
>
> Do you mean just ASCII text? Or will you accept Unicode too? Unicode text
> can be more difficult to detect because you have to guess the file's
> encoding (unless it has a BOM; most don't).
>
> And do you need to verify that every single byte in the file is "text"? What
> if the file is 1GB, do you still want to examine every single byte?
>
> If you give us your own (specific!) definition of what "text" means, or
> perhaps a description of the problem you're trying to solve, then maybe we
> can help you better.

Thanks all. I was quite sure that this would not be a very simple task.
Right now, searching only within ASCII is not enough for me (my native
language falls outside that encoding :-)

Checking every single byte could be a good solution... I can start by using
the mimetypes module and, if the file has no extension, check it byte by
byte, as the "file" command commonly does.
Better yet: I can use the "file" command if it is available.

Again: thanks all!

--
-- luca
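
A minimal sketch of the plan described above, in Python, under a few stated assumptions. The names looks_like_text, file_command_says_text, sample_size and encodings are made up for illustration. "Text" is taken to mean "has a text/* MIME type by extension, or else the first few KB contain no NUL byte and decode as one of the candidate encodings", which is only one possible definition. Note also that mimetypes looks at the file name only, and it classifies some text-like formats (JSON, for instance) under application/*, so the first step can give a "no" for files that a person would call text.

```python
import codecs
import mimetypes
import shutil
import subprocess

# Byte-order marks that identify a Unicode encoding outright.
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def looks_like_text(path, sample_size=8192, encodings=("utf-8",)):
    """Heuristic guess at whether `path` is a text file (hypothetical helper)."""
    # Step 1: trust the extension when mimetypes knows it.
    guessed_type, _ = mimetypes.guess_type(path)
    if guessed_type is not None:
        return guessed_type.startswith("text/")

    # Step 2: no usable extension, so inspect a sample rather than the
    # whole file (a 1GB file is not read in full).
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return True  # assumption: an empty file counts as text
    if sample.startswith(_BOMS):
        return True
    if b"\x00" in sample:
        return False  # NUL bytes are rare in text without a BOM

    # Step 3: try each candidate encoding. An incremental decoder is used
    # so that a multi-byte character cut off at the end of the sample is
    # not reported as an error.
    for encoding in encodings:
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            decoder.decode(sample)
        except UnicodeDecodeError:
            continue
        return True
    return False


def file_command_says_text(path):
    """Ask the "file" command, if installed; return None when unavailable."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip().startswith("text/")
```

Single-byte encodings such as latin-1 accept any byte sequence, so adding one to encodings turns step 3 into little more than the NUL check. A stricter variant could also reject samples with a high share of control characters.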