"cjl" wrote: > I am working on a little script that needs to pull the strings out of a > binary file, and then manipulate them with python. > > The command line utility "strings" (part of binutils) has exactly the > functionality I need, but I was thinking about trying to implement this > in pure python.
something like this could work:

    import re

    # filename is the path to the binary file
    with open(filename, "rb") as f:
        data = f.read()

    # runs of 4+ printable ASCII bytes, terminated by newline or NUL
    for m in re.finditer(rb"([\x20-\x7e]{4,})[\n\0]", data):
        print(m.start(), repr(m.group(1)))

you may wish to modify the "[\x20-\x7e]" part to match your definition
of "printable characters".  "[-,.!?\w ]" is a reasonable choice in many
cases...

if the files can be huge, use the mmap module to map the file into
memory, and run the RE on the mapped view.

</F>
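a minimal sketch of the mmap variant, assuming Python 3 and a non-empty file (mmap refuses to map a zero-length file). the function name find_strings and the min_len parameter are made up for illustration, and the pattern is the same printable-ASCII guess as above, so adjust it to taste:

```python
import mmap
import re


def find_strings(path, min_len=4):
    """Return a list of (offset, bytes) for printable runs in the file at path.

    find_strings and min_len are illustrative names, not part of any library.
    """
    # Printable ASCII run of at least min_len bytes, ended by newline or NUL.
    pattern = re.compile(rb"([\x20-\x7e]{%d,})[\n\0]" % min_len)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # The re module accepts the mapped view directly, so the file
            # is never copied into one big bytes object.
            return [(m.start(), m.group(1)) for m in pattern.finditer(view)]
```

calling find_strings("some.bin") gives back the offsets and the matched bytes, which can then be decoded with .decode("ascii") if text strings are wanted. the list is built before the mapping is closed, so nothing refers to the mapped view afterwards.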