I have written a Python program that searches for a specific customer in
files (around 1000 files).
The trigger is LF01 + CUSTOMERNO.

While most of the solutions folks have offered involve scanning
all the files each time you search, if the content of those files
doesn't change much, you can build an index once and then query
the resulting index multiple times. Because I was bored, I threw
together the code below (after the "-------" divider) which does
what you detail as best I understand, allowing you to do

  python tkc.py 31415

to find the files containing CUSTOMERNO=31415. The first time,
it's slow because it needs to create the index file. However,
subsequent runs should be pretty speedy. You can also specify
multiple customers on the command-line:

  python tkc.py 31415 1414 77777

and it will search for each of them. I presume they're found by
the regexp "LF01(\d+)" based on your description, that the file
can be sensibly broken into lines, and the code allows for
multiple results on the same line. Adjust accordingly if that's
not the pattern you want or the conditions you expect.
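
To make those matching assumptions concrete, here is a minimal sketch (written for Python 3; the helper name customers_in and the sample lines are made up for illustration) of how the regexp picks up several customer numbers on one line and how repeats collapse into a single entry:

```python
import re

# Same pattern the script below uses: "LF01" followed by the digits
# of the customer number.
customer_re = re.compile(r"LF01(\d+)")

def customers_in(lines):
    """Return the set of customer numbers seen in an iterable of lines.

    Hypothetical helper, for illustration only.
    """
    found = set()
    for line in lines:
        # findall() returns every captured group on the line, so
        # several customers on the same line are all picked up.
        found.update(customer_re.findall(line))
    return found

# Made-up sample data: two customers on the first line, one repeat.
sample = [
    "header LF0131415 filler LF011414",
    "LF0131415 again",
    "no trigger on this line",
]
print(sorted(customers_in(sample)))
```

If your customer numbers are fixed-width or contain non-digits, the `(\d+)` group is the part to adjust.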
If your source files change, you can reinitialize the database with

  python tkc.py -i

You can also change the glob pattern used for indexing -- by
default, I assumed they were "*.txt". But you can either
override the default with

  python tkc.py -i -p "*.dat"

or you can change the source to default differently (or even skip
the glob-check completely...look for the fnmatch() call). There
are a few more options. Just use

  python tkc.py --help

as usual. It's also a simple demo of the optparse module if
you've never used it.
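
If you'd rather use argparse, which has shipped in the standard library since Python 2.7 and 3.2, roughly the same option set might look like the sketch below. It targets Python 3, and the positional name "customers" plus the help wording are my own choices, not something taken from the script:

```python
import argparse

def build_parser():
    """Sketch of an argparse version of the script's option set."""
    parser = argparse.ArgumentParser(
        description="Look up customer numbers in an index of files.")
    # Zero or more customer numbers; "customers" is an assumed name.
    parser.add_argument("customers", nargs="*", metavar="CUST",
                        help="customer numbers to look up")
    parser.add_argument("-i", "--index", "--reindex",
                        action="store_true", dest="reindex",
                        help="rebuild the index from the files on disk")
    parser.add_argument("-p", "--pattern", default="*.txt",
                        metavar="GLOB_PATTERN",
                        help="index files matching GLOB_PATTERN")
    parser.add_argument("-d", "--db", "--database", dest="indexfile",
                        default=".index", metavar="FILE",
                        help="use the index stored at FILE")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity")
    return parser

opt = build_parser().parse_args(["-i", "-p", "*.dat", "31415", "1414"])
print(opt.reindex, opt.pattern, opt.customers)
```

The main difference from optparse is that the positional arguments land on the same namespace object instead of coming back as a separate list.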
Enjoy!
-tkc
PS: as an aside, how do I import just the fnmatch function? I
tried both of the following and neither worked:

  from glob.fnmatch import fnmatch
  from glob import fnmatch.fnmatch

I finally resorted to the contortion coded below in favor of

  import glob
  fnmatch = glob.fnmatch.fnmatch
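
For what it's worth, fnmatch is a top-level module in the standard library (glob merely imports it for its own use), so a direct import should do the job. A quick check, written for Python 3 with made-up filenames:

```python
# fnmatch is its own module; the function inside it has the same name.
from fnmatch import fnmatch

# Hypothetical filenames, just to exercise the default "*.txt" pattern.
names = ["orders.txt", "orders.dat", "notes.txt"]
matches = [name for name in names if fnmatch(name, "*.txt")]
print(matches)
```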
-----------------------------------------------------------------
#!/usr/bin/env python
import dbm
import os
import re
from glob import fnmatch
fnmatch = fnmatch.fnmatch
from optparse import OptionParser
customer_re = re.compile(r"LF01(\d+)")
def build_parser():
    parser = OptionParser(
        usage="%prog [options] [cust#1 [cust#2 ... ]]"
        )
    parser.add_option("-i", "--index", "--reindex",
        action="store_true",
        dest="reindex",
        default=False,
        help="Reindex files found in the current directory "
            "in the event any files have changed",
        )
    parser.add_option("-p", "--pattern",
        action="store",
        dest="pattern",
        default="*.txt",
        metavar="GLOB_PATTERN",
        help="Index files matching GLOB_PATTERN",
        )
    parser.add_option("-d", "--db", "--database",
        action="store",
        dest="indexfile",
        default=".index",
        metavar="FILE",
        help="Use the index stored at FILE",
        )
    parser.add_option("-v", "--verbose",
        action="count",
        dest="verbose",
        default=0,
        help="Increase verbosity"
        )
    return parser

def reindex(options, db):
    if options.verbose: print "Indexing..."
    for path, dirs, files in os.walk('.'):
        for fname in files:
            if fname == options.indexfile:
                # ignore our database file
                continue
            if not fnmatch(fname, options.pattern):
                # ensure that it matches our pattern
                continue
            fullname = os.path.join(path, fname)
            if options.verbose: print fullname
            f = file(fullname)
            found_so_far = set()
            for line in f:
                for customer_number in customer_re.findall(line):
                    if customer_number in found_so_far: continue
                    found_so_far.add(customer_number)
                    try:
                        val = '\n'.join([
                            db[customer_number],
                            fullname,
                            ])
                        if options.verbose > 1:
                            print "Appending %s" % customer_number
                    except KeyError:
                        if options.verbose > 1:
                            print "Creating %s" % customer_number
                        val = fullname
                    db[customer_number] = val
            f.close()

if __name__ == "__main__":
    parser = build_parser()
    opt, args = parser.parse_args()
    reindexed = False
    if opt.reindex or not os.path.exists("%s.db" % opt.indexfile):
        db = dbm.open(opt.indexfile, 'n')
        reindex(opt, db)
        reindexed = True
    else:
        db = dbm.open(opt.indexfile, 'r')
    if not (args or reindexed):
        parser.print_help()
    for arg in args:
        print "%s:" % arg,
        try:
            val = db[arg]
            print
            for item in val.splitlines():
                print " %s" % item
        except KeyError:
            print "Not found"
    db.close()
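
The index keeps one newline-separated list of filenames per customer number. Here is a small sketch of that storage scheme on its own, assuming Python 3, where dbm hands values back as bytes and they need decoding. The names add_file, lookup and demo are illustrative and not part of the script above:

```python
import dbm
import os
import tempfile

def add_file(db, customer_number, filename):
    """Append filename to the newline-separated list for a customer."""
    try:
        existing = db[customer_number].decode("utf-8")
    except KeyError:
        # first file seen for this customer
        db[customer_number] = filename
    else:
        db[customer_number] = existing + "\n" + filename

def lookup(db, customer_number):
    """Return the list of filenames for a customer, or [] if unknown."""
    try:
        return db[customer_number].decode("utf-8").splitlines()
    except KeyError:
        return []

def demo():
    # Throwaway database in a temp directory; the paths are made up.
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.open(os.path.join(tmp, "index"), "n") as db:
            add_file(db, "31415", "./a.txt")
            add_file(db, "31415", "./sub/b.txt")
            add_file(db, "1414", "./a.txt")
            return lookup(db, "31415"), lookup(db, "77777")

print(demo())
```

Storing the list as a single string keeps the database format trivial, at the cost of rewriting the whole value each time a file is appended.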