I have written a Python program that searches for a specific customer in
files (around 1000 files).
The trigger is LF01 + CUSTOMERNO.

Most of the solutions folks have offered scan all the files each time you search. If the content of those files doesn't change much, you can instead build an index once and then query it many times. Because I was bored, I threw together the code below (after the "-------" divider), which does what you describe as best I understand it, allowing you to do

  python tkc.py 31415

to find the files containing CUSTOMERNO=31415. The first run is slow because it needs to create the index file; subsequent runs should be pretty speedy. You can also specify multiple customers on the command line:

  python tkc.py 31415 1414 77777

and it will search for each of them. Based on your description, I presume customers are found by the regexp "LF01(\d+)" and that the files can be sensibly broken into lines; the code also allows for multiple results on the same line. Adjust accordingly if that's not the pattern you want or the conditions you expect.
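
To make those assumptions concrete, here is a minimal sketch of the matching and indexing idea, using a plain dict as a stand-in for the on-disk database. The function name index_lines and the sample file names are made up for illustration; the point is only that each customer number maps to a newline-joined list of the files that mention it.

```python
import re

customer_re = re.compile(r"LF01(\d+)")

def index_lines(index, fullname, lines):
    """Record fullname under every customer number found in lines."""
    found_so_far = set()  # customer numbers already recorded for this file
    for line in lines:
        # findall() returns every match, so several hits on one line are fine
        for customer_number in customer_re.findall(line):
            if customer_number in found_so_far:
                continue
            found_so_far.add(customer_number)
            if customer_number in index:
                index[customer_number] += "\n" + fullname
            else:
                index[customer_number] = fullname

# Hypothetical sample data: a dict stands in for the database
index = {}
index_lines(index, "./a.txt", ["xx LF0131415 yy LF011414", "LF0131415"])
index_lines(index, "./b.txt", ["LF0131415"])

# A lookup is then a single key access plus a split
for name in index["31415"].splitlines():
    print(name)
```

Building the dict is the slow part (every line of every file gets scanned); each lookup afterwards touches only one key.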

If your source files change, you can reinitialize the database with

  python tkc.py -i

You can also change the glob pattern used for indexing -- by default, I assumed they were "*.txt". But you can either override the default with

  python tkc.py -i -p "*.dat"

or you can change the source to default differently (or even skip the glob-check completely...look for the fnmatch() call). There are a few more options. Just use

  python tkc.py --help

as usual. It's also a simple demo of the optparse module if you've never used it.
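
If optparse is new to you, a stripped-down sketch of the idea looks roughly like this. It parses an explicit argument list rather than the real command line so the result is easy to inspect; only two of the script's options are reproduced, and the argument values are just examples.

```python
from optparse import OptionParser

parser = OptionParser(usage="%prog [options] [cust#1 [cust#2 ... ]]")
# A flag that takes no value: present means True
parser.add_option("-i", "--index", action="store_true",
                  dest="reindex", default=False)
# An option that takes a value, with a default
parser.add_option("-p", "--pattern", action="store",
                  dest="pattern", default="*.txt")

# Pass a list explicitly instead of letting optparse read sys.argv
opt, args = parser.parse_args(["-i", "-p", "*.dat", "31415"])

print(opt.reindex)   # the -i flag
print(opt.pattern)   # the value given to -p
print(args)          # whatever is left over: the customer numbers
```

Options end up as attributes on the first return value, and the leftover positional arguments come back as a plain list.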

Enjoy!

-tkc

PS: as an aside, how do I import just the fnmatch function? I tried both of the following and neither worked:

  from glob.fnmatch import fnmatch
  from glob import fnmatch.fnmatch

I finally resorted to the contortion coded below in favor of
  import glob
  fnmatch = glob.fnmatch.fnmatch
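
One thing that may be worth trying: as far as I can tell, fnmatch is a top-level standard-library module in its own right (glob merely imports it), so importing from it directly should sidestep the contortion. A quick check, with made-up file names:

```python
# fnmatch is its own module; no need to reach through glob
from fnmatch import fnmatch

print(fnmatch("customers.txt", "*.txt"))
print(fnmatch("customers.dat", "*.txt"))
```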

-----------------------------------------------------------------


#!/usr/bin/env python
import dbm
import os
import re
from glob import fnmatch
fnmatch = fnmatch.fnmatch
from optparse import OptionParser

customer_re = re.compile(r"LF01(\d+)")

def build_parser():
  parser = OptionParser(
    usage="%prog [options] [cust#1 [cust#2 ... ]]"
    )
  parser.add_option("-i", "--index", "--reindex",
    action="store_true",
    dest="reindex",
    default=False,
    help="Reindex files found in the current directory "
      "in the event any files have changed",
    )
  parser.add_option("-p", "--pattern",
    action="store",
    dest="pattern",
    default="*.txt",
    metavar="GLOB_PATTERN",
    help="Index files matching GLOB_PATTERN",
    )
  parser.add_option("-d", "--db", "--database",
    action="store",
    dest="indexfile",
    default=".index",
    metavar="FILE",
    help="Use the index stored at FILE",
    )
  parser.add_option("-v", "--verbose",
    action="count",
    dest="verbose",
    default=0,
    help="Increase verbosity"
    )
  return parser

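def reindex(options, db):
  """Walk the current directory and map each customer number to the
  newline-separated list of files that mention it."""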
  if options.verbose: print "Indexing..."
  for path, dirs, files in os.walk('.'):
    for fname in files:
      if fname.startswith(os.path.basename(options.indexfile)):
        # ignore our database file (dbm may append a suffix such as ".db")
        continue
      if not fnmatch(fname, options.pattern):
        # ensure that it matches our pattern
        continue
      fullname = os.path.join(path, fname)
      if options.verbose: print fullname
      f = open(fullname)
      found_so_far = set()
      for line in f:
        for customer_number in customer_re.findall(line):
          if customer_number in found_so_far: continue
          found_so_far.add(customer_number)
          try:
            val = '\n'.join([
              db[customer_number],
              fullname,
              ])
            if options.verbose > 1:
              print "Appending %s" % customer_number
          except KeyError:
            if options.verbose > 1:
              print "Creating %s" % customer_number
            val = fullname
          db[customer_number] = val
      f.close()

if __name__ == "__main__":
  parser = build_parser()
  opt, args = parser.parse_args()
  reindexed = False
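  # NOTE: this check assumes the dbm backend stores the index as FILE.db;
  # other backends may use different suffixes, in which case pass -i explicitly
  if opt.reindex or not os.path.exists("%s.db" % opt.indexfile):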
    db = dbm.open(opt.indexfile, 'n')
    reindex(opt, db)
    reindexed = True
  else:
    db = dbm.open(opt.indexfile, 'r')
  if not (args or reindexed):
    parser.print_help()
  for arg in args:
    print "%s:" % arg,
    try:
      val = db[arg]
      print
      for item in val.splitlines():
        print " %s" % item
    except KeyError:
      print "Not found"
  db.close()

