409220000003 Life Fitness Products $1 (12-13-08) (CVS)
546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
each
518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40 (2-14-09)

into something like this:

"409220000003","Life Fitness Products $1","12-13-08"
"546500181141","Oust Air Sanitizer, any B1G1F up to $3.49","1-17-09"
"518000159258","Pillsbury Crescent Dinner Rolls, any .25","2-14-09"
"518000550406","Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
Cinnamon Rolls, etc. .40","2-14-09"

Any help, pseudocode, or other push in the right direction would
be much appreciated.  I am a novice Python programmer, but I do have a
good deal of PHP programming experience.

A regexp should be able to split this fairly neatly:

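  import re

  # item number, free-text description, then a date in parentheses;
  # anything after the closing paren (e.g. "(CVS)" or ".35") is ignored
  r = re.compile(r"^(\d+)\s+(.*)\((\d{1,2}-\d{1,2}-\d{2,4})\).*")
  with open('in.txt') as inp, open('out.csv', 'w') as out:
    for i, line in enumerate(inp, 1):
      m = r.match(line)
      if not m:
        print("Line %i is malformed" % i)
        continue
      # quote every field, doubling any embedded double quotes (CSV style)
      out.write(','.join(
        '"%s"' % item.strip().replace('"', '""')
        for item in m.groups()
        ))
      out.write('\n')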
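The loop above treats each physical line as one record, so entries that wrap onto a second line (the Oust and Pillsbury Frozen Grands ones, as pasted) get reported as malformed. If the input really is wrapped like that, and not just by the mail client, here is a rough Python 3 sketch that glues continuation lines together and lets the standard csv module do the quoting instead of building the quotes by hand. The names records(), convert(), RECORD and NEW_ITEM are made up for illustration, and it assumes that a line starts a new item only if it begins with digits and the item collected so far already has its date. Wrapped text is joined with a single space, so the Pillsbury description comes out on one CSV line.

```python
import csv
import io
import re

# Item number, description, then the first "(m-d-yy)" style date.
# Anything after that date (".35 each", "(CVS)") is dropped.
RECORD = re.compile(r"(\d+)\s+(.*?)\s*\((\d{1,2}-\d{1,2}-\d{2,4})\)")
# Hypothetical rule: a line beginning with digits + whitespace may start a new item.
NEW_ITEM = re.compile(r"\d+\s")


def records(lines):
    """Yield one string per record, joining wrapped lines with a space.

    Assumes a line starts a new record only if it begins with digits and
    the record collected so far already contains its date.
    """
    current = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if current and RECORD.match(current) and NEW_ITEM.match(line):
            yield current
            current = line
        else:
            current = (current + " " + line).strip()
    if current:
        yield current


def convert(lines, out):
    """Write fully quoted CSV rows to `out`; return records that did not match."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    bad = []
    for rec in records(lines):
        m = RECORD.match(rec)
        if m:
            writer.writerow(m.groups())
        else:
            bad.append(rec)
    return bad


if __name__ == "__main__":
    sample = io.StringIO(
        "546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35\n"
        "each\n"
    )
    out = io.StringIO()
    convert(sample, out)
    # prints: "546500181141","Oust Air Sanitizer, any B1G1F up to $3.49","1-17-09"
    print(out.getvalue(), end="")
```

With real files, open the output with newline='' as the csv docs recommend, e.g. `with open('in.txt') as inp, open('out.csv', 'w', newline='') as out: bad = convert(inp, out)`, then report whatever ends up in `bad`.  A continuation line that happens to start with a number after the date (say "12 each") would be mistaken for a new item, so eyeball the output if your data has those.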

-tkc



--
http://mail.python.org/mailman/listinfo/python-list