I am trying to find somebody who can give me a simple python program I can use to "program by analogy". I just want to read two CSV files and match them on several fields, manipulate some of the fields, and write a couple of output files.
...
Please forgive me if this is so, and take pity on a stranger in a strange land.

Pittsburgh is a little strange, but not *that* bad :)

Just for fun, I threw together a simple (about 30 lines) program to do what you describe. Consider it a bit of slightly belated Christmas pity on the assumption that this isn't classwork (a little googling suggests that it's not homework). It's 100% untested, so if it formats your hard-drive, steals your spouse, wrecks your truck, kicks your dog, makes a mess of your trailer-home, and drinks all your beer, caveat coder. But you've got the source, so you can vet it...and it's even commented a bit for pedagogical amusement if you plan to mung with it :)

  from csv import reader
  SMALL = 'a.txt'
  OTHER = 'b.txt'
  smaller_file = {} # key->line mapping dict for the smaller file
  f_a = file(SMALL)
  r_a = reader(f_a)
  #a_headers = r_a.next() # optionally discard a header row

  # build up the map in smaller_file of key->line
  for i, line in enumerate(r_a):
    a1, a2, a3, a4, a5 = line # name the fields
    key = a1, a3, a5 # the fields that identify a row
    if key in smaller_file:
      print "Duplicate key [%r] in %s:%i" % (key, SMALL, i+1)
      #continue # does the 1st or 2nd win? uncomment for 1st
    smaller_file[key] = line
  f_a.close()

  b = file(OTHER)
  r_b = reader(b)
  #b_headers = r_b.next() # optionally discard a header row
  for i, line in enumerate(r_b):
    b1, b2, b3, b4, b5, b6, b7, b8, b9 = line
    key = b2, b8, b9
    if key not in smaller_file:
      print "Key for line #%i (%r) not in %s" % (i+1, key, SMALL)
      continue
    a1, a2, a3, a4, a5 = smaller_file[key]
    # do manipulation with a[1-5]/b[1-9] here
    # and do something with them
  b.close()

It makes more sense if, instead of calling them a[1-5]/b[1-9], you actually use the field-names that may have been in the header rows, such as

  cost_center, store, location, manager_id = line
  key = cost_center, store, location
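
If the files do carry header rows, the csv module's DictReader can do the naming for you. Here's a minimal sketch, written for Python 3 (unlike the Python 2 listing above). The column names are the hypothetical ones from the snippet above, the sample data is made up, and key_of is just a helper name I picked:

```python
import csv
import io

# Made-up sample standing in for a file that starts with a header row.
sample = io.StringIO(
    "cost_center,store,location,manager_id\n"
    "CC1,Main St,PGH,42\n"
)

def key_of(row):
    # Build the match key from named columns instead of positions.
    return (row["cost_center"], row["store"], row["location"])

rows_by_key = {}
for row in csv.DictReader(sample):
    rows_by_key[key_of(row)] = row
```

With a real file you'd pass the open file object to DictReader in place of the StringIO.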

You may also have to manipulate some of the values to make key-matches work, such as

  cc, store, loc, mgr = line
  cc = cc.strip().upper()
  store = store.strip().title()
  key = cc, store, loc

ensuring that you do the same manipulations for both files.
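
One way to make sure both files get identical treatment is to put the clean-up in a single function and call it from both loops. A small sketch in Python 3; make_key is a hypothetical helper name and the sample values are invented:

```python
def make_key(cc, store, loc):
    # Apply the same clean-up no matter which file the values came from.
    return (cc.strip().upper(), store.strip().title(), loc)

# The same logical record, spelled differently in each file.
key_a = make_key(" cc1", "main st ", "PGH")
key_b = make_key("CC1 ", " MAIN ST", "PGH")
```

Both calls should produce the same tuple, so a dictionary lookup built from one file will find rows from the other.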

The code above reads the entire smaller file into memory and uses it for fast lookup. However, if you have gargantuan files, you may need to process them differently. You don't detail the fields/organization of the files, so if they're both sorted by key, you can change the algorithm to behave like the standard *nix "join" command.
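
For the sorted case, a rough sketch of the "join"-style approach in Python 3 might look like the following. It assumes both inputs are sorted ascending by key in the same order Python uses when comparing the keys (which may differ from your sort tool's locale collation), and that keys are unique in the first input. merge_join and the sample rows are made up for illustration:

```python
def merge_join(rows_a, rows_b, key_a, key_b):
    """Yield (row_a, row_b) pairs with equal keys.

    Assumes both inputs are sorted ascending by key and that keys
    are unique in rows_a (rows_b may repeat a key).
    """
    iter_a, iter_b = iter(rows_a), iter(rows_b)
    a = next(iter_a, None)
    b = next(iter_b, None)
    while a is not None and b is not None:
        ka, kb = key_a(a), key_b(b)
        if ka == kb:
            yield a, b
            b = next(iter_b, None)  # keep a: the next b row may share its key
        elif ka < kb:
            a = next(iter_a, None)  # a is behind, advance it
        else:
            b = next(iter_b, None)  # b has no partner in a, skip it

first = lambda row: row[0]
rows_a = [["1", "x"], ["3", "y"]]
rows_b = [["1", "p"], ["1", "q"], ["2", "r"], ["3", "s"]]
pairs = list(merge_join(rows_a, rows_b, first, first))
```

Only one row from each file is held in memory at a time, so rows_a and rows_b can just as well be two csv readers over very large files.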

Other asides: you may have to tweak treatment of a header-row (and correspondingly the line-numbers), as well as conflict-handling for keys in your a.txt source if they exist, along with the behavior when a key can't be found in a.txt but is requested in b.txt (maybe set some defaults instead of logging the error and skipping the row?), and then lastly and most importantly, you have to fill in the manipulations you desire and then actually do something with the processed results (write them to a file, upload them to a database, send them via email, output them to a text-to-speech engine and have it speak them, etc).
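
As one possibility for the "do something with them" step, here's a sketch in Python 3 that writes matched rows to one output and unmatched rows to another using csv.writer. The function name write_results, its parameters, and the sample data are all invented for the example; StringIO objects stand in for real output files:

```python
import csv
import io

def write_results(lookup, rows_b, key_of_b, matched_file, unmatched_file):
    # lookup maps key -> row from the smaller file, as in the listing above.
    matched = csv.writer(matched_file, lineterminator="\n")
    unmatched = csv.writer(unmatched_file, lineterminator="\n")
    for row_b in rows_b:
        row_a = lookup.get(key_of_b(row_b))
        if row_a is None:
            unmatched.writerow(row_b)  # no partner: keep it for later review
        else:
            matched.writerow(list(row_a) + list(row_b))

lookup = {("k1",): ["k1", "a-data"]}
rows_b = [["k1", "b-data"], ["k2", "orphan"]]
matched_out, unmatched_out = io.StringIO(), io.StringIO()
write_results(lookup, rows_b, lambda r: (r[0],), matched_out, unmatched_out)
```

If you'd rather fill in defaults than set unmatched rows aside, the "row_a is None" branch is the place to do it.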

I come from 30 years of mainframe programming so I understand
how computers work at a bits/bytes/machine language/source
vs. executable/reading core dumps level, and I can program in
a lot of languages most people using Python have never even
heard of,

If there's such urgency, I hope you resorted to simply using one of this multitude of other languages you know -- even in C, this wouldn't be too painful as projects go (there's a phrase you won't hear me utter frequently). Or maybe try your hand at it in Pascal, shell-scripting (see the "join" command) or even assembly language. Not sure I'd use Logo, Haskell, Erlang, or Prolog. :)

My problem is that I want to do this all yesterday, and the
Python text I bought is not easy to understand. I don't have
time to work my way through the online Python tutorial.

As Rick mentioned, there are a number of free online sources for tutorials, books, and the like. Dive Into Python is one of the classics. Searching the archives of comp.lang.python for "beginner books" will yield the same thread coming up every couple weeks. For future reference, if you've got time-sensitive projects to tackle "yesterday", it's usually not the best time to try and learn a new language. Good luck in your exploration of Python.

-tkc



