Re: dummy needs help with Python

Tim Chase Sat, 27 Dec 2008 07:29:13 -0800

I am trying to find somebody who can give me a simple python program I can use to "program by analogy". I just want to read two CSV files and match them on several fields, manipulate some of the fields, and write a couple of output files.

...

Please forgive me if this is so, and take pity on a stranger in a strange land.


Pittsburgh is a little strange, but not *that* bad :)

Just for fun, I threw together a simple (about 30 lines) program to do what you describe. Consider it a bit of slightly belated Christmas pity on the assumption that this isn't classwork (a little googling suggests that it's not homework). It's 100% untested, so if it formats your hard-drive, steals your spouse, wrecks your truck, kicks your dog, makes a mess of your trailer-home, and drinks all your beer, caveat coder. But you've got the source, so you can vet it...and it's even commented a bit for pedagogical amusement if you plan to mung with it :)


  from csv import reader
  SMALL = 'a.txt'
  OTHER = 'b.txt'
  smaller_file = {} # key->line mapping dict for the smaller file
  f_a = open(SMALL)
  r_a = reader(f_a)
  #a_headers = next(r_a) # optionally discard a header row

  # build up the map in smaller_file of key->line
  for i, line in enumerate(r_a):
    a1, a2, a3, a4, a5 = line # name the fields
    key = a1, a3, a5 # the fields a.txt is matched on
    if key in smaller_file:
      print("Duplicate key [%r] in %s:%i" % (key, SMALL, i+1))
      #continue # does the 1st or 2nd win? uncomment for 1st
    smaller_file[key] = line
  f_a.close()

  b = open(OTHER)
  r_b = reader(b)
  #b_headers = next(r_b) # optionally discard a header row
  for i, line in enumerate(r_b):
    b1, b2, b3, b4, b5, b6, b7, b8, b9 = line
    key = b2, b8, b9 # the matching fields from b.txt, same order as a.txt's key
    if key not in smaller_file:
      print("Key for line #%i (%r) not in %s" % (i+1, key, SMALL))
      continue
    a1, a2, a3, a4, a5 = smaller_file[key]
    # do manipulation with a[1-5]/b[1-9] here
    # and do something with them
  b.close()

It makes more sense if, instead of calling them a[1-5]/b[1-9], you actually use the field names that may be in the header rows, such as


  cost_center, store, location, manager_id = line
  key = cost_center, store, location
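If your files do carry header rows, the standard library's `csv.DictReader` can consume that first row for you and hand back each following row as a dict keyed by column name, so you don't have to unpack positionally. A minimal sketch (Python 3), assuming hypothetical columns named `cost_center`, `store`, `location` and `manager_id`; the sample data is made up and read from an in-memory string so the snippet runs on its own:

```python
import csv
import io

# Hypothetical sample data standing in for a.txt; the first row is the header.
SAMPLE_A = """cost_center,store,location,manager_id
CC1,Main,North,m100
CC2,Annex,South,m200
"""

KEY_FIELDS = ("cost_center", "store", "location")  # hypothetical column names

def build_index(rows, key_fields):
    """Map a tuple of the key columns to the full row dict."""
    index = {}
    for row in rows:
        key = tuple(row[name] for name in key_fields)
        index[key] = row  # a later duplicate overwrites an earlier one
    return index

rows = csv.DictReader(io.StringIO(SAMPLE_A))  # reads the header row itself
by_key = build_index(rows, KEY_FIELDS)
print(by_key[("CC2", "Annex", "South")]["manager_id"])  # m200
```

For a real file you'd pass `open("a.txt", newline="")` instead of the `StringIO`.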

You may also have to manipulate some of the values to make key-matches work, such as


  cc, store, loc, mgr = line
  cc = cc.strip().upper()
  store = store.strip().title()
  key = cc, store, loc

ensuring that you do the same manipulations for both files.
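One way to guarantee both files get identical treatment is to put the clean-up in a single function and call it from both loops. A sketch; the name `make_key` and the particular clean-up rules are illustrative assumptions, not requirements:

```python
def make_key(cc, store, loc):
    """Normalize the matching fields the same way for both files."""
    # Hypothetical rules: strip stray whitespace, upper-case cost centers,
    # title-case store names, and leave locations untouched.
    return (cc.strip().upper(), store.strip().title(), loc)

# Rows from the two files that differ only in case and padding...
key_a = make_key(" cc1 ", "main street", "North")
key_b = make_key("CC1", "  MAIN STREET", "North")
print(key_a == key_b)  # True
```

Because both loops call the same function, a rule you add later can't end up applied to only one file.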

The main program above reads the entire smaller file into memory and uses it for fast lookup. However, if you have gargantuan files, you may need to process them differently. You don't detail the fields/organization of the files, but if they're both sorted by key, you can change the algorithm to behave like the standard *nix "join" command.
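If both files are sorted on the key, you can walk them in step the way `join` does and only ever hold the current row from each in memory. A minimal sketch of such a merge join, assuming both inputs are sorted ascending by key and (for simplicity) that keys are unique within the first input; the key functions and sample rows are hypothetical. Keys are compared with Python's own ordering, so the files must have been sorted the same way (plain string order; a locale-aware or numeric `sort` may order them differently):

```python
_END = object()  # sentinel marking an exhausted input

def merge_join(rows_a, rows_b, key_a, key_b):
    """Yield (row_a, row_b) pairs whose keys match.

    Assumes both iterables are sorted ascending by key and that keys are
    unique within rows_a (rows_b may repeat a key).
    """
    it_a, it_b = iter(rows_a), iter(rows_b)
    a, b = next(it_a, _END), next(it_b, _END)
    while a is not _END and b is not _END:
        ka, kb = key_a(a), key_b(b)
        if ka < kb:
            a = next(it_a, _END)  # a has no partner here; advance a
        elif ka > kb:
            b = next(it_b, _END)  # b has no partner in a; advance b
        else:
            yield a, b
            b = next(it_b, _END)  # keep a: the next b may share its key

a_rows = [["1", "alpha"], ["3", "gamma"], ["5", "epsilon"]]
b_rows = [["x", "1"], ["y", "2"], ["z", "3"], ["w", "3"]]
pairs = list(merge_join(a_rows, b_rows, lambda r: r[0], lambda r: r[1]))
for a, b in pairs:
    print(a[1], b[0])
# alpha x
# gamma z
# gamma w
```

With real files you'd pass `csv.reader` objects as `rows_a`/`rows_b`; since the function only pulls rows as it needs them, memory use stays flat regardless of file size.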

Other asides: you may have to tweak the treatment of a header row (and correspondingly the line numbers), as well as the handling of duplicate keys in your a.txt source if any exist, along with the behavior when a key can't be found in a.txt but is requested in b.txt (maybe set some defaults instead of logging the error and skipping the row?). Then, lastly and most importantly, you have to fill in the manipulations you desire and actually do something with the processed results (write them to a file, upload them to a database, send them via email, output them to a text-to-speech engine and have it speak them, etc).
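Pulling a few of those asides together, here is a sketch (Python 3) that skips a header row in each file, lets the first occurrence of a duplicate key in a.txt win, fills in defaults when a b.txt key isn't in a.txt instead of skipping the row, and writes the joined rows out with `csv.writer`. The three-column layouts, the `DEFAULT_A` filler and the in-memory sample data are all hypothetical stand-ins for your real files:

```python
import csv
import io

# Hypothetical inputs: a.txt is (key, name, region), b.txt is
# (order_id, key, amount), and both start with a header row.
A_TEXT = "key,name,region\nk1,Alice,East\nk2,Bob,West\n"
B_TEXT = "order_id,key,amount\n100,k1,5\n101,k9,7\n102,k2,3\n"
DEFAULT_A = ["", "UNKNOWN", "UNKNOWN"]  # hypothetical filler for unmatched keys

def join_files(a_file, b_file, out_file):
    r_a = csv.reader(a_file)
    next(r_a)  # discard a.txt's header row
    lookup = {}
    for row in r_a:
        lookup.setdefault(row[0], row)  # first occurrence of a key wins

    r_b = csv.reader(b_file)
    next(r_b)  # discard b.txt's header row
    w = csv.writer(out_file)
    w.writerow(["order_id", "name", "region", "amount"])
    for order_id, key, amount in r_b:
        # use defaults instead of logging and skipping the row
        _, name, region = lookup.get(key, DEFAULT_A)
        w.writerow([order_id, name, region, amount])

out = io.StringIO()
join_files(io.StringIO(A_TEXT), io.StringIO(B_TEXT), out)
print(out.getvalue())
```

With real files, open the inputs with `open("a.txt", newline="")` and the output with `open("out.csv", "w", newline="")`, as the `csv` module documentation recommends.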

I come from 30 years of mainframe programming so I understand
how computers work at a bits/bytes/machine language/source
vs. executable/reading core dumps level, and I can program in
a lot of languages most people using Python have never even
heard of,

If there's such urgency, I hope you resorted to simply using one of the multitude of other languages you know. Even in C, this wouldn't be too painful as projects go (there's a phrase you won't hear me utter frequently). Or maybe try your hand at it in Pascal, shell-scripting (see the "join" command) or even assembly language. Not sure I'd use Logo, Haskell, Erlang, or Prolog. :)

My problem is that I want to do this all yesterday, and the
Python text I bought is not easy to understand. I don't have
time to work my way through the online Python tutorial.

As Rick mentioned, there are a number of free online sources for tutorials, books, and the like. Dive Into Python is one of the classics. Searching the archives of comp.lang.python for "beginner books" will yield the same thread coming up every couple of weeks. For future reference, if you've got time-sensitive projects to tackle "yesterday", it's usually not the best time to try to learn a new language. Good luck in your exploration of Python.


-tkc




--
http://mail.python.org/mailman/listinfo/python-list
