rbt <[EMAIL PROTECTED]> writes:
> Now, management would like the IT guys to go thru the old data and
> replace as many SSNs with the new ID numbers as possible. You have a
> tab delimited txt file that maps the SSNs to the new ID numbers. There
> are 500,000 of these number pairs. What is the most efficient way to
> approach this? I have done small-scale find and replace programs
> before, but the scale of this is larger than what I'm accustomed to.
Just use an ordinary Python dict for the map, on a system with enough RAM (maybe 100 bytes or so for each pair, so the 500,000 pairs would take about 50 MB). Then it's simply a matter of scanning through the input files to find SSNs and looking them up in the dict.
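Something along these lines might do it. This is only a rough sketch: the function names are made up for illustration, and it assumes the SSNs appear in the old data as ddd-dd-dddd and that the map file has the SSN in the first column and the new ID in the second. Adjust the pattern if your data stores them differently.

```python
import csv
import re

# Assumption: SSNs are written as ddd-dd-dddd in the old data.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def load_map(path):
    """Read the tab-delimited SSN/new-ID pairs into a dict (hypothetical helper)."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        # Skip blank or malformed rows that lack two columns.
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def replace_ssns(text, ssn_to_id):
    """Replace each SSN that has a mapping; leave unmapped SSNs unchanged."""
    return SSN_PATTERN.sub(
        lambda m: ssn_to_id.get(m.group(0), m.group(0)), text
    )


def convert_file(src, dst, ssn_to_id):
    """Stream one input file line by line so only the map lives in memory."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(replace_ssns(line, ssn_to_id))
```

Each dict lookup takes roughly constant time, so the cost is dominated by reading the old files once, not by the 500,000 pairs.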