The id2name.txt file is an index of primary keys to strings. Its entries look like this:

    11293102971459182412:Descriptive unique name for this record\n
    950918240981208142:Another name for another record\n

The file's properties are:

    # wc -l id2name.txt
    8191180 id2name.txt
    # du -h id2name.txt
    517M    id2name.txt

I'm loading the file into memory with code like this:

    id2name = {}
    for line in iter(open('id2name.txt').readline,''):
        id,name = line.strip().split(':')
        id = long(id)
        id2name[id] = name

This takes about 45 *minutes*.

If I comment out the last line in the loop body it takes only about 30 _seconds_ to run. This would seem to implicate the line id2name[id] = name as being excruciatingly slow.

Is there a fast, functionally equivalent way of doing this?

(Yes, I really do need this cached. No, an RDBMS or disk-based hash is not fast enough.)
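
One way to narrow this down further is to time the parsing and the dict insertion as separate phases. The following is only a rough sketch, not a fix. The function name load_id2name and the disable_gc flag are made up for illustration. It is written for Python 3, so int stands in for long, and it splits on the first colon only, on the assumption that a name may itself contain one. The disable_gc switch is there purely to test one hypothesis, namely that the cyclic garbage collector adds overhead while millions of objects are being allocated. Nothing in the measurements above confirms that.

```python
import gc
import time


def load_id2name(lines, disable_gc=False):
    """Build {int id: name} from 'id:name' lines.

    Returns (mapping, parse_seconds, insert_seconds) so the two phases
    can be compared directly.
    """
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        t0 = time.perf_counter()
        # Phase 1: parse every line, without touching a dict.
        pairs = []
        for line in lines:
            key, name = line.strip().split(':', 1)
            pairs.append((int(key), name))
        t1 = time.perf_counter()
        # Phase 2: dict insertion only.
        id2name = dict(pairs)
        t2 = time.perf_counter()
    finally:
        # Restore the collector only if this call switched it off.
        if disable_gc and was_enabled:
            gc.enable()
    return id2name, t1 - t0, t2 - t1


sample = [
    "11293102971459182412:Descriptive unique name for this record\n",
    "950918240981208142:Another name for another record\n",
]
id2name, parse_s, insert_s = load_id2name(sample, disable_gc=True)
print("parse: %.6fs  insert: %.6fs" % (parse_s, insert_s))
```

For the real file, pass the open file object in place of sample. Note that the intermediate list holds every record a second time, so this costs extra memory for the duration of the measurement. Running it with disable_gc both on and off would show whether the collector matters at all on a given machine.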