Michael Bacarella <[EMAIL PROTECTED]> writes:

> id2name = {}
> for line in iter(open('id2name.txt').readline,''):
>     id,name = line.strip().split(':')
>     id = long(id)
>     id2name[id] = name
>
> This takes about 45 *minutes*.
>
> If I comment out the last line in the loop body it takes only about
> 30 _seconds_ to run. This would seem to implicate the line
> id2name[id] = name as being excruciatingly slow.
Or, rather, that the slowdown is caused by building a dictionary of
that size at all. Dictionaries are implemented very efficiently in
Python, but inserting millions of distinct items still carries
per-item overhead. Of course, if you just throw each item away
instead of storing it, the loop will run very quickly.

> Is there a fast, functionally equivalent way of doing this?

You could, instead of doing individual assignments in a 'for' loop,
let the 'dict' constructor consume a generator::

    input_file = open("id2name.txt")
    id2name = dict(
        (long(id), name)
        for (id, name) in (line.strip().split(':')
                           for line in input_file))

All that code inside the 'dict()' call is a "generator expression";
if you don't know what they are yet, have a read of Python's
documentation on them. It creates a generator which will spit out
key+value tuples to be fed directly to the dict constructor as it
requests them. That lets the generator parse each item from the file
exactly as the 'dict' constructor needs it, possibly saving some
extra "allocate, assign, discard" steps.

Not having your data set, I can't say if it'll be significantly
faster.
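If you'd rather measure than guess, a small harness along these lines
would let you time both variants on your own machine. This is only a
sketch: the size 'N' and the synthetic "id:name" lines are invented
stand-ins, so point it at your real file before drawing conclusions::

    import time

    # Invented stand-in data; raise N toward your real line count
    # for meaningful numbers.
    N = 1000000
    lines = ["%d:name%d\n" % (i, i) for i in xrange(N)]

    # Variant 1: explicit loop, one assignment per item.
    start = time.time()
    id2name = {}
    for line in lines:
        id, name = line.strip().split(':')
        id2name[long(id)] = name
    print "loop:         %.2f seconds" % (time.time() - start)

    # Variant 2: the dict constructor fed by a generator expression.
    start = time.time()
    id2name = dict(
        (long(id), name)
        for (id, name) in (line.strip().split(':') for line in lines))
    print "dict(genexp): %.2f seconds" % (time.time() - start)

Whichever comes out ahead, the difference is in constant factors
only; both variants do the same parsing and hashing work per line.

-- 
 \     "Compulsory unification of opinion achieves only the unanimity |
  `\     of the graveyard."  -- Justice Roberts in 319 U.S. 624 (1943) |
_o__)                                                                  |
Ben Finney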