George Sakkis <[EMAIL PROTECTED]> wrote:
   ...
> > Unless each line is huge, how exactly you split it to get the first and
> > last blank-separated word is not going to matter much.
> >
> > Still, you should at least avoid splitting the line twice; that's
> > pretty obviously sheer waste. So, change that loop body to:
> >
> >     words = line.split(' ')
> >     db[words[0]] = words[-1]
> >
> > If some lines are huge, splitting them entirely may be far more work
> > than you need. In that case, you may do two partial splits instead, one
> > direct and one reverse:
> >
> >     first_word = line.split(' ', 1)[0]
> >     last_word = line.rsplit(' ', 1)[-1]
> >     db[first_word] = last_word
>
> I'd guess the following is in theory faster, though it might not make
> a measurable difference:
>
>     first_word = line[:line.index(' ')]
>     last_word = line[line.rindex(' ')+1:]
>     db[first_word] = last_word
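One behavioral difference is worth noting before timing anything:
str.index and str.rindex raise ValueError on a line that contains no
space, while the partial splits simply return the whole line. A small
sketch (the sample line and db dict are invented for illustration):

    # Invented sample data, just to show the two variants side by side.
    line = 'key some middle words value'
    db = {}

    # Partial splits: safe even if the line has no space at all;
    # both expressions then yield the whole line.
    first_word = line.split(' ', 1)[0]
    last_word = line.rsplit(' ', 1)[-1]

    # index/rindex slicing: builds no intermediate list, but raises
    # ValueError on a space-less line, so it assumes well-formed input.
    first_word = line[:line.index(' ')]
    last_word = line[line.rindex(' ')+1:]

    db[first_word] = last_word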
If the lines are huge, the difference is quite measurable:

brain:~ alex$ python -mtimeit -s"line='ciao '*999" "first=line.split(' ',1)[0]; line=line.rstrip(); second=line.rsplit(' ',1)[-1]"
100000 loops, best of 3: 3.95 usec per loop
brain:~ alex$ python -mtimeit -s"line='ciao '*999" "first=line[:line.index(' ')]; line=line.rstrip(); second=line[line.rindex(' ')+1:]"
1000000 loops, best of 3: 1.62 usec per loop
brain:~ alex$

So, if the 4GB file was made up, say, of 859853 such lines, using the
index/rindex approach might save a couple of seconds overall.

The lack of ,1 in the split/rsplit calls (i.e., essentially, the code
originally posted) brings the snippet time to 226 microseconds; here,
the speedup might therefore amount to a couple HUNDRED seconds in all.

Alex
--
http://mail.python.org/mailman/listinfo/python-list
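For anyone who'd rather re-run the comparison as a script than retype
the shell one-liners, here is a rough equivalent using the standard
timeit module. The statements mirror the snippets above (with a helper
name s standing in for the rebinding of line); absolute numbers will
of course vary by machine:

    import timeit

    setup = "line = 'ciao ' * 999"
    stmts = [
        ('split/rsplit', "first = line.split(' ', 1)[0]; "
                         "s = line.rstrip(); "
                         "second = s.rsplit(' ', 1)[-1]"),
        ('index/rindex', "first = line[:line.index(' ')]; "
                         "s = line.rstrip(); "
                         "second = s[s.rindex(' ')+1:]"),
    ]
    for name, stmt in stmts:
        # Best of 3 repetitions of 100000 runs each, like the
        # command-line tool's default reporting.
        best = min(timeit.repeat(stmt, setup, repeat=3, number=100000))
        print('%s: %.2f usec per loop' % (name, best / 100000 * 1e6))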