Thanks for your reply. Please take a look at my current profile report:
#------------------------ test.py ---------------------
import os, sys, profile

print os.uname()
print sys.version

# size of 'dict.txt' is about 3.6M, 154563 lines
f = open('dict.txt', 'r')
print "Reading lines..."
lines = f.readlines()
print "Done."


def splitUsing(chars):
    def tmp(s):
        return s.split(chars)
    return tmp


def sp0(lines):
    """====> sp0() -- Normal 'for' loop"""
    l = []
    for line in lines:
        l.append(line.split('\t'))
    return l

def sp1(lines):
    """====> sp1() -- List-comprehension"""
    return [s.split('\t') for s in lines]

def sp2(lines):
    """====> sp2() -- Map with lambda function"""
    return map(lambda s: s.split('\t'), lines)

def sp3(lines):
    """====> sp3() -- Map with splitUsing() function"""
    return map(splitUsing('\t'), lines)

def sp4(lines):
    """====> sp4() -- Not correct, but very fast"""
    return map(str.split, lines)

for num in xrange(5):
    fname = 'sp%(num)s' % locals()
    print eval(fname).__doc__
    profile.run(fname+'(lines)')
#---------------------------End of test.py ----------------

$ python test.py
('OpenBSD', 'Compaq', '3.9', 'kernel#1', 'i386')
2.4.2 (#1, Mar 2 2006, 14:17:22)
[GCC 3.3.5 (propolice)]
Reading lines...
Done.
====> sp0() -- Normal 'for' loop
         309130 function calls in 20.510 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   154563    4.160    0.000    4.160    0.000 :0(append)
        1    0.010    0.010    0.010    0.010 :0(setprofile)
   154563    6.490    0.000    6.490    0.000 :0(split)
        1    0.380    0.380   20.500   20.500 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   20.510   20.510 profile:0(sp0(lines))
        1    9.470    9.470   20.120   20.120 test.py:20(sp0)


====> sp1() -- List-comprehension
         154567 function calls in 12.240 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    6.740    0.000    6.740    0.000 :0(split)
        1    0.380    0.380   12.240   12.240 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   12.240   12.240 profile:0(sp1(lines))
        1    5.120    5.120   11.860   11.860 test.py:27(sp1)


====> sp2() -- Map with lambda function
         309131 function calls in 20.480 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    4.600    4.600   20.100   20.100 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    7.320    0.000    7.320    0.000 :0(split)
        1    0.370    0.370   20.470   20.470 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.010    0.010   20.480   20.480 profile:0(sp2(lines))
        1    0.000    0.000   20.100   20.100 test.py:31(sp2)
   154563    8.180    0.000   15.500    0.000 test.py:33(<lambda>)


====> sp3() -- Map with splitUsing() function
         309132 function calls in 21.900 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.540    5.540   21.520   21.520 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    7.100    0.000    7.100    0.000 :0(split)
        1    0.380    0.380   21.900   21.900 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   21.900   21.900 profile:0(sp3(lines))
        1    0.000    0.000    0.000    0.000 test.py:14(splitUsing)
   154563    8.880    0.000   15.980    0.000 test.py:15(tmp)
        1    0.000    0.000   21.520   21.520 test.py:35(sp3)


====> sp4() -- Not correct, but very fast
         5 function calls in 3.090 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.660    2.660    2.660    2.660 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.430    0.430    3.090    3.090 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    3.090    3.090 profile:0(sp4(lines))
        1    0.000    0.000    2.660    2.660 test.py:39(sp4)

The puzzling part is that the default behavior of str.split() (splitting on any run of whitespace) ought to be more complex than str.split('\t'), yet sp4 finishes in 3.09 CPU seconds versus 12.24 to 21.90 for the others, with only 5 profiled function calls instead of 154567 to 309132. sp4 gives the wrong result only because it splits on whitespace rather than just on tabs, so a field containing spaces gets broken apart. If we could pass str.split('\t') to map() in the same way, the result would be great. What do you guys think?

-- 
http://mail.python.org/mailman/listinfo/python-list
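For reference, here is a minimal sketch of two ways to hand the tab separator to map() without a per-line Python wrapper such as the lambda in sp2 or tmp in sp3. It is written for modern Python 3, not the 2.4.2 used above, and the function names and sample lines are hypothetical. The first passes the separator as a second iterable, so map() calls str.split(line, sep) directly; the second uses operator.methodcaller, which CPython implements in C. Neither has been timed against dict.txt here, so any speedup is something to measure rather than assume.

```python
from itertools import repeat
from operator import methodcaller

# Hypothetical sample data standing in for the lines read from 'dict.txt'.
lines = ["apple\tfruit\ta red one\n", "kale\tvegetable\tleafy green\n"]


def split_listcomp(lines):
    """Baseline, same logic as sp1() in test.py."""
    return [s.split('\t') for s in lines]


def split_map_repeat(lines):
    """Pass the separator to map() as a second iterable.

    map() calls str.split(line, sep) for each (line, sep) pair, so no
    lambda or closure runs per line. Python 3's map() stops at the
    shortest iterable, which makes the infinite repeat() safe here.
    """
    return list(map(str.split, lines, repeat('\t')))


def split_methodcaller(lines):
    """Use operator.methodcaller, a C-implemented callable in CPython."""
    return list(map(methodcaller('split', '\t'), lines))


def split_whitespace(lines):
    """Equivalent of sp4(): splits on any whitespace, not just tabs."""
    return list(map(str.split, lines))


# Both tab-only variants agree with the list comprehension.
assert split_map_repeat(lines) == split_listcomp(lines)
assert split_methodcaller(lines) == split_listcomp(lines)

print(split_listcomp(lines)[0])    # ['apple', 'fruit', 'a red one\n']
print(split_whitespace(lines)[0])  # ['apple', 'fruit', 'a', 'red', 'one']
```

On Python 2.4 as in the report, operator.methodcaller is not available (it was added in 2.6), and map() pads shorter iterables with None instead of stopping at the shortest one, so repeat('\t') does not work there; map(str.split, lines, ['\t'] * len(lines)) is the equivalent form.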