Jeremy Sanders wrote:
> On Fri, 25 Feb 2005 17:14:24 +0100, Diez B. Roggisch wrote:
>
> > Maybe [c]StringIO can be of help. I don't know if its iterator is lazy. But
> > at least it has one, so you can try and see if it improves performance :)
>
> Excellent! I somehow missed that module. StringIO speeds up the iteration
> by a factor of 20!
>

Twenty?? StringIO.StringIO or cStringIO.StringIO???

I did some "timeit" tests using the code below, on 400,000 lines of 53
chars (uppercase + lowercase + '\n'). On my config (Python 2.4, Windows
2000, 1.4 GHz Athlon chip, not short of memory), cStringIO took 0.18
seconds and the "hard way" took 0.91 seconds. StringIO (not shown) took
2.9 seconds. FWIW, moving the attribute look-up out of the loop
(sfind = s.find) saves only about 0.1 seconds.

>python -m timeit -s "import itersplitlines as i; d = i.mk_data(400000)" "i.test_csio(d)"
10 loops, best of 3: 1.82e+005 usec per loop

>python -m timeit -s "import itersplitlines as i; d = i.mk_data(400000)" "i.test_gen(d)"
10 loops, best of 3: 9.06e+005 usec per loop

A few questions:
(1) What is your equivalent of the "hard way"? What [c]StringIO code did
    you use?
(2) How did you measure the time?
(3) How long does it take to *compile* your 400,000-line Python script?

!import cStringIO
!
!def itersplitlines(s):
!    # Yield the lines of s one at a time, each with its trailing '\n'.
!    if not s:
!        yield s
!        return
!    pos = 0
!    sfind = s.find  # bind the method once, outside the loop
!    epos = len(s)
!    while pos < epos:
!        newpos = sfind('\n', pos)
!        if newpos == -1:
!            # last line has no trailing newline
!            yield s[pos:]
!            return
!        yield s[pos:newpos+1]
!        pos = newpos+1
!
!def test_gen(s):
!    for z in itersplitlines(s):
!        pass
!
!def test_csio(s):
!    for z in cStringIO.StringIO(s):
!        pass
!
!def mk_data(n):
!    import string
!    return (string.lowercase + string.uppercase + '\n') * n