Michael Spencer wrote:

> I think the two versions below each give the 'correct' output wrt the
> OP's single test case. I measure chunkerMS2 to be faster than chunkerGS2
> across all chunk sizes, but this is all about the joins.
>
> I conclude that chunkerGS's deque beats chunkerMS's list for large
> chunk_size (~>100). But for joined output, chunkerMS2 beats chunkerGS2
> because it does less joining.

Although I speculate that the OP is really concerned about the chunking
algorithm rather than an exact output format, chunkerGS2 can do better even
when the chunks must be joined. Each chunk can (and should) be joined only
once, when it enters the buffer, not every time a window containing it is
yielded. With that change, chunkerGS3 outperforms chunkerMS2 by an even wider
margin than in the original versions:

* chunk_size=3
    chunkerGS3: 1.17 seconds
    chunkerMS2: 1.56 seconds
* chunk_size=30
    chunkerGS3: 1.26 seconds
    chunkerMS2: 6.35 seconds
* chunk_size=300
    chunkerGS3: 2.20 seconds
    chunkerMS2: 54.51 seconds

from collections import deque
from itertools import islice

# itersplit() is not defined in this message and must already be in scope.

def chunkerGS3(seq, sentry='.', chunk_size=3, keep_first=False,
               keep_last=False):
    iterchunks = itersplit(seq, sentry)
    buf = deque()
    join = ' '.join
    def append(chunk):
        # join each chunk once, when it enters the buffer
        chunk.append(sentry)
        buf.append(join(chunk))
    # fill the buffer with the first chunk_size-1 chunks
    for chunk in islice(iterchunks, chunk_size-1):
        append(chunk)
        if keep_first:
            yield join(buf)
    # slide the window: add one chunk, yield, drop the oldest
    for chunk in iterchunks:
        append(chunk)
        yield join(buf)
        buf.popleft()
    # optionally drain the remaining partial windows
    if keep_last:
        while buf:
            yield join(buf)
            buf.popleft()

> > if you're going to profile something, better use the
> > standard timeit module
> ...
> OT: I will when timeit grows a capability for testing live objects rather
> than 'small code snippets'. Requiring source code input and passing
> arguments by string substitution makes it too painful for interactive
> work. The need to specify the number of repeats is an additional
> annoyance.

timeit is indeed somewhat cumbersome, but having a robust, bug-free timing
function is worth the inconvenience IMO.

Best,
George
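
P.S. chunkerGS3 relies on itersplit, which is not shown in this message. For
anyone who wants to run it, here is a minimal sketch of what such a helper
might look like. It is an assumption, not necessarily the version used for the
timings above: it yields the items between sentries as lists, leaves the
sentry itself out (chunkerGS3 appends it again), and also yields any trailing
items that have no closing sentry.

```python
def itersplit(seq, sentry='.'):
    """Hypothetical helper: split seq into lists of items at each sentry."""
    buf = []
    for item in seq:
        if item == sentry:
            # sentry closes the current chunk; the sentry itself is not kept
            yield buf
            buf = []
        else:
            buf.append(item)
    # assumption: trailing items without a closing sentry form a last chunk
    if buf:
        yield buf


words = "a b . c d . e f .".split()
print(list(itersplit(words)))   # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

Each yielded chunk is a fresh list, so it is safe for the caller to append the
sentry to it in place, as chunkerGS3 does.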