Steven D'Aprano <steve+comp.lang.python <at> pearwood.info> writes:
> > 1) Yield a shorter chunk
> > 2) Extend the chunk with fill values
> > 3) Raise an error
> > 4) Ignore the last chunk
> >
> > Cases 2 and 4 can be achieved with current itertools primitives e.g.:
> > 2) izip_longest(fillvalue=fillvalue, *[iter(iterable)] * n)
> > 4) zip(*[iter(iterable)] * n)
> >
> > However I have only ever had use cases for 1 and 3 and these are not
> > currently possible without something additional (e.g. a generator
> > function).
>
> All of these are trivial. Start with the grouper recipe from the itertools
> documentation, which is your case 2) above, renaming if desired:
>
> http://docs.python.org/2/library/itertools.html#recipes
>
> def chunk_pad(n, iterable, fillvalue=None):
>     args = [iter(iterable)] * n
>     return izip_longest(fillvalue=fillvalue, *args)
>
> Now define:
>
> def chunk_short(n, iterable):  # Case 1) above
>     sentinel = object()
>     for chunk in chunk_pad(n, iterable, fillvalue=sentinel):
>         if sentinel not in chunk:
>             yield chunk
>         else:
>             i = chunk.index(sentinel)
>             yield chunk[:i]
>
> def chunk_strict(n, iterable):  # Case 3) above
>     sentinel = object()
>     for chunk in chunk_pad(n, iterable, fillvalue=sentinel):
>         if sentinel in chunk:
>             raise ValueError
>         yield chunk
>

These are only trivial on the surface. I brought up this topic on
python-ideas just weeks ago, and it turns out there is a surprising
number of alternative solutions that people use for these two cases.
Yours is straightforward and simple, but it comes at the price of the
sentinel check being repeated for every chunk. An optimized version
suggested by Peter Otten replaces your for loop with:

    chunks = zip_longest(*args, fillvalue=fillvalue)
    prev = next(chunks)
    for chunk in chunks:
        yield prev
        prev = chunk

and then does the sentinel test only once, on the final chunk.
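To make the idea concrete, here is a minimal sketch of how the two complete functions might look with that loop, assuming Python 3's itertools.zip_longest. The handling of an empty iterable and the way the last chunk is trimmed are assumptions added for illustration; they are not part of Peter's snippet.

```python
from itertools import zip_longest


def chunk_short(n, iterable):
    """Case 1: yield n-tuples; the last tuple may be shorter."""
    sentinel = object()
    args = [iter(iterable)] * n
    chunks = zip_longest(*args, fillvalue=sentinel)
    try:
        prev = next(chunks)
    except StopIteration:
        return  # assumption: an empty input yields nothing
    for chunk in chunks:
        yield prev
        prev = chunk
    # Only the final chunk can contain padding, so it is trimmed once here.
    yield tuple(x for x in prev if x is not sentinel)


def chunk_strict(n, iterable):
    """Case 3: yield n-tuples; raise ValueError on an incomplete last chunk."""
    sentinel = object()
    args = [iter(iterable)] * n
    chunks = zip_longest(*args, fillvalue=sentinel)
    try:
        prev = next(chunks)
    except StopIteration:
        return  # assumption: an empty input yields nothing
    for chunk in chunks:
        yield prev
        prev = chunk
    # Padding, if any, always ends up in the last position of the last chunk.
    if prev[-1] is sentinel:
        raise ValueError("iterable length is not a multiple of n")
    yield prev
```

Note that chunk_strict is a generator, so the complete chunks are yielded before the error surfaces at the end of the iteration.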
>> 1) Yield a shorter chunk
>> 2) Extend the chunk with fill values
>> 3) Raise an error
>> 4) Ignore the last chunk
>>
>> Cases 2 and 4 can be achieved with current itertools primitives e.g.:
>> 2) izip_longest(fillvalue=fillvalue, *[iter(iterable)] * n)
>> 4) zip(*[iter(iterable)] * n)

In my opinion, it would make sense to have the 4 cases suggested by
Oscar covered by itertools. As he says, cases 2 and 4 already are (and
the grouper recipe in the itertools docs gives the solution for case 2).
Covering the other two would keep people from reinventing (often
suboptimal) solutions to these common problems, and it would bring a
speed gain even compared to the best Python implementations, since
things would be coded in C.

I would advocate either of the following two solutions:

a) have an extra 'mode'-type argument for zip_longest() to control its
   behavior (the default mode could be the current fillvalue padding,
   'strict' mode would raise an error, and 'relaxed' mode would yield
   the shorter chunk),

or

b) have extra zip_strict and zip_relaxed (I'm also not too good at
   thinking up names :)) functions in itertools.

Either way, you could then very easily modify the existing grouper
recipe in itertools to implement the four different 'chunks' functions
(I would keep calling them 'grouper' functions in line with the current
itertools version).

> I think the real reasons it's not in the standard library are:
>
> - there's no consensus on what chunking should do;
>
> - hence whatever gets added will disappoint some people;
>
> - unless you add "all of them", in which case you've now got a
>   significantly harder API ("there are five chunk functions in itertools,
>   which should I use?");
>
> What does chunking (grouping) even mean? Given:
>
> chunk("abcdef", 3)
>
> should I get this? [abc, def]
>
> or this? [abc, bcd, cde, def]

I guess this suggestion would not compromise the API too much; after
all, all these zip versions would still behave like a zip() function
should.
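As a rough illustration of option a), here is a pure-Python sketch of what such a 'mode' argument might do. The name zip_mode, the mode strings and the grouper signature are all hypothetical; none of this exists in itertools, and a real implementation would be written in C and would not need to test every tuple.

```python
from itertools import zip_longest

_PAD = object()  # private marker standing in for a missing value


def zip_mode(*iterables, mode="fill", fillvalue=None):
    """Hypothetical zip variant; NOT an existing itertools function."""
    if mode not in ("fill", "strict", "relaxed"):
        raise ValueError("unknown mode: %r" % (mode,))
    for tup in zip_longest(*iterables, fillvalue=_PAD):
        if all(x is not _PAD for x in tup):
            yield tup  # complete tuple: identical in every mode
        elif mode == "strict":
            raise ValueError("iterables have unequal lengths")
        elif mode == "relaxed":
            yield tuple(x for x in tup if x is not _PAD)  # shorter tuple
        else:
            yield tuple(fillvalue if x is _PAD else x for x in tup)


def grouper(iterable, n, mode="fill", fillvalue=None):
    """Grouper recipe built on the hypothetical zip_mode above."""
    args = [iter(iterable)] * n
    return zip_mode(*args, mode=mode, fillvalue=fillvalue)
```

With this sketch, grouper("abcde", 2) pads the last group with None, mode="relaxed" yields the shorter last group, and mode="strict" raises ValueError once the incomplete group is reached. Case 4 (ignore the last chunk) stays plain zip(*[iter(iterable)] * n).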
Itertools users would also know what a 'grouper' recipe does, i.e., it
doesn't do the fancier alternative stuff you suggest like rejoining
groups of characters obtained from a string. So this would be a
relatively conservative addition.

What do you think?

Wolfgang

-- 
http://mail.python.org/mailman/listinfo/python-list