Bugs item #1121416, was opened at 2005-02-12 12:18
Message generated for change (Settings changed) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1121416&group_id=5470

Category: Documentation
Group: None
Status: Open
Resolution: None
>Priority: 3
Submitted By: Alan (aisaac0)
Assigned to: Raymond Hettinger (rhettinger)
Summary: zip incorrectly and incompletely documented

Initial Comment:
See the zip documentation:
http://www.python.org/dev/doc/devel/lib/built-in-funcs.html

i. The documentation refers to sequences, not to iterables.

ii. The other problem is easier to explain by example.
Let it=iter([1,2,3,4]).
What is the result of zip(*[it]*2)?
The current answer is: [(1,2),(3,4)],
but it is impossible to determine this from the docs,
which would allow [(1,3),(2,4)] instead
(or indeed other possibilities).

The example expresses the solution to an actual need,
so the behavior should be documented or warned against,
I believe.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2005-02-16 20:10

Message:
Logged In: YES
user_id=80475

The first sentence becomes even less clear with the "in the
same order" wording.

The note about truncating to the shortest sequence length is
essential and should not have been dropped.

The py2.4 change note is in a standard form
(\versionchanged{} following the explanation of current
behavior) and should not have been altered.

The part that addresses the OP's concern is too specific to
his one example and is unclear unless you know about that
example. The wording is discomforting, doesn't add new
information, and its meaning is not obvious.

I suggest simply changing "sequence" to "iterable".

There is no sense in stating that the order of combination
is undefined. It doesn't help with the OP's original desire
to be able to predict the outcome of the example. However,
it does have the negative effect of making a person question
whether they've understood the preceding description of what
zip() actually does.

zip() is about lockstep iteration and the docs should serve
those users as straightforwardly as possible.
The OP's issue, on the other hand, only comes up when trying
funky iterator magic -- adding a sentence about undefined
ordering doesn't help one bit.

There is a lesson in all this. These tools were borrowed
from the world of functional programming, which is all about
programming that is free of side-effects. The OP's problem
should be left as a code smell indicating a misuse of
functionals.

----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2005-02-16 19:03

Message:
Logged In: YES
user_id=593130

I agree that the zip doc needs improvement. Confusion will
continue until it is. Here is my suggested rewrite:

-------------------------------------------------------------------
zip([iterable1, ...])

Return a list of tuples, where the i-th tuple contains the
i-th element from each input in the same order as the inputs.
With no arguments, return an empty list (before 2.4, a
TypeError was raised instead). With a single input, return a
list of 1-tuples. With multiple inputs, the output length is
that of the shortest input. When multiple input lengths are
equal, zip(i1, ...) is similar to map(None, i1, ...), but
there is no padding otherwise. The result of zipping a
volatile iterable with itself is undefined. New in 2.0.
-------------------------------------------------------------------

There you have it. More information in about 15% fewer
words. The reduction came from greatly condensing the
overwordy sentence about obsolete behavior into a
parenthetical comment. For comparison, here is the current
version.

-------------------------------------------------------------------
zip( [seq1, ...])

This function returns a list of tuples, where the i-th tuple
contains the i-th element from each of the argument
sequences. The returned list is truncated in length to the
length of the shortest argument sequence.
When there are multiple argument sequences which are all of
the same length, zip() is similar to map() with an initial
argument of None. With a single sequence argument, it returns
a list of 1-tuples. With no arguments, it returns an empty
list. New in version 2.0.

Changed in version 2.4: Formerly, zip() required at least
one argument and zip() raised a TypeError instead of
returning an empty list.

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2005-02-12 21:25

Message:
Logged In: YES
user_id=1038590

The generator in the previous comment was incorrect (tuple
swallows the StopIteration, so it never terminates). Try
this instead:

    from itertools import islice

    def partition(iterable, part_len):
        itr = iter(iterable)
        while 1:
            item = tuple(islice(itr, part_len))
            if len(item) < part_len:
                # A bare return ends the generator cleanly; raising
                # StopIteration here is a RuntimeError under PEP 479.
                return
            yield item

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2005-02-12 20:30

Message:
Logged In: YES
user_id=1038590

Raymond's point about opaqueness is well-taken, since the
given partitioning behaviour in the example was actually
what was intended (I was part of the relevant c.l.p
discussion).

For future reference, the reliable approach is to use a
generator function instead:

    from itertools import islice

    def partition(iterable, part_len):
        itr = iter(iterable)
        while 1:
            # Never terminates: once itr is exhausted this keeps
            # yielding empty tuples (corrected in the comment above).
            yield tuple(islice(itr, part_len))

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2005-02-12 14:50

Message:
Logged In: YES
user_id=80475

The problem with your example does not lie with zip().
Instead, there is a misunderstanding of iter() and how
iterators are consumed.

Instead of iter(), the correct function is itertools.tee():

    >>> from itertools import tee
    >>> zip(*tee([1,2,3,4]))
    [(1, 1), (2, 2), (3, 3), (4, 4)]

Also, stylistically, the zip(*func) approach is too opaque.
It is almost always better (at least for other readers and
possibly for yourself) to write something more obvious in
its intent and operation. List comprehensions and generator
expressions are often clearer and easier to write correctly:

    >>> [(x,x) for x in [1,2,3,4]]
    [(1, 1), (2, 2), (3, 3), (4, 4)]

I do agree that the word sequence should be dropped because
it implies that non-sequence iterables are not acceptable as
arguments. That's too bad because the word "sequence" seems
to help people understand what zip is doing.

You're correct that the zip docs do not describe its
implementation in enough detail to predict the
[(1,2),(3,4)] result. However, that would be an
over-specification. That particular result is an
implementation-specific detail that is subject to change.
It probably won't change, but we don't want to encourage
people to write code that relies on the specific order of
operations within zip(). If someone wants to do something
tricky, such as [(1,2),(3,4)], then they are better off
writing an explicit loop so that the order of operation is
clear both to themselves and to code reviewers.

----------------------------------------------------------------------
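
As a rough illustration of the three approaches discussed in this
thread, the sketch below contrasts passing one shared iterator to
zip() twice, using itertools.tee() to get independent iterators, and
writing the pairing as an explicit loop. It assumes Python 3, where
zip() returns an iterator and so is wrapped in list() here; the helper
name pairs_explicit is hypothetical and does not come from the thread.
The shared-iterator result in the comment is what CPython produces;
whether code should rely on it is the question debated above.

```python
from itertools import tee


def pairs_explicit(iterable):
    """Group consecutive items into 2-tuples with an explicit loop.

    Hypothetical helper (not from the thread). A trailing unpaired
    item is dropped, mirroring zip()'s truncation to the shortest input.
    """
    it = iter(iterable)
    result = []
    for first in it:
        try:
            second = next(it)  # consume the partner of `first`
        except StopIteration:
            break  # odd item left over: drop it
        result.append((first, second))
    return result


# 1. One iterator object passed twice: both zip() arguments draw
#    from the same underlying stream, so items are consumed in turn.
it = iter([1, 2, 3, 4])
shared = list(zip(*[it] * 2))
print(shared)        # CPython: [(1, 2), (3, 4)]

# 2. tee() returns independent iterators over the same data, so
#    each item is paired with itself.
a, b = tee([1, 2, 3, 4])
independent = list(zip(a, b))
print(independent)   # [(1, 1), (2, 2), (3, 3), (4, 4)]

# 3. The explicit loop makes the order of consumption visible.
explicit = pairs_explicit([1, 2, 3, 4])
print(explicit)      # [(1, 2), (3, 4)]
```

The explicit loop is longer than zip(*[it]*2), but the order in which
items are taken from the iterator is spelled out in the code rather
than depending on how zip() walks its arguments.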