On 9/18/2014 10:45 PM, Chris Angelico wrote:
> On Fri, Sep 19, 2014 at 9:52 AM, Roy Smith <r...@panix.com> wrote:
>> In article <mailman.14103.1411047208.18130.python-l...@python.org>,
>>  Chris Angelico <ros...@gmail.com> wrote:
>>> The one thing you can rely on (and therefore must comply with, when
>>> you design an iterable) is that iteration will hit every element
in what -- a collection that exists separately from the iteration, or a
collection generated by the iteration?
This is actually two statements in one: what the user can rely on,
which I claim is very little (see below), and what the designer must
'comply with', which is correspondingly very little (again, see below)
except as the designer claims otherwise in specific documentation.
>>> exactly once.
There are only two things to rely on without further, special-case
information:

  iter(iterable) = iterator (if not, 'iterable' is not an iterable)
  next(iterator) = object, or raises StopIteration (with raising not
  guaranteed)
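
A minimal illustration of just those two guarantees (the use of
itertools.count here is mine, chosen only because it never exhausts):

    import itertools

    it = iter(itertools.count())   # iter(iterable) returns an iterator
    print(next(it))                # 0
    print(next(it))                # 1
    # count() never raises StopIteration, illustrating that exhaustion
    # is possible but not promised.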
The conceptual flaw in the quoted statement, as I think it was
originally intended, is the notion that 'every element [of a
collection]' exists, or is defined, before the iteration. Iterators can
be creative, in the sense of generating collections as they go, in a
manner not determined by the past. It is also possible that the next
item depends on what the iterator's user has done with previous items.
Such feedback loops are common in programs that interact with users.
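
One way such a feedback loop can arise is through a generator's send()
method. A sketch (the name 'echoer' and its rule are made up for
illustration):

    def echoer(start=1):
        # The next item depends on what the consumer sent back;
        # no collection exists before the iteration.
        value = start
        while True:
            feedback = yield value
            value = value * 2 if feedback is None else feedback + 1

    gen = echoer()
    print(next(gen))      # 1
    print(gen.send(10))   # 11 -- the consumer's feedback shaped this item
    print(next(gen))      # 22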
If the statement above is interpreted retrospectively, as "iteration
hits every item of the sequence it generates", then it is trivially true
and is equivalent to "iteration yields every element that it yields, in
the order that it yields them". To paraphrase the OP's question, given
completion of

  yielded1 = [item for item in iterable]
  yielded2 = [item for item in yielded1]

then yielded2 == yielded1 *is* guaranteed. But this is all that is
guaranteed.
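
To see why nothing more is guaranteed, contrast re-running the *same*
iterator (the 'picks' generator here is a made-up illustration):

    import random

    def picks():
        # Items do not exist before the iteration; they are invented on
        # demand.
        for _ in range(3):
            yield random.random()

    it = picks()
    yielded1 = [item for item in it]
    yielded2 = [item for item in it]   # 'it' is already exhausted
    # yielded2 == [] here, and a fresh picks() would almost certainly
    # yield different items; only 'it yields what it yields, in the
    # order it yields it' holds in general.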
itertools.tee operates similarly to the yielded1/yielded2 code above,
except that processing of the duplicated streams can be interleaved.
There is an internal queue of items already yielded on one branch but
not yet on the other. For best operation, accesses should be
interleaved, with the maximum lag bounded. itertools.tee guarantees that
the two branches yield the same items in the same order.
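
A small interleaving sketch of that behavior (illustrative only):

    import itertools

    a, b = itertools.tee(range(5))
    print(next(a), next(b))   # 0 0
    print(list(a))            # [1, 2, 3, 4]
    print(list(b))            # [1, 2, 3, 4] -- same items, same order;
                              # but draining one branch far ahead of the
                              # other grows tee's internal queue.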
>> Does it actually say that somewhere? For example:
>>
>>    for i in bag.pick_randomly_with_replacement(n=5):
>>        print i
>>
>> shouldn't do that.
> When you pick randomly from a population, you create a new population,
> which may have duplicates compared to the original. (For efficiency's
> sake it probably won't all actually exist immediately, but
> conceptually it does exist.)
When the next item depends on feedback from the user, that conceptual
trick does not work as well.
> That's what you're iterating over - not the bag itself.
If one iterates over anything other than a sequence, in forward order,
then one is, in effect, iterating over a new sequence generated from the
'base' collection. In particular, set and dict iteration iterates over
an arbitrary serialization of the set or dict.
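
For instance (the particular order shown is only one possibility, not a
promise):

    s = {'a', 'b', 'c'}
    print(list(s))   # e.g. ['b', 'c', 'a'] -- one arbitrary
                     # serialization; a for loop walks this generated
                     # sequence, not some inherent order of the set.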
In the bag example above, the underlying collection (or a specification
thereof) and the selection size can be hidden away in a class instance,
so that the code may just look like 'for i in my_iterable:', just as for
iterating over a set.
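
A sketch of such hiding, with a hypothetical RandomPicks class standing
in for the bag example above:

    import random

    class RandomPicks:
        # The population and the selection size live in the instance;
        # callers just iterate.
        def __init__(self, population, n):
            self.population = list(population)
            self.n = n
        def __iter__(self):
            for _ in range(self.n):
                yield random.choice(self.population)

    my_iterable = RandomPicks(['red', 'green', 'blue'], n=5)
    for i in my_iterable:
        print(i)   # five picks, repeats possible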
--
Terry Jan Reedy