[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

Andrew Barnert via Python-ideas Wed, 13 May 2020 10:53:03 -0700

On May 12, 2020, at 23:29, Stephen J. Turnbull 
<[email protected]> wrote:
> 
> Andrew Barnert writes:
>>> On May 10, 2020, at 22:36, Stephen J. Turnbull 
>>> <[email protected]> wrote:
>>> 
>>> Andrew Barnert via Python-ideas writes:
>>> 
>>>> A lot of people get this confused. I think the problem is that we
>>>> don’t have a word for “iterable that’s not an iterator”,
>>> 
>>> I think part of the problem is that people rarely see explicit
>>> iterator objects in the wild.  Most of the time we encounter iterator
>>> objects only implicitly.
>> 
>> We encounter iterators in the wild all the time, we just don’t
>> usually _care_ that they’re iterators instead of “some kind of
>> iterable”, and I think that’s the key distinction you’re looking
>> for.
> 
> It *is* the distinction I'm making with the word "explicit".  I never
> use "next" on an open file.  I'm not sure your more precise statement
> is better.
> 
> I think the real difference is that I'm thinking of "people" as
> including my students who have no clue what an iterator does and don't
> care what an iterable is, they just cargo cult
> 
>    with open("file") as f:
>        for line in f:
>            do_stuff(line)
> 
> while as you point out (and I think is appropriate in this discussion)
> some people who are discussing proposed changes are using the available
> terminology incorrectly, and that's not good.


Students often want to know why this doesn’t work:

    with open("file") as f:
        for line in f:
            do_stuff(line)
        for line in f:
            do_other_stuff(line)

… when this works fine:

    with open("file") as f:
        lines = f.readlines()
    for line in lines:
        do_stuff(line)
    for line in lines:
        do_other_stuff(line)

This question (or a variation on it) gets asked by novices every few days on 
StackOverflow; it’s one of the top common duplicates.

The answer is that files are iterators, while lists are… well, there is no 
word. You can explain it anyway. In fact, you _have_ to give an explanation 
with analogies and examples and so on, and that would be true even if there 
were a word for what lists are. But it would be easier to explain if there were 
such a word, and if you could link that word to something in the glossary, and 
a chapter in the tutorial.
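
Here is a minimal sketch of the difference, assuming io.StringIO as a stand-in for open("file") so it runs without anything on disk; the three lines of content are made up:

```python
import io

# io.StringIO stands in for open("file") here; the contents are made up.
f = io.StringIO("one\ntwo\nthree\n")
first = [line for line in f]     # the first loop reads everything
second = [line for line in f]    # same object, already used up

# readlines() copies the lines into a list, and a list hands out a fresh
# iterator every time you loop over it.
g = io.StringIO("one\ntwo\nthree\n")
lines = g.readlines()
pass_one = [line for line in lines]
pass_two = [line for line in lines]

print(len(first), len(second))        # 3 0
print(len(pass_one), len(pass_two))   # 3 3
```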

>> Still, having clear names with simple definitions would help that
>> problem without watering down the benefits.
> 
> I disagree.  I agree there's "amortized zero" cost to the crowd who
> would use those names fairly frequently in design discussions, but
> there is a cost to the "lazy in the technical sense" programmer, who
> might want to read the documentation if it gave "simple answers to
> simple questions", but not if they have to wade through a thicket of
> "twisty subtle definitions all alike" to get to the simple answer, and
> especially not if it's not obvious after all that what the answer is.

We shouldn’t define everything up front, just the most important things. But 
this is one of the most important things. People need to understand this 
distinction very early on to use Python, and many of them don’t get it, hence 
all the StackOverflow duplicates. People run into this problem well before they 
run into a problem that requires them to understand the distinction between 
arguments and parameters, or protocols and ABCs, or Mapping and dict.

> It also makes conversations with experts fraught, as those experts
> will tend to provide more detail and precision than the questioner
> wants (speaking for myself, anyway!)  "Not every one-sentence
> explanation needs terminology in the documentation."

I think it’s the opposite. 

I can teach a child why a glass will break permanently when you hit it while a 
lake won’t by using the words “solid” and “liquid”. I don’t have to give them 
the scientific definitions and all the equations. I might not even know them. 
And in the same way, I can teach novices why the x after x=y+1 doesn’t change 
when y changes by teaching them about variables without having to explain 
__getattr__ and fast locals and the import system and so on.

Knowing all the subtleties of shear force or __getattribute__ or whatever 
doesn’t prevent me from teaching a kid without getting into those subtleties. 
The better I understand “solid” or “variable”, the easier it is for me to teach 
it. That’s how words work, or how the human mind works, or whatever, and that’s 
why language is useful for teaching.

>>>> But that last thing is exactly the behavior you expect from “things
>>>> like list, dict, etc.”, and it’s hard to explain, and therefore
>>>> hard to document.
>>> 
>>> Um, you just did *explain* it, quite well IMHO, you just didn't *name*
>>> it. ;-)
>> 
>> Well, it was a long, and redundant, explanation, not something
>> you’d want to see in the docs or even a PEP.
> 
> The part I was referring to was the three or so lines preceding in
> which you defined the behavior desired for views etc.  I guess to
> define terminology for all the variations that might be relevant would
> be long (and possibly unavoidably redundant).

Yes, and defining terminology for the one distinction that almost always is 
relevant helps distinguish that distinction from the other ones that rarely 
come up. Most people (especially novices) don’t often need to think about the 
distinction between iterables that are sized and also containers vs. those that 
are not both sized and containers, so the word for that doesn’t buy us much. 
But the distinction between iterators and things-like-list-and-so-on comes up 
earlier, and a lot more often, so a word for that would buy us a lot more.
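
For what it's worth, both distinctions can be probed with the ABCs in collections.abc. This is only a sketch; describe is a hypothetical helper name, not anything in the stdlib:

```python
from collections.abc import Container, Iterator, Sized

def describe(obj):
    # Hypothetical helper: (is an iterator?, is sized?, is a container?)
    return (
        isinstance(obj, Iterator),
        isinstance(obj, Sized),
        isinstance(obj, Container),
    )

print(describe([1, 2, 3]))                 # (False, True, True)
print(describe(range(3)))                  # (False, True, True)
print(describe({"a": 1}.keys()))           # (False, True, True)
print(describe(x for x in range(3)))       # (True, False, False)
print(describe(map(str, [1, 2])))          # (True, False, False)
```

The first column is the one novices trip over; the other two rarely matter to them.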

>>> Isn't manual reset exactly what you want from a resettable
>>> iterator, though?
>> 
>> Yes. I certainly use seek(0) on files, and it’s a perfectly
>> cromulent concept, it’s just not the concept I’d want on a range or
>> a keys view or a sequence slice.
> 
> But you *don't* use seek(0) on files (which are not iterators, and in
> fact don't actually exist inside of Python, only names for them do).
> You use them on opened *file objects* which are iterators.

A file object is a file, in the same way that a list object is a list and an 
int object is an int. Sure, those are all abstractions, and some are quite 
vague, and occasionally it’s worth talking specifically about Python’s 
implementation of the abstraction. An int doesn’t have a storage cost; an int 
object does. A file doesn’t have a fileno; a file object does. But so what?

The fact that we use “file” ambiguously for a bunch of related but 
contradictory abstractions (a stream that you can read or write, a directory 
entry, the thing an inode points to, a document that an app is working on, …) 
makes it a bit more confusing, but unfortunately that ambiguity is forced on 
people before they even get to their first attempt at programming, so it’s 
probably too late for Python to help (or hurt).

> When you open a file again, by default you get a new iterator which
> begins at the beginning, as you want for those others.  My point is
> that none of the other types you mention are iterators.

I don’t get what you’re driving at here.

Lists, sets, ranges, dict_keys, etc. are not iterators. You can write `for x in 
xs:` over and over and get the values over and over. Because each time, you get 
a new iterator over their values.

Files, maps, zips, generators, etc. are not like that. They’re iterators. If 
you write `for x in xs:` twice, you get nothing the second time, because each 
time you’re using the same iterator, and you’ve already used it up. Because 
iter(xs) is xs when it’s a file or generator etc.
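
A quick way to check which side of the line something falls on is to test whether iter(xs) is xs. The sketch below does that; is_own_iterator is a hypothetical helper name, and io.StringIO is assumed as a stand-in for an open file:

```python
import io

def is_own_iterator(obj):
    # An iterator returns itself from iter(); a list-like thing returns
    # a new iterator object each time.
    return iter(obj) is obj

d = {"a": 1, "b": 2}
reusable = [[1, 2], {1, 2}, range(3), d.keys()]
one_shot = [
    io.StringIO("a\nb\n"),   # stand-in for an open file
    map(str, [1, 2]),
    zip("ab", "cd"),
    (x for x in "ab"),
]

print([is_own_iterator(x) for x in reusable])  # [False, False, False, False]
print([is_own_iterator(x) for x in one_shot])  # [True, True, True, True]
```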

> The difference with files
> is just that they happen to exist in Python as iterables.  But after

_What_ exists in Python as iterables? The only representation of files in 
Python is file objects—the thing you get back from open (or socket.makefile or 
io.StringIO or whatever else)—and those are iterators.

>    r = range(n)
>    ri = iter(r)
>    for i in ri:
>        if i > n_2:
>            break
> 
> you want the next "for j in ri:" to start where you left off, no?

Yes. That’s why you called iter, after all. Because doing `for i in r:` twice 
would _not_ start where you left off. Because a range is not an iterator.
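
To make that concrete, here is a sketch with made-up numbers (n = 10 and a threshold of 4 are assumptions, since the quoted snippet leaves n and n_2 unspecified):

```python
n = 10
r = range(n)
ri = iter(r)

seen = []
for i in ri:
    seen.append(i)
    if i > 4:          # 4 plays the role of n_2 here
        break

rest = list(ri)        # picks up where the first loop stopped
again = list(r)        # the range itself starts over from 0

print(seen)   # [0, 1, 2, 3, 4, 5]
print(rest)   # [6, 7, 8, 9]
print(again)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```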

But file isn’t like that—you don’t have to call iter on it to get an iterator; 
in fact, if you write fi=iter(f), fi is the same object as f. Because a file is 
an iterator.

Of course you can also get a new range with r=range(n) again, but you don’t 
have to, because one range(n) is as good as another. But one range_iter is not 
as good as another, because there’s no way to use one without using it up. And 
files aren’t like ranges, they’re like range_iters.

Compare these:

    xs = [x*2 for x in range(10)]
    ys = (y*2 for y in range(10))

Of course you can sort of iterate over ys twice by just running the same 
generator expression again to get a brand new object, but that’s not the same 
thing as iterating over xs twice. That’s not “resetting the iterator”, it’s 
creating a brand new one. In the same way, you can sort of iterate over a file 
twice just by running the expression that created it twice, but that’s not 
resetting the file object, it’s creating a new one.
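
Spelled out as a sketch (sum is just a convenient way to consume each one; any loop would do):

```python
xs = [x*2 for x in range(10)]
ys = (y*2 for y in range(10))

xs_first, xs_second = sum(xs), sum(xs)   # the list can be walked twice
ys_first, ys_second = sum(ys), sum(ys)   # the generator is empty the second time

# Running the expression again builds a different object; it does not
# rewind the old one.
ys_new = (y*2 for y in range(10))

print(xs_first, xs_second)   # 90 90
print(ys_first, ys_second)   # 90 0
print(ys_new is ys)          # False
```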

The one difference between files and generators is that you can actually reset 
the file object by calling seek(0). But that doesn’t make file not an iterator. 
It just makes file an iterator with an extra feature that most iterators don’t 
have.
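
A sketch of that extra feature, again assuming io.StringIO in place of a real file so the example is self-contained:

```python
import io

f = io.StringIO("a\nb\n")    # stand-in for an open file
before = list(f)             # reads both lines
used_up = list(f)            # nothing left

f.seek(0)                    # manual reset, same object
after = list(f)

print(before)                # ['a\n', 'b\n']
print(used_up)               # []
print(after)                 # ['a\n', 'b\n']
print(iter(f) is f)          # True: resettable, and still an iterator
```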

If “resettable iterator” means anything useful, it means something like file. 
Claiming that dict_keys is a “resettable iterator” because you can iterate over 
it twice is massively confusing, because it’s not an iterator at all, it’s the 
exact same kind of thing as a list or a range.

And I’m pretty sure that’s exactly the confusion that led you to think that 
dict_keys have weird behavior, and to suggest the same weird behavior for 
sequence views. Like thinking you can’t have two different iterators over the 
dict_keys that point to different positions—if it were an iterator, that would 
be true (notice that it’s true of files—if you call iter on a file twice, they 
will always have the same position, because they’re both actually the same 
object as file itself), but because dict_keys is not an iterator, it’s not true.
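
As a sketch of that last point (the dict contents are made up, and io.StringIO is assumed as a stand-in for a file):

```python
import io

keys = {"a": 1, "b": 2, "c": 3}.keys()
k1 = iter(keys)
k2 = iter(keys)
k1_seen = [next(k1), next(k1)]   # k1 has moved two steps
k2_seen = next(k2)               # k2 is still at the start

f = io.StringIO("a\nb\nc\n")
f1 = iter(f)
f2 = iter(f)
f1_seen = next(f1)               # advances the one shared position
f2_seen = next(f2)               # so this continues from there

print(k1 is k2, k1_seen, k2_seen)   # False ['a', 'b'] a
print(f1 is f2, f1 is f)            # True True
print(repr(f1_seen), repr(f2_seen)) # 'a\n' 'b\n'
```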

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/25PY2VQGJEUKWVNJ4CEKHYQMELJ6AMGL/
Code of Conduct: http://python.org/psf/codeofconduct/
