On May 12, 2020, at 23:29, Stephen J. Turnbull
<[email protected]> wrote:
>
> Andrew Barnert writes:
>>> On May 10, 2020, at 22:36, Stephen J. Turnbull
>>> <[email protected]> wrote:
>>>
>>> Andrew Barnert via Python-ideas writes:
>>>
>>>> A lot of people get this confused. I think the problem is that we
>>>> don’t have a word for “iterable that’s not an iterator”,
>>>
>>> I think part of the problem is that people rarely see explicit
>>> iterator objects in the wild. Most of the time we encounter iterator
>>> objects only implicitly.
>>
>> We encounter iterators in the wild all the time, we just don’t
>> usually _care_ that they’re iterators instead of “some kind of
>> iterable”, and I think that’s the key distinction you’re looking
>> for.
>
> It *is* the distinction I'm making with the word "explicit". I never
> use "next" on an open file. I'm not sure your more precise statement
> is better.
>
> I think the real difference is that I'm thinking of "people" as
> including my students who have no clue what an iterator does and don't
> care what an iterable is, they just cargo cult
>
>     with open("file") as f:
>         for line in f:
>             do_stuff(line)
>
> while as you point out (and I think is appropriate in this discussion)
> some people who are discussing proposed changes are using the available
> terminology incorrectly, and that's not good.
Students often want to know why this doesn’t work:
    with open("file") as f:
        for line in f:
            do_stuff(line)
        for line in f:
            do_other_stuff(line)
… when this works fine:
    with open("file") as f:
        lines = f.readlines()
        for line in lines:
            do_stuff(line)
        for line in lines:
            do_other_stuff(line)
This question (or a variation on it) gets asked by novices every few days on
StackOverflow; it’s one of the top common duplicates.
The answer is that files are iterators, while lists are… well, there is no
word. You can explain it anyway. In fact, you _have_ to give an explanation
with analogies and examples and so on, and that would be true even if there
were a word for what lists are. But it would be easier to explain if there were
such a word, and if you could link that word to something in the glossary, and
a chapter in the tutorial.
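Here is a minimal sketch of the two cases above, assuming io.StringIO as a stand-in for an opened text file (so it runs without a file on disk) and made-up sample text:

```python
import io

# io.StringIO stands in for an opened text file; the contents are made up.
text = "a\nb\nc\n"

f = io.StringIO(text)
first_pass = [line for line in f]    # consumes the file object
second_pass = [line for line in f]   # nothing left: f is already used up

f = io.StringIO(text)
lines = f.readlines()                # a list, which is not an iterator
list_first = [line for line in lines]
list_second = [line for line in lines]

print(len(first_pass), len(second_pass))   # 3 0
print(len(list_first), len(list_second))   # 3 3
```

The second loop over the file object sees nothing, while the second loop over the list sees everything again.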
>> Still, having clear names with simple definitions would help that
>> problem without watering down the benefits.
>
> I disagree. I agree there's "amortized zero" cost to the crowd who
> would use those names fairly frequently in design discussions, but
> there is a cost to the "lazy in the technical sense" programmer, who
> might want to read the documentation if it gave "simple answers to
> simple questions",
> but not if they have to wade through a thicket of
> "twisty subtle definitions all alike" to get to the simple answer, and
> especially not if it's not obvious after all that what the answer is.
We shouldn’t define everything up front, just the most important things. But
this is one of the most important things. People need to understand this
distinction very early on to use Python, and many of them don’t get it, hence
all the StackOverflow duplicates. People run into this problem well before they
run into a problem that requires them to understand the distinction between
arguments and parameters, or protocols and ABCs, or Mapping and dict.
> It also makes conversations with experts fraught, as those experts
> will tend to provide more detail and precision than the questioner
> wants (speaking for myself, anyway!) "Not every one-sentence
> explanation needs terminology in the documentation."
I think it’s the opposite.
I can teach a child why a glass will break permanently when you hit it while a
lake won’t by using the words “solid” and “liquid”. I don’t have to give them
the scientific definitions and all the equations. I might not even know them.
And in the same way, I can teach novices why the x after x=y+1 doesn’t change
when y changes by teaching them about variables without having to explain
__getattr__ and fast locals and the import system and so on.
Knowing all the subtleties of shear force or __getattribute__ or whatever
doesn’t prevent me from teaching a kid without getting into those subtleties.
The better I understand “solid” or “variable”, the easier it is for me to teach
it. That’s how words work, or how the human mind works, or whatever, and that’s
why language is useful for teaching.
>>>> But that last thing is exactly the behavior you expect from “things
>>>> like list, dict, etc.”, and it’s hard to explain, and therefore
>>>> hard to document.
>>>
>>> Um, you just did *explain* it, quite well IMHO, you just didn't *name*
>>> it. ;-)
>>
>> Well, it was a long, and redundant, explanation, not something
>> you’d want to see in the docs or even a PEP.
>
> The part I was referring to was the three or so lines preceding in
> which you defined the behavior desired for views etc. I guess to
> define terminology for all the variations that might be relevant would
> be long (and possibly unavoidably redundant).
Yes, and defining terminology for the one distinction that almost always is
relevant helps distinguish that distinction from the other ones that rarely
come up. Most people (especially novices) don’t often need to think about the
distinction between iterables that are sized and also containers vs. those that
are not both sized and containers, so the word for that doesn’t buy us much.
But the distinction between iterators and things-like-list-and-so-on comes up
earlier, and a lot more often, so a word for that would buy us a lot more.
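For what it's worth, the ABCs in collections.abc already let you ask these questions one at a time. A rough sketch (describe is a hypothetical helper name, not anything in the stdlib):

```python
from collections.abc import Container, Iterable, Iterator, Sized


def describe(obj):
    """Report which of the relevant ABCs an object satisfies (hypothetical helper)."""
    return {
        "iterable": isinstance(obj, Iterable),
        "iterator": isinstance(obj, Iterator),
        "sized": isinstance(obj, Sized),
        "container": isinstance(obj, Container),
    }


samples = {
    "list": [1, 2, 3],
    "range": range(3),
    "dict_keys": {"a": 1}.keys(),
    "generator": (n for n in range(3)),
    "map": map(str, range(3)),
}

for name, obj in samples.items():
    print(name, describe(obj))
```

The list, range and dict_keys rows come out iterable, sized and container but not iterator; the generator and map rows come out iterable and iterator only. There is an ABC for each individual property, but no single name for "iterable and not an iterator".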
>>> Isn't manual reset exactly what you want from a resettable
>>> iterator, though?
>>
>> Yes. I certainly use seek(0) on files, and it’s a perfectly
>> cromulent concept, it’s just not the concept I’d want on a range or
>> a keys view or a sequence slice.
>
> But you *don't* use seek(0) on files (which are not iterators, and in
> fact don't actually exist inside of Python, only names for them do).
> You use them on opened *file objects* which are iterators.
A file object is a file, in the same way that a list object is a list and an
int object is an int. Sure, those are all abstractions, and some are quite
vague, and occasionally it’s worth talking specifically about Python’s
implementation of the abstraction. An int doesn’t have a storage cost; an int
object does. A file doesn’t have a fileno, a file object does. But so what?
The fact that we use “file” ambiguously for a bunch of related but
contradictory abstractions (a stream that you can read or write, a directory
entry, the thing an inode points to, a document that an app is working on, …)
makes it a bit more confusing, but unfortunately that ambiguity is forced on
people before they even get to their first attempt at programming, so it’s
probably too late for Python to help (or hurt).
> When you
> open a file again, by default you get a new iterator which begins at
> the beginning, as you want for those others.
> My point is that none of
> the other types you mention are iterators.
I don’t get what you’re driving at here.
Lists, sets, ranges, dict_keys, etc. are not iterators. You can write `for x in
xs:` over and over and get the values over and over. Because each time, you get
a new iterator over their values.
Files, maps, zips, generators, etc. are not like that. They’re iterators. If
you write `for x in xs:` twice, you get nothing the second time, because each
time you’re using the same iterator, and you’ve already used it up. Because
iter(xs) is xs when it’s a file or generator etc.
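A quick way to see the difference, as a sketch (is_own_iterator is a name I made up for illustration):

```python
def is_own_iterator(xs):
    """True when iter(xs) hands back xs itself, i.e. xs is an iterator."""
    return iter(xs) is xs


not_iterators = [[1, 2], {1, 2}, range(2), {"a": 1}.keys()]
iterators = [map(str, [1, 2]), zip([1], [2]), (n for n in [1, 2])]

print([is_own_iterator(xs) for xs in not_iterators])  # [False, False, False, False]
print([is_own_iterator(xs) for xs in iterators])      # [True, True, True]

# Looping twice over an iterator: the second pass finds nothing.
xs = map(str, [1, 2])
print(list(xs), list(xs))  # ['1', '2'] []
```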
> The difference with files
> is just that they happen to exist in Python as iterables. But after
_What_ exists in Python as iterables? The only representation of files in
Python is file objects—the thing you get back from open (or socket.makefile or
io.StringIO or whatever else)—and those are iterators.
>     r = range(n)
>     ri = iter(r)
>     for i in ri:
>         if i > n_2:
>             break
>
> you want the next "for j in ri:" to start where you left off, no?
Yes. That’s why you called iter, after all. Because doing `for i in r:` twice
would _not_ start where you left off. Because a range is not an iterator.
But file isn’t like that—you don’t have to call iter on it to get an iterator;
in fact, if you write fi=iter(f), fi is the same object as f. Because a file is
an iterator.
Of course you can also get a new range with r=range(n) again, but you don’t
have to, because one range(n) is as good as another. But one range_iter is not
as good as another, because there’s no way to use one without using it up. And
files aren’t like ranges, they’re like range_iters.
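To make that concrete, here is a small sketch. The values n = 10 and the threshold 3 are my own choices; the threshold stands in for n_2 in the quoted snippet, which isn't defined there.

```python
n = 10          # arbitrary size chosen for the example
r = range(n)
ri = iter(r)    # an explicit iterator over the range

first = []
for i in ri:
    if i > 3:   # 3 stands in for the undefined n_2 in the quoted snippet
        break
    first.append(i)

rest = list(ri)   # continues after the 4 that the break consumed
again = list(r)   # the range itself starts over from the beginning

print(first)  # [0, 1, 2, 3]
print(rest)   # [5, 6, 7, 8, 9]
print(again)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that the 4 is gone: the loop pulled it out of ri before the break, and an iterator can't give it back.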
Compare these:
    xs = [x*2 for x in range(10)]
    ys = (y*2 for y in range(10))
Of course you can sort of iterate over ys twice by just running the same
generator expression again to get a brand new object, but that’s not the same
thing as iterating over xs twice. That’s not “resetting the iterator”, it’s
creating a brand new one. In the same way, you can sort of iterate over a file
twice just by running the expression that created it twice, but that’s not
resetting the file object, it’s creating a new one.
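Spelled out as a sketch (the ys_new name is just for illustration):

```python
xs = [x * 2 for x in range(10)]
ys = (y * 2 for y in range(10))

xs_first, xs_second = list(xs), list(xs)   # the list can be walked again
ys_first, ys_second = list(ys), list(ys)   # the generator is used up after one pass

print(xs_first == xs_second)   # True
print(len(ys_first), len(ys_second))   # 10 0

# "Iterating again" over the generator really means building a new object.
ys_new = (y * 2 for y in range(10))
print(ys_new is ys)            # False
print(list(ys_new) == xs)      # True
```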
The one difference between files and generators is that you can actually reset
the file object by calling seek(0). But that doesn’t make file not an iterator.
It just makes file an iterator with an extra feature that most iterators don’t
have.
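A minimal sketch of that extra feature, again assuming io.StringIO as a stand-in for a seekable opened file:

```python
import io

# io.StringIO stands in for a seekable opened file; the contents are made up.
f = io.StringIO("a\nb\n")

print(iter(f) is f)     # True: the file object is its own iterator

first = list(f)         # reads both lines
exhausted = list(f)     # nothing left

f.seek(0)               # manual reset, on the same object
after_seek = list(f)

print(first)       # ['a\n', 'b\n']
print(exhausted)   # []
print(after_seek)  # ['a\n', 'b\n']
```

The reset happens on the one existing object; no new iterator is created.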
If “resettable iterator” means anything useful, it means something like file.
Claiming that dict_keys is a “resettable iterator” because you can iterate over
it twice is massively confusing, because it’s not an iterator at all, it’s the
exact same kind of thing as a list or a range.
And I’m pretty sure that’s exactly the confusion that led you to think that
dict_keys have weird behavior, and to suggest the same weird behavior for
sequence views. Like thinking you can’t have two different iterators over the
dict_keys that point to different positions—if it were an iterator, that would
be true (notice that it’s true of files—if you call iter on a file twice, they
will always have the same position, because they’re both actually the same
object as file itself), but because dict_keys is not an iterator, it’s not true.
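Here is a short sketch of that last point, with a made-up dict and io.StringIO standing in for an opened file:

```python
import io

keys = {"a": 1, "b": 2, "c": 3}.keys()

it1 = iter(keys)
it2 = iter(keys)
next(it1)
next(it1)                 # it1 has now moved past 'a' and 'b'
print(it1 is it2)         # False: two separate iterators
print(next(it1))          # c
print(next(it2))          # a  (it2 still at the start)

f = io.StringIO("x\ny\nz\n")   # stand-in for an opened file
g1 = iter(f)
g2 = iter(f)
print(g1 is f and g2 is f)     # True: both are the file object itself
print(repr(next(g1)))          # 'x\n'
print(repr(next(g2)))          # 'y\n'  (same position, because same object)
```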
_______________________________________________
Python-ideas mailing list -- [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/25PY2VQGJEUKWVNJ4CEKHYQMELJ6AMGL/