On 2020-04-27 8:37 p.m., Andrew Barnert wrote:
> On Apr 27, 2020, at 14:38, Soni L. <[email protected]> wrote:

[snipping a long unanswered reply]

>> The explicit case for zip is if you *don't* want it to consume anything after
>> the stop.

> Sure, but *when do you want that*? What’s an example of code you want to write
> that would be more readable, or easier to write, or whatever, if you could work
> around consuming anything after the stop?

So here's one example: say you want to iterate over multiple things (like with zip), get a count out of it, and partially consume an external iterator without swallowing any extra values from it. It'd look something like this:

    def foo(self, other_things):
        for x in zip(range(sys.maxsize), self.my_things, other_things):
            do_stuff(x)
        else as y:
            return y[0]  # count

This uses extended for-else + partial-zip. It stops as soon as self.my_things stops, and then the caller can do whatever else it needs with other_things. (Although maybe it's considered unpythonic to reuse iterators like this? I like it, though.)
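
For comparison, here's a rough sketch of how I'd approximate the same behavior today, reusing the hypothetical do_stuff and self.my_things from the example above: since zip pulls from its arguments left to right, listing self.my_things before other_things already keeps other_things from being over-consumed; what the new syntax buys is mostly not having to track the count by hand.

    def foo(self, other_things):
        # zip stops as soon as self.my_things is exhausted, and because it
        # pulls left to right it never takes the matching value from
        # other_things, so the caller can keep consuming other_things.
        count = 0
        for mine, other in zip(self.my_things, other_things):
            do_stuff((count, mine, other))  # hypothetical helper from above
            count += 1
        return count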


>> btw: I suggest reading the whole post as one rather than trying to pick it
>> apart.

> I did read the whole post, and then went back to reply to each part in-line.
> You can tell by the fact that I refer to things later in the post. For example,
> when I refer to your proposed code being better than “the ugly mess that you
> posted below” as the current alternative, it should be pretty clear that I’ve
> already read the ugly mess that you posted below.
>
> So why did I format it as replies inline? Because that’s standard netiquette
> that goes back to the earliest days of email lists. Most people find it
> confusing (and sometimes annoying) to read a giant quote and then a giant reply
> and try to figure out what’s being referred to where, so when you have a giant
> message to reply to, it’s helpful to reply inline.

> But as a bonus, writing a reply that way makes it clear to yourself if you’ve
> left out anything important. You didn’t reply to multiple issues that I raised,
> and I doubt that it’s because you don’t have any answers and are just trying to
> hide that fact to trick people into accepting your proposal anyway, but rather
> that you just forgot to get to some things because it’s easy to miss important
> stuff when you’re not replying inline.

You kept bringing up how I should talk about things first and break them down, rather than build them up and expand on them as the post goes on. I prefer the latter. I don't mind inline replies, and in fact I prefer them (although I'm not always great at them), and that's not what I raised an issue with.


>> the purpose of the proposal, as a whole, is to make it easier to pick things
>> - generators in particular - apart. I tried to make that clear but clearly I
>> failed.

> No, you did make that part clear; what you didn’t make clear is (a) what
> exactly you’re trying to pick apart from the generators and why, (b) what
> actual problems look like, (c) how your proposal could make that code better,
> and (d) why existing solutions (like manually nexting iterators in a while
> loop, or using tools like peekable) don’t already solve the problem.

> Without any of that, all you’re doing is offering something abstract that might
> conceivably be useful, but it’s not clear where or why or even whether it would
> ever come up, so for all we know it’ll *never* actually be useful. Nobody’s
> likely to get on board with such a change.

>> Side note, here's one case where it'd be better than using zip_longest:

> Your motivating example should not be a “side note”, it should be the core of
> any proposal.

That is not my motivating example. If anything, my motivating example is that I wanna do some very unpythonic things.

like this:

    for x in things:
        yield Wrap(x)
    else as y:
        yield y
    return len(things)

And then we nest this and we get a nice wrap of wraps wrapped in wraps, with lengths at the end. Why? ...because I want it to work like this, tbh. .-.


> But it should also be a real example, not a meaningless toy example. Especially
> not one where even you can’t imagine an actual similar use case. “We should add
> this feature because it would let you write code that I can’t imagine ever
> wanting to write” isn’t a rationale that’s going to attract much support.

>> for a, b, c, d, e, f, g in zip(*[iter(x)]*7):  # this pattern is suggested by the zip() docs, btw.
>>     use_7x_algorithm(a, b, c, d, e, f, g)
>> else as x:  # leftovers that didn't fit the 7-tuple.
>>     use_slow_variable_arity_algorithm(*x)

> Why do you want to unpack into 7 variables with meaningless names just to pass
> those 7 variables? And if you don’t need that part, why can’t you just write
> this with zip_skip (which, as mentioned in the other thread, is pretty easy to
> write around zip_longest)?

> The best guess I can come up with is that in a real life example maybe that
> would have some performance cost that’s hard to see in this toy. But then if
> that’s the case, given that x is clearly not an iterator, is it a sequence? You
> could then presumably get much more optimization by looping over slices instead
> of using the grouper idiom in the first place. Or, as you say, by using numpy.
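
(Aside: here's roughly what I assume that zip_skip looks like when written around zip_longest; the name and exact semantics are my guess from the other thread.)

    import itertools

    _SKIP = object()  # unique fill value that can't collide with real data

    def zip_skip(*iterables):
        # My reading of zip_skip: like zip_longest, but exhausted iterables
        # are dropped from each tuple instead of being padded with a fill value.
        for values in itertools.zip_longest(*iterables, fillvalue=_SKIP):
            yield tuple(v for v in values if v is not _SKIP)

e.g. list(zip_skip("abc", "12")) gives [('a', '1'), ('b', '2'), ('c',)].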

>> I haven't found a real use-case for this yet, tho.
>> SIMD is handled by numpy, which does a better job than you could ever hope
>> for in plain python, and for SIMD you could use zip_longest with a suitable
>> dummy instead. but... yeah, not really useful.
>>
>> (actually: why do the docs for zip() even suggest this stuff anyway? seems
>> like something nobody would actually use.)

> That grouping idiom is useful for all kinds of things that _aren’t_ about
> optimization. Maybe the zip docs aren’t the best place for it (but it’s also in
> the itertools recipes, which probably is the best place for it), but it’s
> definitely useful. In fact, I used it less than a week ago. We’ve got this tool
> that writes a bunch of 4-line files, and someone concatenated a bunch of them
> together and wrote this horrible code to pull them back apart in another
> language I won’t mention here, and rather than debug their code, I just rewrote
> it in Python like this:

>     with open(path) as f:
>         for entry in chunkify(f, 4):
>             process(entry)

> I used a function called chunkify because I think that’s a lot easier to
> understand (especially for colleagues who don’t use Python very often), and we
> already had it lying around in a utils module, but it’s just implemented as
> zip(*[iter(it)]*n).

See: why are we perfectly happy with ignoring extra lines at the end? An "else" would serve you well here, even if it's just to "assert len(remaining) == 0". But we can't do that, can we? Because zip swallows the extras. :/
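
For what it's worth, the check can be written today, but only by swapping zip for zip_longest plus a sentinel and inspecting each group by hand. A sketch (chunkify_strict is my name, purely for illustration); this is exactly the sort of bookkeeping an "else" clause would express more directly:

    from itertools import zip_longest

    _MISSING = object()  # unique sentinel; never mistaken for real data

    def chunkify(iterable, n):
        # The grouper idiom from the zip() docs: trailing extras are silently dropped.
        return zip(*[iter(iterable)] * n)

    def chunkify_strict(iterable, n):
        # Same grouping, but complain instead of swallowing a trailing partial group.
        for group in zip_longest(*[iter(iterable)] * n, fillvalue=_MISSING):
            if any(v is _MISSING for v in group):
                leftovers = tuple(v for v in group if v is not _MISSING)
                raise ValueError(f"input is not a multiple of {n}: leftover {leftovers!r}")
            yield group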


> Also, compare this other example for processing a different file format:

>     with open(path) as f:
>         for entry in split(f, '\n'):
>             process(entry)

> It’s pretty obvious what the difference is here: one is reading entries that
> are groups of 4 lines; the other is reading entries that are groups of
> arbitrary numbers of lines but separated by blank lines. At most you might need
> to look at the help for chunkify and split to be absolutely sure they mean what
> you think they mean. (Although maybe I should have used functions from
> more-itertools rather than our own custom functions that do effectively the
> same thing but are kind of weird and probably not so well tested and whose
> names don’t come up in a web search.)



And... well, I'm assuming this one just yields the extras at the end of the file/iterator? (I hope? Or maybe it'd also benefit from an "else", even if it was just an assert.)

(And yeah, this does make me uncomfortable. *Please* verify your data! I learned this from Rust, tbh, but I apply it everywhere.)
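
For reference, my guess at what such a split looks like (more_itertools.split_at does essentially the same thing); written this way the last group before EOF just comes out as a normal entry, so there are no swallowed extras to worry about:

    import itertools

    def split(lines, sep="\n"):
        # My guess at the split() from the example: group lines into entries
        # separated by separator lines; the final group is yielded like any other.
        for is_sep, group in itertools.groupby(lines, key=lambda line: line == sep):
            if not is_sep:
                yield tuple(group)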
