On 2020-04-27 8:37 p.m., Andrew Barnert wrote:
On Apr 27, 2020, at 14:38, Soni L. <[email protected]> wrote:
[snipping a long unanswered reply]
> The explicit case for zip is if you *don't* want it to consume anything after
> the stop.
Sure, but *when do you want that*? What’s an example of code you want to write
that would be more readable, or easier to write, or whatever, if you could work
around consuming anything after the stop?
so here's one example, let's say you want to iterate multiple things
(like with zip), get a count out of it, as well as partially consume an
external iterator without swallowing any extra values from it. it'd look
something like this:
def foo(self, other_things):
    for x in zip(range(sys.maxsize), self.my_things, other_things):
        do_stuff
    else as y:
        return y[0]  # count
using extended for-else + partial-zip. it stops as soon as
self.my_things stops. and then the caller can do whatever else it needs
with other_things. (altho maybe it's considered unpythonic to reuse
iterators like this? I like it tho.)
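(for comparison, here's roughly how this can be approximated today without new syntax — a sketch with illustrative names, and it assumes self.my_things really is the stream that stops first:)

```python
def foo(my_things, other_things):
    # Put the iterator that stops first at the *front* of zip(), and take
    # the count from enumerate instead of zipping a range. zip() pulls from
    # its arguments left to right, so when my_things raises StopIteration,
    # nothing extra has been swallowed from other_things.
    count = 0
    for count, (x, y) in enumerate(zip(my_things, other_things), start=1):
        pass  # do_stuff with x and y
    return count
```

the caller can then keep consuming other_things from exactly where the loop left off.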
> btw: I suggest reading the whole post as one rather than trying to pick it
> apart.
I did read the whole post, and then went back to reply to each part in-line.
You can tell by the fact that I refer to things later in the post. For example,
when I refer to your proposed code being better than “the ugly mess that you
posted below“ as the current alternative, it should be pretty clear that I’ve
already read the ugly mess that you posted below.
So why did I format it as replies inline? Because that’s standard netiquette
that goes back to the earliest days of email lists. Most people find it
confusing (and sometimes annoying) to read a giant quote and then a giant reply
and try to figure out what’s being referred to where, so when you have a giant
message to reply to, it’s helpful to reply inline.
But as a bonus, writing a reply that way makes it clear to yourself whether
you’ve left out anything important. You didn’t reply to multiple issues that I
raised, and I doubt that it’s because you don’t have any answers and are just
trying to hide that fact to trick people into accepting your proposal anyway;
more likely you just forgot to get to some things, because it’s easy to miss
important stuff when you’re not replying inline.
you kept bringing up how I should talk about things first and break them
down, rather than build them up and expand on them as the post goes on.
I prefer the latter. I don't mind inline replies, and in fact I prefer
them (altho I'm not always great at that); that's not what I raised an
issue with.
> the purpose of the proposal, as a whole, is to make it easier to pick things
> - generators in particular - apart. I tried to make that clear but clearly I
> failed.
No, you did make that part clear; what you didn’t make clear is (a) what
exactly you’re trying to pick apart from the generators and why, (b) what
actual problems look like, (c) how your proposal could make that code better,
and (d) why existing solutions (like manually nexting iterators in a while
loop, or using tools like peekable) don’t already solve the problem.
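(To make the peekable option concrete, here's a minimal sketch of the core idea — not more-itertools' actual implementation, which has a richer API:)

```python
class Peekable:
    # peek() inspects the next item without consuming it, so a loop can
    # stop cleanly and leave the rest of the stream untouched for the
    # caller -- the same "don't swallow extras" effect being discussed.
    _EMPTY = object()  # private sentinel meaning "nothing cached"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._EMPTY

    def __iter__(self):
        return self

    def peek(self):
        if self._cache is self._EMPTY:
            self._cache = next(self._it)  # may raise StopIteration
        return self._cache

    def __next__(self):
        if self._cache is not self._EMPTY:
            value, self._cache = self._cache, self._EMPTY
            return value
        return next(self._it)
```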
Without any of that, all you’re doing is offering something abstract that might
conceivably be useful, but it’s not clear where or why or even whether it would
ever come up, so for all we know it’ll *never* actually be useful. Nobody’s
likely to get on board with such a change.
> Side note, here's one case where it'd be better than using zip_longest:
Your motivating example should not be a “side note”, it should be the core of
any proposal.
that is not my motivating example. if anything my motivating example is
because I wanna do some very unpythonic things.
like this:
for x in things:
    yield Wrap(x)
else with y:
    yield y
    return len(things)
and then we nest this and we get a nice wrap of wraps wrapped in wraps
with lengths at the end. why? ... because I want it to work like this,
tbh. .-.
But it should also be a real example, not a meaningless toy example. Especially
not one where even you can’t imagine an actual similar use case. “We should add
this feature because it would let you write code that I can’t imagine ever
wanting to write” isn’t a rationale that’s going to attract much support.
> for a, b, c, d, e, f, g in zip(*[iter(x)]*7):  # this pattern is suggested by the zip() docs, btw.
>     use_7x_algorithm(a, b, c, d, e, f, g)
> else as x:  # leftovers that didn't fit the 7-tuple.
>     use_slow_variable_arity_algorithm(*x)
Why do you want to unpack into 7 variables with meaningless names just to pass
those 7 variables? And if you don’t need that part, why can’t you just write
this with zip_skip (which, as mentioned in the other thread, is pretty easy to
write around zip_longest)?
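(For reference, one hedged sketch of such a zip_skip built on zip_longest — the name and exact semantics here are my reading of that other thread:)

```python
from itertools import zip_longest

_SKIP = object()  # private sentinel, can't collide with real values

def zip_skip(*iterables):
    # Like zip_longest, but exhausted inputs are dropped from each tuple
    # instead of being padded, so shorter inputs simply fall away.
    for values in zip_longest(*iterables, fillvalue=_SKIP):
        yield tuple(v for v in values if v is not _SKIP)
```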
The best guess I can come up with is that in a real life example maybe that
would have some performance cost that’s hard to see in this toy. But then if
that’s the case, given that x is clearly not an iterator, is it a sequence? You
could then presumably get much more optimization by looping over slices instead
of using the grouper idiom in the first place. Or, as you say, by using numpy.
> I haven't found a real use-case for this yet, tho.
> SIMD is handled by numpy, which does a better job than you could ever hope
> for in plain python, and for SIMD you could use zip_longest with a suitable
> dummy instead. but... yeah, not really useful.
> (actually: why do the docs for zip() even suggest this stuff anyway? seems
> like something nobody would actually use.)
That grouping idiom is useful for all kinds of things that _aren’t_ about
optimization. Maybe the zip docs aren’t the best place for it (but it’s also in
the itertools recipes, which probably is the best place for it), but it’s
definitely useful. In fact, I used it less than a week ago. We’ve got this tool
that writes a bunch of 4-line files, and someone concatenated a bunch of them
together and wrote this horrible code to pull them back apart in another
language I won’t mention here, and rather than debug their code, I just rewrote
it in Python like this:
with open(path) as f:
    for entry in chunkify(f, 4):
        process(entry)
I used a function called chunkify because I think that’s a lot easier to
understand (especially for colleagues who don’t use Python very often), and we
already had it lying around in a utils module, but it’s just implemented as
zip(*[iter(it)]*n).
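(Spelled out, assuming the utils version really is just the grouper idiom:)

```python
def chunkify(iterable, n):
    # The grouper idiom from the zip() docs / itertools recipes: n
    # references to a *single* iterator, so each output tuple pulls n
    # consecutive items. Leftover items that don't fill a final group
    # of n are silently dropped.
    return zip(*[iter(iterable)] * n)
```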
see: why are we perfectly happy with ignoring extra lines at the end? an
"else" would serve you well, even if it's just to "assert len(remaining)
== 0". but we can't do that, can we? because zip swallows the extras. :/
Also, compare this other example for processing a different file format:
with open(path) as f:
    for entry in split(f, '\n'):
        process(entry)
It’s pretty obvious what the difference is here: one is reading entries that
are groups of 4 lines; the other is reading entries that are groups of
arbitrary numbers of lines but separated by blank lines. At most you might need
to look at the help for chunkify and split to be absolutely sure they mean what
you think they mean. (Although maybe I should have used functions from
more-itertools rather than our own custom functions that do effectively the
same thing but are kind of weird and probably not so well tested and whose
names don’t come up in a web search.)
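(For illustration, here's a guess at what that split() helper might look like — the real one may well differ:)

```python
from itertools import groupby

def split(lines, sep='\n'):
    # Consecutive lines equal to sep (a blank line read from a file is
    # just '\n') act as delimiters; each run of other lines becomes one
    # entry. Note this yields whatever is left at the end of the file as
    # a final entry rather than swallowing it.
    for is_sep, group in groupby(lines, key=lambda line: line == sep):
        if not is_sep:
            yield list(group)
```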
and... well I'm assuming this one just yields the extras at the end of
the file/iterator? (I hope? or maybe it'd also benefit from an "else",
even if it was just an assert.)
(and yeah this does make me uncomfortable. *please* verify your data! I
learned this from rust tbh but I apply it everywhere.)
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/F53III4AYA7OK3GBNJMPNYO4XR7FZNEE/
Code of Conduct: http://python.org/psf/codeofconduct/