[Python-ideas] Re: RFC: For Loop Invariants

Andrew Barnert via Python-ideas Fri, 10 Apr 2020 17:49:46 -0700

On Apr 10, 2020, at 13:29, Elliott Dehnbostel <[email protected]> wrote:
> 
> We could do this:
> chars = "abcaaabkjzhbjacvb"
> seek = {'a','b','c'}
> count = sum([1 for a in chars if a in seek])
> However, this changes important semantics by creating an entire new list 
> before summing.
It sounds like you really are just looking for generator expressions, which 
were added in Python 2.4, back in 2004. They have the same syntax as list 
comprehensions without the square brackets, but they produce one element at a 
time instead of building a whole new list. If that’s your only problem here, 
just remove the square brackets and you’re done.
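
To make the difference concrete, here is a minimal sketch using the data from the quoted example. The names count_list and count_gen are only there for illustration:

```python
chars = "abcaaabkjzhbjacvb"
seek = {'a', 'b', 'c'}

# List comprehension: the whole list of 1s is built before sum() sees any of it.
count_list = sum([1 for a in chars if a in seek])

# Generator expression: same clauses, no brackets; sum() pulls one item at a time.
count_gen = sum(1 for a in chars if a in seek)

print(count_list, count_gen)  # both are 11
```

Both spellings give the same answer; only the intermediate list goes away.
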
> Also, adding just one more expression to the most nested block thwarts that 
> refactor.
No it doesn’t:


    count = sum(1 for a in chars if a.isalpha() and a in seek)

And you don’t have to stop with adding another expression; you can add a whole 
new if clause, or even a nested for clause:

    count = sum(1 for s in strings if s.isalpha() for a in s if a in seek)

Of course if you add too much, it becomes a lot harder to read—we’re already 
pushing it here. But in that case, you can always pull out any subexpression 
into a function:

    def matches(s):
        return (a for a in s if a in seek)
    count = sum(1 for s in strings for a in matches(s))

… which can often then be simplified further (it should be obvious how here).

Or, even better, you can usually create a pipeline of two (or more) generator 
expressions:

    chars = (a for s in strings for a in s)
    count = sum(1 for a in chars if a in seek)

(And sometimes it’s natural to replace some of these with map, itertools 
functions, two-arg iter, etc.; sometimes it’s not. Do whichever one is more 
natural in each case, of course.)
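
As a rough illustration of that kind of substitution (the sample strings are made up, and whether this reads better than the generator expressions is a judgment call), the same pipeline could be sketched with itertools.chain and map:

```python
from itertools import chain

strings = ["abc", "kjz", "cab"]   # hypothetical sample data
seek = {'a', 'b', 'c'}

# chain.from_iterable flattens the strings lazily, standing in for
# (a for s in strings for a in s).
chars = chain.from_iterable(strings)

# map yields True/False for each character; sum() counts the Trues.
count = sum(map(seek.__contains__, chars))

print(count)  # 6
```
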

The great thing about this style is that you can pipeline a dozen 
transformations without any nesting, and without adding any confusion. They 
just pile up linearly and you can read each one on its own. David Beazley’s 
“Generator Tricks for Systems Programmers” presentation has some great 
examples, which demonstrate it a lot better than I ever could.
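
For a small taste of the style, here is a sketch of a longer pipeline. It is not taken from the presentation; the log lines and the "name size" format are invented for the example:

```python
# Hypothetical input: "name size" records, with comments, blanks, and a missing size.
lines = [
    "# name bytes",
    "index.html 1024",
    "",
    "logo.png 20480",
    "missing.html -",
]

# Each stage is one generator expression that reads from the previous one.
stripped = (line.strip() for line in lines)
records = (line for line in stripped if line and not line.startswith("#"))
fields = (line.split() for line in records)
sizes = (size for _name, size in fields)
byte_counts = (int(size) for size in sizes if size != "-")

# Nothing runs until sum() starts pulling values through the whole chain.
total = sum(byte_counts)

print(total)  # 21504
```

Each line can be read on its own, and adding another transformation means adding another line rather than another level of nesting.
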

And of course you can always fall back to writing nested block statements. The 
nice declarative syntax of comprehensions is great when it’s simple enough to 
understand the flow at a glance—but when it isn’t, that usually means you need 
a visual guide to understand the flow, and that’s exactly what indentation is 
for. And if there are too many indents, then that usually means you should be 
refactoring the inner part into a function.
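
For comparison, here is a sketch of the block-statement version of the nested comprehension above, with the inner loop pulled out into a function. The helper name count_matches and the sample strings are assumptions for the example:

```python
strings = ["abc", "k9z", "cab"]   # hypothetical sample data
seek = {'a', 'b', 'c'}

def count_matches(s):
    """Count the characters of s that are in seek."""
    matches = 0
    for a in s:
        if a in seek:
            matches += 1
    return matches

# Outer loop stays flat because the inner loop lives in count_matches.
count = 0
for s in strings:
    if s.isalpha():
        count += count_matches(s)

print(count)  # 6; "k9z" is skipped because it is not all letters
```
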

Sure, “usually” isn’t “always”, both times. Some cases fall maddeningly in 
between, where the most readable option would be a sort of hybrid: nested 
blocks, with a couple of them collapsed together here and there. I know I run 
into one a couple of times a year. But I don’t think those cases are really 
much more common than that. Any new feature complicates Python, and this one 
would potentially encourage people to abuse the feature even when it’s really 
making things less readable rather than more.

Comprehension syntax is great _because_ comprehensions are limited. You know 
you can read things declaratively, because there can’t be any statements in 
there or any flow control besides the linear-nested clauses. That doesn’t apply 
to nested blocks.

