[Python-ideas] Re: Incremental step on road to improving situation around iterable strings

Andrew Barnert via Python-ideas Tue, 03 Mar 2020 10:08:26 -0800

On Mar 3, 2020, at 01:09, M.-A. Lemburg <[email protected]> wrote:
> 
> The main reason for not having characters and strings is
> reducing complexity. Why try to add this now for no apparent
> net benefit ?


I don’t think the benefit is worth the (as far as I can tell insurmountable) 
backward compatibility cost, but you can’t argue that there is no benefit.

An object whose first element is itself is a valid idea, but it’s a 
pathological case; you have to write something like `lst=[]; lst.append(lst)` 
to get one. So code like this is fine:

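    from collections.abc import Iterable

    def flatten(xs):
        # Recurse into anything iterable; yield everything else as a leaf.
        for x in xs:
            if isinstance(x, Iterable):
                yield from flatten(x)
            else:
                yield x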

… in that it only infinitely recurses if you go out of your way to give it an 
infinitely recursive value.

… except that every string is an infinitely recursive value, so all you have to 
do is give it 'A'.
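
A minimal sketch of that failure, repeating the flatten above so the block runs on its own. The helper name recurses_forever is hypothetical and exists only to make the outcome observable:

```python
from collections.abc import Iterable


def flatten(xs):
    # Naive version: recurse into anything iterable.
    for x in xs:
        if isinstance(x, Iterable):
            yield from flatten(x)
        else:
            yield x


def recurses_forever(value):
    # Hypothetical helper: report whether flattening the value
    # exhausts the interpreter's recursion limit.
    try:
        list(flatten(value))
    except RecursionError:
        return True
    return False


# Nested lists terminate, because the leaves are not iterable.
nested_ok = list(flatten([1, [2, [3]]]))

# A one-character string never terminates: 'A'[0] is 'A' again.
string_fails = recurses_forever('A')
```

The list case bottoms out at the integers; the string case never reaches a non-iterable leaf.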

Which is not just weird in theory; it breaks perfectly sensible code like 
flatten. And it’s why we have to have idioms like endswith taking a 
str|Tuple[str] rather than any Iterable: forcing people to write 
s.endswith(tuple(suffixes)) when suffixes is a set is the only reasonable way 
to avoid confusion when suffixes is an arbitrary iterable.
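
As an illustration of the endswith idiom (the variable names and the helper accepts_set are made up for this sketch): str.endswith accepts a str or a tuple of str, and rejects other iterables such as a set with a TypeError, so the caller has to convert explicitly.

```python
suffixes = {".py", ".pyi"}
name = "module.py"


def accepts_set(text, candidates):
    # Hypothetical helper: does str.endswith take a set directly?
    try:
        text.endswith(candidates)
    except TypeError:
        return False
    return True


set_accepted = accepts_set(name, suffixes)

# The explicit conversion is the supported spelling.
matches = name.endswith(tuple(suffixes))

# If any iterable were accepted, a plain string argument would be
# ambiguous: one suffix "py", or the two suffixes "p" and "y"?
single_suffix = name.endswith("py")
```

Restricting the argument to str or tuple sidesteps that ambiguity, at the cost of the tuple(...) call.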

And, because it comes up all the time, and many other languages don’t have this 
problem, it has to be explained to new students and people coming from other 
languages, and painfully remembered or relearned by people who usually work in 
Java or whatever but occasionally have to do Python.

Of course regular Python developers have this drummed into their heads, and 
usually remember to check for str and handle it specially, and we’ve all 
learned to deal with the tuple-special idiom, and so on. But that doesn’t mean 
it’s an ideal design, just that we’ve all gotten used to it.
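
The "check for str and handle it specially" habit might look roughly like this. It is only a sketch: flatten_guarded is a hypothetical name, and treating str as a leaf is one common choice, not the only one.

```python
from collections.abc import Iterable


def flatten_guarded(xs):
    # Treat strings as leaves, even though they are iterable.
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, str):
            yield from flatten_guarded(x)
        else:
            yield x


flat = list(flatten_guarded(["ab", [1, ["c", (2, 3)]]]))
```

Here flat comes out as ['ab', 1, 'c', 2, 3]: the strings survive whole instead of being split into characters or recursed into forever.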

> I think the situation with bytes (iteration returning integers
> instead of bytes) has shown that this is not a very user-friendly
> nor intuitive approach:

Well, it shows that using integers is confusing. 

In fact, it’s even worse than C, where char is an integral type but at least 
not the same type as int. (A char is a single byte, typically 0 to 255 when 
unsigned; its default output and input in functions like printf, and C++ 
streams, is as a character rather than as a number; there are a bunch of 
character-related functions that take char but not int, although using them 
with an int is usually just a warning rather than an error; etc.)
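
For reference, a short sketch of what Python 3 does today, with str alongside for contrast. The variable names are illustrative only:

```python
b = bytes((1, 2, 3, 4))

first_item = b[0]     # indexing bytes gives an int
first_slice = b[:1]   # slicing bytes gives bytes

# The two are different types and never compare equal.
item_equals_slice = first_item == first_slice

# Iterating bytes also produces ints.
as_list = list(b)

# str behaves differently: an item is itself a length-1 str.
s = "abcd"
str_item_equals_slice = s[0] == s[:1]
```

So for bytes, b[0] and b[:1] are unrelated values (an int and a bytes object), while for str they are the same one-character string.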

That doesn’t mean a new type would be confusing:

>>> b = bytes((1,2,3,4))
>>> b
b'\x01\x02\x03\x04'
>>> b[:2]
b'\x01\x02'
>>> b[:1]
b'\x01'
>>> b[0]
byte(b'\x01')

In fact, it would make bytes consistent with other sequences of byte:

>>> s = list(b)
>>> s[:1]
[byte(b'\x01')]
>>> s[0]
byte(b'\x01')

… without adding any new inconsistencies:

>>> assert tuple(b[:2]) == tuple(s[:2])
>>> assert b[0] == s[0]

The downside, of course, is having one more builtin type. But that’s not an 
instant disqualifier; it’s a cost to trade off with the benefits. I think if it 
weren’t for backward compatibility, chr might turn out to be useful enough to 
qualify (byte I’m much less confident of—it comes up less often, and also once 
you start bikeshedding the interface there’s a lot more vagueness in the 
concept), or at least worth having a PEP to explain why it’s rejected. (But of 
course “if not for backward compatibility” isn’t realistic.)

