On Sat, Aug 31, 2019 at 09:31:15PM +1000, Chris Angelico wrote:
> On Sat, Aug 31, 2019 at 8:44 PM Steven D'Aprano <[email protected]> wrote:
> > > So b"abc" should not be allowed?
> >
> > In what way are byte-STRINGS not strings? Unicode-strings and
> > byte-strings share a significant fraction of their APIs, and are so
> > similar that back in Python 2.2 the devs thought it was a good idea to
> > try automagically coercing from one to the other.
> >
> > I was careful to write *string* rather than *str*. Sorry if that wasn't
> > clear enough.
> >
>
> We call it a string, but a bytes object has as much in common with
> bytearray and with a list of integers as it does with a text string.

I don't think that's true.

    py> b'abc'.upper()
    b'ABC'
    py> [1, 2, 3].upper()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'list' object has no attribute 'upper'

Shall I beat this dead horse some more by listing the other 33 methods
that byte-strings share with Unicode-strings but not lists?

Compare that with just two methods shared by all three of bytes, str and
list (namely count() and index()), and *zero* methods shared by bytes
and list but not str.
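
For anyone who wants to check those counts rather than take them on
trust, here is a rough sketch. It assumes CPython 3 and counts only
public (non-underscore) attribute names; the helper name public_names is
made up for this illustration, and the exact size of the bytes/str
overlap varies a little between Python versions.

```python
def public_names(cls):
    """Return the set of public (non-underscore) attribute names of cls."""
    return {name for name in dir(cls) if not name.startswith("_")}


bytes_names = public_names(bytes)
str_names = public_names(str)
list_names = public_names(list)

# Methods that bytes and str share, but list lacks.
bytes_str_only = (bytes_names & str_names) - list_names

# Methods shared by all three types.
all_three = bytes_names & str_names & list_names

# Methods shared by bytes and list, but not by str.
bytes_list_only = (bytes_names & list_names) - str_names

print(len(bytes_str_only), sorted(all_three), sorted(bytes_list_only))
```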

In Python 2, byte-strings and Unicode strings were both subclasses of
type basestring. Although we have moved away from that shared base class
in Python 3, it does demonstrate that conceptually bytes and str are
closely related to each other.

> Is the contents of a MIDI file a "string"? I would say no, it's not -
> but it can *contain* strings, eg for metadata and lyrics.

Don't confuse *human-readable native language strings* with generic
strings. "Hello world!" is a string, but so are '&w-8\x02^xs\0' and
b'DEADBEEF'.

> You can't upper-case the
> variable-length-integer b"\xe7\x61" any more than you can upper-case
> the integer 13281.

Of course you can.

    py> b"\xe7\x61".upper()
    b'\xe7A'

Whether it is *meaningful* to do so is another question. But the same
applies to str.upper: just because you can call the method doesn't mean
that the result will be semantically valid.

    source = "def spam():\n\tpass\n"
    source = source.upper()  # no longer valid Python source code.

> Those common methods are mostly built on the
> assumption that the string contains ASCII text.

As they often do. If they don't, then don't call the text methods which
don't make sense in context.

Just as there are cases where text methods don't make sense on Unicode
strings. You wouldn't want to call .casefold() on a password, or
.lstrip() on a line of Python source code.
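
A small sketch of both cases, using made-up sample values (the password
and the spam() function here are purely illustrative): the calls succeed,
but the results are no longer what you wanted.

```python
# casefold() silently changes the secret, so it would no longer match.
password = "Straße"
folded = password.casefold()

# lstrip() on each line throws away the indentation Python depends on.
source = "def spam():\n    return 1\n"
dedented = "\n".join(line.lstrip() for line in source.splitlines()) + "\n"

try:
    compile(dedented, "<example>", "exec")
    still_compiles = True
except SyntaxError:  # IndentationError is a subclass of SyntaxError
    still_compiles = False

print(folded, still_compiles)
```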
[...]
> Bytes and text have a long relationship, and as such, there are
> special similarities. That doesn't mean that bytes ARE text,

I didn't say that bytes are (human-readable) text. Although they can be:
not every application needs Unicode strings, ASCII strings are still
special, and there are still applications where one has to mix binary
and ASCII text data.
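
As a rough illustration of that mixing, here is a sketch that pulls an
ASCII header off the front of a binary payload without ever decoding the
payload. The message layout is an assumption invented for this example
(loosely modelled on HTTP-style headers), not any particular protocol.

```python
# An ASCII header, a blank line, then arbitrary binary data.
raw = b"Content-Length: 4\r\n\r\n\x00\xff\x10\x80"

# The string-like methods work directly on the bytes object.
header, _, body = raw.partition(b"\r\n\r\n")
name, _, value = header.partition(b": ")

name_lower = name.lower()             # text method on the ASCII part
length = int(value.decode("ascii"))   # decode only the part known to be ASCII

print(name_lower, length, body)
```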

I said they were *strings*. Strings are not necessarily text, although
they often are. Formally, a string is a finite sequence of symbols that
are chosen from a set called an alphabet. See:

https://en.wikipedia.org/wiki/String_%28computer_science%29

> I don't think it's necessary to be too adamant about "must be some
> sort of thing-we-call-string" here. Let practicality rule, since
> purity has already waved a white flag at us.

It is because of *practicality* that we should prefer that things that
look similar should be similar. Code is read far more often than it is
written, and when two pieces of code look similar, we should strongly
prefer that they actually are similar.

Would you be happy with a Pythonesque language that used prefixed
strings as the delimiter for arbitrary data types?

    mylist = L"1, 2, None, {}, L"", 99.5"
    mydict = D"key: value, None: L"", "abc": "xyz""
    myset = S"1, 2, None"

That's what this proposal wants: string syntax that can return arbitrary
data types.

How about using quotes for function calls?

    assert chr"9" == "\t"
    assert ord"9" == 57

That's what this proposal wants: string syntax for a subset of function
calls.

Don't say that this proposal won't be abused. Every one of the OP's
motivating examples is an abuse of the syntax, returning non-strings
from something that looks like a string.

--
Steven
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/BCIIWV2KMETDPB7M2OUMXRXK6A6CVHGJ/