On Thu, Aug 29, 2019 at 02:10:21PM -0700, Andrew Barnert wrote:
[...]
> And most of the string affixes people have suggested are for
> string-ish things.
I don't think that's correct. Looking back at the original post in this
thread, here are the motivating examples:
[quote]
There are quite a few situations where this can be used:
- Fraction literals: `frac'123/4567'`
- Decimals: `dec'5.34'`
- Date/time constants: `t'2019-08-26'`
- SQL expressions: `sql'SELECT * FROM tbl WHERE a=?'.bind(a=...)`
- Regular expressions: `rx'[a-zA-Z]+'`
- Version strings: `v'1.13.0a'`
- etc.
[/quote]
By my count, that's zero out of six string-ish things. There may have
been other proposals, but I haven't trawled through the entire thread to
find them.
> I’m not sure what a “version string” is, but I
> might design that as an actual subclass of str that adds extractor
> methods and overrides comparison.
A version object is a record with fields, most of which are numeric.
For an existing example, see sys.version_info which is a kind of named
tuple, not a string.
The version *string* is just a nice human-readable representation. It
doesn't make sense to implement string methods on a Version object. Why
would you offer expandtabs(), find(), splitlines(), translate(),
isspace(), capitalize(), etc. methods? Or * and + (repetition and
concatenation) operators? I cannot think of a single string
method/operator that a Version object should implement.
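A rough sketch of what I mean by "a record with fields". The Version class here is made up for illustration (it is not an existing library type), and it makes no attempt to order pre-release tags correctly:

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical version record: numeric fields plus a release tag."""
    major: int
    minor: int
    micro: int
    tag: str = ""  # e.g. "a" for an alpha release; empty for a final one

    def __str__(self) -> str:
        # The string form is only a human-readable rendering of the record.
        return f"{self.major}.{self.minor}.{self.micro}{self.tag}"


v = Version(1, 13, 0, "a")
print(v)                                     # 1.13.0a
print(v.minor)                               # 13
print(Version(1, 9, 0) < Version(1, 13, 0))  # True: fields compare numerically
print("1.9.0" < "1.13.0")                    # False: strings compare by character
print(hasattr(v, "upper"))                   # False: no string methods needed
```

The comparison only works because the fields are numbers; the string rendering sorts wrongly.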
> A compiled regex isn’t literally a
> string, but neither is a bytes; it’s still clearly _similar_ to a
> string, in important ways.
It isn't clear to me how a compiled regex object is "similar" to a
string. The set of methods offered by both regexes and strings is pretty
small; by my generous count it is just two pairs:
- str.split and SRE_Pattern.split;
- str.replace and SRE_Pattern.sub;
and neither pair shares an API or has the same semantics. Compiled
regex objects don't offer string methods like translate, isdigit,
upper, encode, etc. I would say that they are clearly *not* strings.
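Anyone who wants to check the overlap can do something like the following. It is only a sketch: it matches public attribute names, so it cannot spot a pairing like replace/sub where the names differ, and public_names is a helper I made up for this example.

```python
import re

pattern = re.compile(r"[a-zA-Z]+")


def public_names(obj):
    """Return the names in dir(obj) that don't start with an underscore."""
    return {name for name in dir(obj) if not name.startswith("_")}


# Names available on both str and a compiled pattern object.
shared = public_names(str) & public_names(pattern)
print(sorted(shared))

# Typical string methods are simply absent from the pattern object.
print(hasattr(pattern, "upper"), hasattr(pattern, "translate"))  # False False

# Even the shared name behaves differently: the string is the receiver
# for str.split but the argument for the pattern's split.
print("a,b".split(","))                 # ['a', 'b']
print(re.compile(",").split("a,b"))     # ['a', 'b']
```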
[...]
> And versions of the proposal that allow delimiters other than quotes
> so you can write things like regex/a.*b/, well, I’d need to see a
> specific proposal to be sure, but that seems even less objectionable
> in this regard. That looks like nothing else in Python, but it looks
> like a regex in awk or sed or perl, so I’d probably read it as a regex
> object.
Why do you need the "regex" prefix? Assuming the parser and the human
reader can cope with using / as both a delimiter and an operator (which
isn't a given!) /.../ for a regex object seems fine to me.
I suspect that this is going to be ambiguous though:
    target = regex/a*b/ +x
could be:
    target = ((regex / a) * b) / (unary-plus x)
or
    target = (regex object) + x
so maybe we do need a prefix.
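For what it's worth, the first reading is what the parser already produces today, so the expression is valid Python right now. A quick check with the ast module:

```python
import ast

source = "regex/a*b/ +x"
expr = ast.parse(source, mode="eval").body

# The outermost node is a division whose right operand is unary plus.
print(type(expr).__name__, type(expr.op).__name__)                # BinOp Div
print(type(expr.right).__name__, type(expr.right.op).__name__)    # UnaryOp UAdd

# The left operand is (regex / a) * b.
print(type(expr.left.op).__name__)        # Mult
print(type(expr.left.left.op).__name__)   # Div
```

Any bare /.../ regex syntax would have to change the meaning of code that already parses.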
> > Let me suggest some design principles that should hold for languages
> > with more-or-less "conventional" syntax. Languages like APL or Forth
> > excluded.
> >
> > - anything using ' or " quotation marks as delimiters (with or without
> > affixes) ought to return a string, and nothing but a string;
>
> So b"abc" should not be allowed?
In what way are byte-STRINGS not strings? Unicode-strings and
byte-strings share a significant fraction of their APIs, and are so
similar that back in Python 2.2 the devs thought it was a good idea to
try automagically coercing from one to the other.
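A rough way to see how much of the API is shared is to compare public method names. This is a sketch only: public_names is a helper made up for this example, and the exact counts depend on the Python version, so I won't quote them.

```python
def public_names(cls):
    """Return the names in dir(cls) that don't start with an underscore."""
    return {name for name in dir(cls) if not name.startswith("_")}


str_names = public_names(str)
bytes_names = public_names(bytes)
shared = str_names & bytes_names

print(len(shared), "of", len(bytes_names), "public bytes names also exist on str")
print("bytes only:", sorted(bytes_names - str_names))

# The shared methods behave the same way on both types.
print("abc".upper(), b"abc".upper())           # ABC b'ABC'
print("a,b".split(","), b"a,b".split(b","))    # ['a', 'b'] [b'a', b'b']
```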
I was careful to write *string* rather than *str*. Sorry if that wasn't
clear enough.
> Let’s say I created a native-UTF16-string type to deal with some
> horrible Windows or Java stuff. Why would this principle of yours
> suggest that I shouldn’t be allowed to use u16"" just like b””?
It is a utf16 STRING so making it look like a STRING is perfectly fine.
[...]
> > - as a strong preference, anything using quotation marks as delimiters
> > ought to be processed at compile-time (f-strings are a conspicuous
> > exception to that principle);
>
> I don’t see why you should even want to _know_ whether it’s true, much
> less have a strong preference.
Because I care about performance, at least a bit. Because I don't want
to write code that is unnecessarily slow, for some definition of
"unnecessary". Because I want to be able to reason (at least in broad
terms) about the cost of certain operations.
Because I want to be able to reason about the semantics of my code.
Why do I write 1234 instead of int("1234")? The second is longer, but it
is more explicit and it is self-documenting: the reader knows that it's
an int because it says so right there in the code, even if they come
from Javascript where 1234 is an IEEE-754 float.
Assuming the builtin int() hasn't been shadowed.
But it's also wastefully slow.
If we are genuinely indifferent to the difference, then we should be
equally indifferent to a proposal to replace the LOAD_CONST byte-code
for ints as follows:
    dis("1234")  # in current Python
        LOAD_CONST     0 (1234)
    # In the future:
        LOAD_NAME      0 (int)
        LOAD_CONST     0 ('1234')
        CALL_FUNCTION  1 (1 positional, 0 keyword pair)
If you were asked to vote +1 or -1 on this proposal (sitting on the
fence not allowed), which would you vote? I would vote -1.
Aside from the performance hit, it's also a semantic change: what was a
compile-time literal is now a runtime function call which can be
shadowed. It is nice to know that when I say ``n = 1234`` that the value
of n is guaranteed to be 1234 no matter what odd things are going on.
(Short of running a modified interpreter.)
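A minimal demonstration of the shadowing problem. The replacement int below is deliberately silly and exists only for illustration:

```python
# A namespace in which the name "int" has been shadowed.
namespace = {"int": lambda text: -1}

exec('literal = 1234\ncalled = int("1234")', namespace)

print(namespace["literal"])   # 1234: the literal is unaffected
print(namespace["called"])    # -1: the call picked up the shadowed name
```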
String literals (byte- or unicode, raw or cooked, triple- or
single-quoted) are, with the exception of f-strings, LOAD_CONST calls
like ints. I think that's a valuable, useful thing to know, and not
something we should lightly give up.
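This is easy to check with the dis module. The sketch below prints opcode names only; the exact names vary between CPython versions (the call opcode in particular has been renamed more than once), so treat the output as illustrative. opnames is a helper made up for this example.

```python
import dis


def opnames(source):
    """Compile source as a module and return its opcode names in order."""
    code = compile(source, "<demo>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


samples = (
    'n = 1234',          # int literal
    's = "abc"',         # str literal
    'b = b"abc"',        # bytes literal
    'n = int("1234")',   # runtime call
    's = f"{x}"',        # f-string: has to look up x at run time
)
for source in samples:
    print(f"{source:18} {opnames(source)}")
```

The plain literals need nothing beyond loading a constant and storing it, while the call and the f-string both involve a name lookup at run time.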
> Here are things you probably really do care about: (a) they act like
> strings, (b) they act like constants,
Don't confuse immutability with constant-ness.
Python doesn't have constants, except by convention. There's no easy way
to prevent a simple name from being rebound.
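To illustrate the difference with a made-up name:

```python
LIMIT = (1, 2, 3)          # a tuple is immutable ...

try:
    LIMIT[0] = 99          # ... so changing the object fails
except TypeError:
    print("the tuple cannot be mutated")

LIMIT = "anything"         # ... but nothing stops the name being rebound
print(LIMIT)               # anything
```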
> (c) if there are potential issues parsing them, you see those issues
> as soon as possible,
Like at compile-time?
Consider the difference between the compile-time syntax error you get
here:
    x = 123w456
versus the run-time error you get here:
    x = int("123w456")
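The two cases can be separated explicitly by compiling and executing in separate steps:

```python
# The literal form never gets past the compiler.
try:
    compile("x = 123w456", "<demo>", "exec")
    compile_error = None
except SyntaxError as err:
    compile_error = err

# The call form compiles cleanly and only fails when it is executed.
code = compile('x = int("123w456")', "<demo>", "exec")
try:
    exec(code, {})
    runtime_error = None
except ValueError as err:
    runtime_error = err

print(type(compile_error).__name__)   # SyntaxError
print(type(runtime_error).__name__)   # ValueError
```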
I can understand saying "we have no choice but to make this a runtime
operation", or even "on the balance of pros and cons, it isn't worth the
extra work to make this happen at compile-time".
I don't like it that we have to write Decimal("123.456"), but I
understand the reasons why we have to and can accept that it is a
necessary evil.
(To clarify: of course it is a feature that we *can* pass strings to the
Decimal constructor, when such strings come from user-input or are read
from data files etc.)
But I don't think that it is a feature that there is no alternative but
to pass a string, even when the value is known at edit-time. And I don't
understand your position that I shouldn't care about the difference.
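One of the reasons, as I understand it (this is my gloss, not something spelled out above), is that the obvious alternative of passing a float literal goes through binary floating point first and loses the exact decimal value:

```python
from decimal import Decimal

from_string = Decimal("123.456")
from_float = Decimal(123.456)      # the float is converted exactly, error and all

print(from_string)                 # 123.456
print(from_float == from_string)   # False
```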
> (d) working with them is more than fast enough.
You are right that Python is usually "fast enough" (except when it
isn't), and that the one-off cost of creating a few pseudo-constants is
generally only a small fraction of the cost of most programs.
But Python isn't quote-unquote "slow" because of any one single thing,
it is more of a death by a thousand cuts, lots of *small* inefficiencies
which individually don't matter but collectively add up to making Python
up to a hundred times slower than C.
When performance matters, which would you rather write?
    # with a literal
    for item in huge_sequence:
        value = item + 1234
    # with a function call
    for item in huge_sequence:
        value = item + int("1234")
I know that when I use a literal, it will be as fast as it possibly can
be in Python, or at least there's nothing *I* can do to make it faster.
But when I have to use a call like Decimal("123.45"), that's one more
thing for me to have to worry about: is it fast enough? Can I make it
faster? Should I make it faster?
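If you want numbers rather than intuition, a timeit sketch along these lines will do. The sequence size and repeat count are arbitrary choices, and I am deliberately not quoting timings since they depend on the machine and the interpreter:

```python
import timeit

huge_sequence = range(10_000)   # stand-in for the real data


def with_literal():
    return [item + 1234 for item in huge_sequence]


def with_call():
    # Looks up int and parses the string on every iteration.
    return [item + int("1234") for item in huge_sequence]


print("literal:", timeit.timeit(with_literal, number=20))
print("call:   ", timeit.timeit(with_call, number=20))
```

Both functions compute the same result; only the cost differs.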
We should be wary about hiding potentially slow code in something that
looks like fast code.
(Yes, that's a criticism of properties too, but in the case of
properties we know that the benefits outweigh the risk. It's not clear
that this is the case here.)
> Compile time is neither
> necessary (Haskell) nor sufficient (Tcl) for any of that. So why
> insist on compile-time instead of insisting on a-d?
I think you will find that I said this should be "a strong preference",
which is hardly *insisting*.
> > No I'm not. I'm going to think of it as a *string*, because it looks
> > like a string.
>
> Well, yes. It’s a path string, or a regex string, or a version string,
Actually, no, it will be a Path object, a compiled regex SRE_Pattern
object, or a Version object, not a string at all.
> or whatever, which is loosely a kind of string but not literally one.
> Like bytes.
Bytes literally are strings. They just aren't strings of Unicode
characters.
> Or it’s a sql cursor, in which case it was probably a misuse of the feature.
That's one of the motivating examples. I agree it is a misuse of the
proposed feature.
> > Particularly given the OP's preference for single-letter prefixes.
>
> OK, I will agree with you there that the overuse of single-letter
> prefixes in the motivating examples is a worrying sign. In principle
> there’s nothing wrong with single letters (and I think I can make a
> good case for the f suffix as a good use in 3D-math code).
I can concur with all of that.
[...]
> As I’ve said before, I believe that anything that doesn’t have a
> builtin type does not deserve builtin syntax.
Agreed. Although there's a bit of fuzziness over the concept of
"builtin". Not all built-in objects are available in the ``builtins``
module, e.g. NoneType, or FunctionType.
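For example:

```python
import builtins
import types

NoneType = type(None)

# Both types report "builtins" as their module ...
print(NoneType.__module__, NoneType.__name__)                        # builtins NoneType
print(types.FunctionType.__module__, types.FunctionType.__name__)    # builtins function

# ... yet neither name can be found in the builtins module.
print(hasattr(builtins, "NoneType"))   # False
print(hasattr(builtins, "function"))   # False
```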
> And I don’t understand
> why that isn’t a near-ubiquitous viewpoint. But it’s not just you; at
> least three people (all of whom dislike the whole concept of custom
> affixes) seem at least in principle open to the idea of adding builtin
> affixes for types that don’t exist. Which makes me think it’s almost
> certainly not that you’re all crazy, but that I’m missing something
> important. Can you explain it to me?
I thought it went without saying that a necessary pre-condition for
adding builtin syntax for a type was for the type to become built-in
first. Sorry if it wasn't as clear or obvious as I thought.
--
Steven
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/WDR2QHG4EBB3FP6Z2T6CGKC7O7D4KDA5/