[Python-ideas] Re: Custom string prefixes

Steven D'Aprano Sat, 31 Aug 2019 03:42:24 -0700

On Thu, Aug 29, 2019 at 02:10:21PM -0700, Andrew Barnert wrote:

[...]
> And most of the string affixes people have suggested are for 
> string-ish things.


I don't think that's correct. Looking back at the original post in this 
thread, here are the motivating examples:

[quote]

There are quite a few situations where this can be used:
- Fraction literals: `frac'123/4567'`
- Decimals: `dec'5.34'`
- Date/time constants: `t'2019-08-26'`
- SQL expressions: `sql'SELECT * FROM tbl WHERE a=?'.bind(a=...)`
- Regular expressions: `rx'[a-zA-Z]+'`
- Version strings: `v'1.13.0a'`
- etc.

[/quote]

By my count, that's zero out of six string-ish things. There may have 
been other proposals, but I haven't trawled through the entire thread to 
find them.


> I’m not sure what a “version string” is, but I 
> might design that as an actual subclass of str that adds extractor 
> methods and overrides comparison.

A version object is a record with fields, most of which are numeric. 
For an existing example, see sys.version_info which is a kind of named 
tuple, not a string.
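
As a rough sketch of what "a record with fields" could look like: the 
Version class and its parse() helper below are hypothetical names made up 
for illustration, not an existing API, and the parser only handles the 
simple "X.Y.Z" plus optional letter tag shape. Only sys.version_info is 
real.

```python
import sys
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical version record: numeric fields plus a release tag."""
    major: int
    minor: int
    micro: int = 0
    tag: str = ""

    @classmethod
    def parse(cls, text):
        # Simplified parser for illustration only: handles "X.Y.Z" with an
        # optional trailing letter tag such as the "a" in "1.13.0a".
        head = text.rstrip("abcdefghijklmnopqrstuvwxyz")
        tag = text[len(head):]
        major, minor, micro = (int(part) for part in head.split("."))
        return cls(major, minor, micro, tag)

    def __str__(self):
        # The version *string* is only the human-readable representation.
        return f"{self.major}.{self.minor}.{self.micro}{self.tag}"


v = Version.parse("1.13.0a")
print(v.major, v.minor, v.micro, v.tag)

# Fields compare numerically, so 1.9.0 sorts before 1.13.0 ...
print(Version.parse("1.9.0") < Version.parse("1.13.0"))
# ... whereas the plain strings compare character by character.
print("1.9.0" < "1.13.0")

# sys.version_info is a tuple-like record, not a str.
print(isinstance(sys.version_info, tuple), isinstance(sys.version_info, str))
```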

The version *string* is just a nice human-readable representation. It 
doesn't make sense to implement string methods on a Version object. Why 
would you offer expandtabs(), find(), splitlines(), translate(), 
isspace(), capitalize(), etc methods? Or * and + (repetition and 
concatenation) operators? I cannot think of a single string 
method/operator that a Version object should implement.


> A compiled regex isn’t literally a 
> string, but neither is a bytes; it’s still clearly _similar_ to a 
> string, in important ways. 

It isn't clear to me how a compiled regex object is "similar" to a 
string. The set of methods offered by both regexes and strings is pretty 
small; by my generous count it is just two methods:

- str.split and SRE_Pattern.split;

- str.replace and SRE_Pattern.sub

neither of which use the same API or have the same semantics. Compiled 
regex objects don't offer string methods like translate, isdigit, 
upper, encode, etc. I would say that they are clearly *not* strings.
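
One way to check the overlap is to compare the public attribute names of 
the two types. This is only a sketch: the exact set of names depends on 
the Python version, and the type called SRE_Pattern above is exposed as 
re.Pattern in recent releases.

```python
import re

pattern = re.compile(r"\d")

# Public attribute names that a str and a compiled pattern both expose.
str_names = {name for name in dir(str) if not name.startswith("_")}
pattern_names = {name for name in dir(pattern) if not name.startswith("_")}
print(sorted(str_names & pattern_names))

# Similar jobs, different call shapes: the pattern takes the subject string
# as an argument, while the str methods operate on the string itself.
print("a1b2c".replace("1", "-"))     # literal substring replacement
print(pattern.sub("-", "a1b2c"))     # replaces every match of the pattern
print("a1b1c".split("1"))
print(pattern.split("a1b2c"))
```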


[...]
> And versions of the proposal that allow delimiters other than quotes 
> so you can write things like regex/a.*b/, well, I’d need to see a 
> specific proposal to be sure, but that seems even less objectionable 
> in this regard. That looks like nothing else in Python, but it looks 
> like a regex in awk or sed or perl, so I’d probably read it as a regex 
> object.

Why do you need the "regex" prefix? Assuming the parser and the human 
reader can cope with using / as both a delimiter and an operator (which 
isn't a given!) /.../ for a regex object seems fine to me.

I suspect that this is going to be ambiguous though:

    target = regex/a*b/ +x

could be:

    target = ((regex / a) * b) / ( unary-plus x)

or 

    target = (regex object) + x

so maybe we do need a prefix.
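
For what it's worth, that line is already valid Python today, and the 
ast module shows which reading the current parser picks. A small sketch 
(the names regex, a, b and x are just the placeholders from the example 
above):

```python
import ast

source = "target = regex/a*b/ +x"
tree = ast.parse(source)
expr = tree.body[0].value   # right-hand side of the assignment

# Outermost operation: (...) / (+x)
print(type(expr.op).__name__, type(expr.right).__name__)
# One level down: (...) * b
print(type(expr.left.op).__name__)
# Innermost: regex / a
print(type(expr.left.left.op).__name__)
```

So any regex/.../ syntax would have to change the meaning of an 
expression that currently parses as plain arithmetic.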


> > Let me suggest some design principles that should hold for languages 
> > with more-or-less "conventional" syntax. Languages like APL or Forth 
> > excluded.
> > 
> > - anything using ' or " quotation marks as delimiters (with or without 
> >  affixes) ought to return a string, and nothing but a string;
> 
> So b"abc" should not be allowed?

In what way are byte-STRINGS not strings? Unicode-strings and 
byte-strings share a significant fraction of their APIs, and are so 
similar that back in Python 2.2 the devs thought it was a good idea to 
try automagically coercing from one to the other.
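
A quick way to see how much of the API is shared, as a sketch (the exact 
counts depend on the Python version, so none are quoted here):

```python
# Public method names on each type.
str_names = {n for n in dir(str) if not n.startswith("_")}
bytes_names = {n for n in dir(bytes) if not n.startswith("_")}

shared = str_names & bytes_names
print(len(shared), "shared public names, e.g.", sorted(shared)[:5])
print("only on str:  ", sorted(str_names - bytes_names))
print("only on bytes:", sorted(bytes_names - str_names))

# The shared methods behave the same way on both kinds of string.
print("a,b".split(","), b"a,b".split(b","))
```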

I was careful to write *string* rather than *str*. Sorry if that wasn't 
clear enough.


> Let’s say I created a native-UTF16-string type to deal with some 
> horrible Windows or Java stuff. Why would this principle of yours 
> suggest that I shouldn’t be allowed to use u16"" just like b""?

It is a utf16 STRING so making it look like a STRING is perfectly fine.


[...]
> > - as a strong preference, anything using quotation marks as delimiters
> >  ought to be processed at compile-time (f-strings are a conspicuous 
> >  exception to that principle);
> 
> I don’t see why you should even want to _know_ whether it’s true, much 
> less have a strong preference.

Because I care about performance, at least a bit. Because I don't want 
to write code that is unnecessarily slow, for some definition of 
"unnecessary". Because I want to be able to reason (at least in broad 
terms) about the cost of certain operations.

Because I want to be able to reason about the semantics of my code.

Why do I write 1234 instead of int("1234")? The second is longer, but it 
is more explicit and it is self-documenting: the reader knows that it's 
an int because it says so right there in the code, even if they come 
from Javascript where 1234 is an IEEE-754 float.

Assuming the builtin int() hasn't been shadowed.

But it's also wastefully slow.

If we are genuinely indifferent to the difference, then we should be 
equally indifferent to a proposal to replace the LOAD_CONST byte-code 
for ints as follows:

    dis("1234")  # in current Python
    LOAD_CONST               0 (1234)

    # In the future:
    LOAD_NAME                0 (int)
    LOAD_CONST               0 ('1234')
    CALL_FUNCTION            1 (1 positional, 0 keyword pair)

If you were asked to vote +1 or -1 on this proposal (sitting on the 
fence not allowed), which would you vote? I would vote -1.

Aside from the performance hit, it's also a semantic change: what was a 
compile-time literal is now a runtime function call which can be 
shadowed. It is nice to know that when I say ``n = 1234`` that the value 
of n is guaranteed to be 1234 no matter what odd things are going on.

(Short of running a modified interpreter.)
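
A small sketch of the shadowing point. The namespace dict and the 
function names literal() and call() are made up for this example:

```python
# Run the same two assignments in a namespace where the name `int` has been
# rebound to something unhelpful.
namespace = {"int": lambda text: -1}
exec("n = 1234\nm = int('1234')", namespace)
print(namespace["n"])   # the literal is unaffected by the shadowing
print(namespace["m"])   # the call picks up the shadowed name


def literal():
    return 1234


def call():
    return int("1234")


# The literal version never looks up the name `int`; the call version must
# look it up every time it runs.
print("int" in literal.__code__.co_names)
print("int" in call.__code__.co_names)
```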

String literals (byte- or unicode, raw or cooked, triple- or 
single-quoted) are, with the exception of f-strings, LOAD_CONST calls 
like ints. I think that's a valuable, useful thing to know, and not 
something we should lightly give up.


> Here are things you probably really do care about: (a) they act like 
> strings, (b) they act like constants,

Don't confuse immutability with constant-ness.

Python doesn't have constants, except by convention. There's no easy way 
to prevent a simple name from being rebound.
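
A minimal illustration of the difference, with LIMIT as a made-up name:

```python
LIMIT = (1, 2, 3)          # "constant" by naming convention only

try:
    LIMIT[0] = 99          # the tuple object itself is immutable...
except TypeError as err:
    print("cannot mutate:", err)

LIMIT = (4, 5, 6)          # ...but nothing stops the name being rebound
print(LIMIT)
```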


> (c) if there are potential issues parsing them, you see those issues 
> as soon as possible,

Like at compile-time?

Consider the difference between the compile-time syntax error you get 
here:

    x = 123w456

versus the run-time error you get here:

    x = int("123w456")
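
The difference can be demonstrated with compile(), which runs the 
compile step without executing anything. A sketch (the "<example>" 
filename is arbitrary):

```python
compile_error = None
runtime_error = None

# The malformed literal is rejected before any code runs.
try:
    compile("x = 123w456", "<example>", "exec")
except SyntaxError as err:
    compile_error = err

# The int() call compiles without complaint...
code = compile('x = int("123w456")', "<example>", "exec")

# ...and only fails once that line is actually executed.
try:
    exec(code, {})
except ValueError as err:
    runtime_error = err

print(type(compile_error).__name__)
print(type(runtime_error).__name__)
```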


I can understand saying "we have no choice but to make this a runtime 
operation", or even "on the balance of pros and cons, it isn't worth the 
extra work to make this happen at compile-time".

I don't like it that we have to write Decimal("123.456"), but I 
understand the reasons why we have to and can accept that it is a 
necessary evil.

(To clarify: of course it is a feature that we *can* pass strings to the 
Decimal constructor, when such strings come from user-input or are read 
from data files etc.)

But I don't think that it is a feature that there is no alternative but 
to pass a string, even when the value is known at edit-time. And I don't 
understand your position that I shouldn't care about the difference.


> (d) working with them is more than fast enough.

You are right that Python is usually "fast enough" (except when it 
isn't), and that the one-off cost of creating a few pseudo-constants is 
generally only a small fraction of the cost of most programs.

But Python isn't quote-unquote "slow" because of any one single thing, 
it is more of a death by a thousand cuts, lots of *small* inefficiencies 
which individually don't matter but collectively add up to making Python 
up to a hundred times slower than C.

When performance matters, which would you rather write?

    for item in huge_sequence:
        value = item + 1234
        # or:
        value = item + int("1234")

I know that when I use a literal, it will be as fast as it possibly can 
be in Python, or at least there's nothing *I* can do to make it faster. 
But when I have to use a call like Decimal("123.45"), that's one more 
thing for me to have to worry about: is it fast enough? Can I make it 
faster? Should I make it faster?
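
For anyone who wants to measure the difference, a rough timeit sketch. 
The list `items` is a small stand-in for huge_sequence, the function 
names are made up, and the timings will vary by machine and Python 
version, so no figures are quoted:

```python
import timeit

items = list(range(10_000))   # stand-in for huge_sequence


def with_literal():
    return [item + 1234 for item in items]


def with_call():
    # Re-parses the same string on every iteration.
    return [item + int("1234") for item in items]


# Both spellings compute the same values; only the cost differs.
same_result = with_literal() == with_call()
print(same_result)

print("literal:", timeit.timeit(with_literal, number=50))
print("int():  ", timeit.timeit(with_call, number=50))
```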

We should be wary about hiding potentially slow code in something that 
looks like fast code.

(Yes, that's a criticism of properties too, but in the case of 
properties we know that the benefits outweigh the risk. It's not clear 
that this is the case here.)


> Compile time is neither 
> necessary (Haskell) nor sufficient (Tcl) for any of that. So why 
> insist on compile-time instead of insisting on a-d?

I think you will find that I said this should be "a strong preference", 
which is hardly *insisting*.


> > No I'm not. I'm going to think of it as a *string*, because it looks 
> > like a string.
> 
> Well, yes. It’s a path string, or a regex string, or a version string, 

Actually, no, it will be a Path object, a compiled regex SRE_Pattern 
object, or a Version object, not a string at all.


> or whatever, which is loosely a kind of string but not literally one. 
> Like bytes.

Bytes literally are strings. They just aren't strings of Unicode 
characters.


> Or it’s a sql cursor, in which case it was probably a misuse of the feature.

That's one of the motivating examples. I agree it is a misuse of the 
proposed feature.


> > Particularly given the OP's preference for single-letter prefixes.
> 
> OK, I will agree with you there that the overuse of single-letter 
> prefixes in the motivating examples is a worrying sign. In principle 
> there’s nothing wrong with single letters (and I think I can make a 
> good case for the f suffix as a good use in 3D-math code).

I can concur with all of that.


[...]
> As I’ve said before, I believe that anything that doesn’t have a 
> builtin type does not deserve builtin syntax.

Agreed. Although there's a bit of fuzziness over the concept of 
"builtin". Not all built-in objects are available in the ``builtins`` 
module, e.g. NoneType, or FunctionType.
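
That fuzziness is easy to check. A short sketch using only the standard 
builtins and types modules:

```python
import builtins
import types

none_type = type(None)

# Both types exist in every interpreter, yet neither is reachable as an
# attribute of the builtins module.
print(none_type.__name__, hasattr(builtins, "NoneType"))
print(types.FunctionType.__name__,
      hasattr(builtins, "FunctionType"), hasattr(builtins, "function"))

# Compare with a type that *is* exposed there.
print(hasattr(builtins, "int"))
```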


> And I don’t understand 
> why that isn’t a near-ubiquitous viewpoint. But it’s not just you; at 
> least three people (all of whom dislike the whole concept of custom 
> affixes) seem at least in principle open to the idea of adding builtin 
> affixes for types that don’t exist. Which makes me think it’s almost 
> certainly not that you’re all crazy, but that I’m missing something 
> important. Can you explain it to me?

I thought it went without saying that a necessary pre-condition for 
adding builtin syntax for a type was for the type to become built-in 
first. Sorry if it wasn't as clear or obvious as I thought.



-- 
Steven
