[Python-ideas] Re: Custom string prefixes

Steven D'Aprano Tue, 27 Aug 2019 08:38:18 -0700

On Tue, Aug 27, 2019 at 08:22:22AM -0000, [email protected] wrote:

> The string (or number) prefixes add new power to the language


I don't think they do. It's just syntactic sugar for a function call. 
There's nothing that czt'...' will do that czt('...') can't already do.

If you have a proposal that allows custom string prefixes to do 
something that a function call cannot do, I've missed it.


> If a certain feature can potentially be misused shouldn't deter us
> from adding it, if the benefits are significant.

Very true, but so far I see nothing in this proposal that suggests that 
the benefits are more significant than avoiding having to type a pair of 
parentheses. Every benefit I have seen applies equally to the function 
call version, but without the added complexity to the language of 
allowing custom string prefixes.


> And the benefits in terms of readability can be significant.

I don't think they will be. I think they will encourage cryptic 
one-character function names disguised as prefixes:

    v'...' instead of Version(...)
    x'...' instead of re.compile(...)

to take two examples from your proposal. At least this is somewhat 
better:

    sql'...'

but that leaves the ambiguity of not knowing whether that's a chained 
function call s(q(l(...))) or a single sql(...).

I believe it will also encourage inefficient and cryptic string parsing 
instead of clearer use of separate arguments. Your earlier example:

    frac'123/4567'

The Fraction constructor already accepts such strings, and it is 
occasionally handy for parsing user-input. But using it to parse string 
literals gives slow, inefficient code for little or no benefit:

[steve@ando cpython]$ ./python -m timeit -s 'from fractions import Fraction' 'Fraction(123, 4567)'
20000 loops, best of 5: 18.9 usec per loop

[steve@ando cpython]$ ./python -m timeit -s 'from fractions import Fraction' 'Fraction("123/4567")'
5000 loops, best of 5: 52.9 usec per loop
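
A rough way to repeat this comparison from inside Python, using the standard timeit module. This is only a sketch: the loop count is an arbitrary choice, and the absolute numbers will differ by machine and Python build.

```python
import timeit
from fractions import Fraction

# Both spellings build the same value ...
assert Fraction(123, 4567) == Fraction("123/4567")

# ... but the string form has to be parsed every time it is evaluated.
N = 20_000  # arbitrary loop count chosen for this sketch
setup = "from fractions import Fraction"
t_ints = timeit.timeit("Fraction(123, 4567)", setup=setup, number=N)
t_str = timeit.timeit('Fraction("123/4567")', setup=setup, number=N)

print(f"integer arguments: {t_ints / N * 1e6:.2f} usec per call")
print(f"string argument:   {t_str / N * 1e6:.2f} usec per call")
```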


Unless you can suggest a way to parse arbitrary strings in arbitrary 
ways at compile-time, these custom string prefixes are probably doomed 
to be slow and inefficient.

The best thing I can say about this is that at least frac'123/4567' 
would probably be easy to understand, since the / syntax for fractions 
is familiar to most people from school. But the same cannot be said for 
other custom prefixes:

    cf'[0; 37, 7, 1, 2, 5]'

Perhaps you can guess the meaning of that cf-string. Perhaps you can't. 
A hint might point you in the right direction:

    assert cf'[0; 37, 7, 1, 2, 5]' == Fraction(123, 4567)

(By the way, the semi-colon is meaningful and not a typo.)

To the degree that custom string prefixes will encourage cryptic one- and 
two-letter names, I think that this will hurt readability and clarity of 
code. But if the reader has the domain knowledge to recognise what "cf" 
stands for, this may be no worse than (say) "re" (regular expression).

In conventional code, we might call the cf function like this:

    cf([0, 37, 7, 1, 2, 5])  # Single list argument.
    cf(0, 37, 7, 1, 2, 5)    # *args version.

Either way works for me. But it is your argument that replacing the 
parentheses with quote marks is "more readable":

    cf([0, 37, 7, 1, 2, 5])
    cf'[0; 37, 7, 1, 2, 5]'

not just a little bit more readable, but enough to make up for the 
inefficiency of having to write your own parser, deal with errors, 
compile a string literal, parse it at runtime, and only then call the 
actual cf constructor and return a cf object.

Even if I accepted your claim that swapping (...) for '...' was more 
readable, I am skeptical that the additional work and runtime 
inefficiency would be worth the supposed benefit.
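
To make the extra work concrete, here is a minimal sketch of both spellings. The names cf and cf_from_string are hypothetical, and the '[a0; a1, a2, ...]' format is assumed from the example above; nothing here is part of the proposal itself.

```python
from fractions import Fraction


def cf(*terms):
    """Evaluate the continued fraction [a0; a1, a2, ...] as a Fraction."""
    if not terms:
        raise ValueError("need at least one term")
    # Work from the innermost term outwards.
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


def cf_from_string(text):
    """Hypothetical handler for cf'...': parse the string, then call cf()."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a continued fraction: {text!r}")
    # The semicolon separates the integer part from the remaining terms.
    head, sep, tail = body[1:-1].partition(";")
    terms = [int(head)]
    if sep:
        terms.extend(int(t) for t in tail.split(","))
    return cf(*terms)


assert cf(0, 37, 7, 1, 2, 5) == Fraction(123, 4567)
assert cf_from_string("[0; 37, 7, 1, 2, 5]") == Fraction(123, 4567)
```

The string version needs its own validation and parsing step at runtime before it reaches the same cf() call that the plain version makes directly.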


I don't wish to say that parsing strings to extract information is 
always an anti-pattern:

http://cyrille.martraire.com/2010/01/the-string-obsession-anti-pattern/

after all, we often need to process data coming from config files or 
other user-input, where we have no choice but to accept a string.

But parsing string *literals* usually is an anti-pattern, especially 
when there is a trivial transformation from the string to the 
constructor arguments, e.g. 123/4567 --> Fraction(123, 4567).


[...]
> Exactly. You look at string "1.10a" and you know it must be a version string,
> because you're a human, you're smart. The compiler is not a human, it has no
> idea. To the Python interpreter it's just a PyUnicode object of length 5. It's
> meaningless. But when you combine this string with a prefix into a single
> object, it gains power. It can have methods or special behaviors. It can have
> a type, different from `str`, that can be inspected when passing this object
> to another function.

Everything you say there applies to ordinary function call syntax too:

    Version('1.10a')

can have methods, special behaviours, a type different from str, etc. 
Not one of those benefits comes from *custom string prefixes*. They all 
come from the use of a custom type.

In fact, we can be more explicit and clear with the constructor:

    Version(major=1, minor=10, stage='a')


There is nothing magic about this v-string prefix. You still have to 
write a Version class with a version-string parser. The compiler can't 
help you, because it has no knowledge of the format of version strings. 
All the compiler can do is pass the string '1.10a' to the function v().
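
A minimal sketch of what that looks like in practice. The version-string format ("<major>.<minor>" plus optional stage letters), the parse() classmethod and the alias v are all assumptions made for illustration:

```python
import re
from dataclasses import dataclass

# Assumed format for this sketch: "<major>.<minor><optional stage letters>"
_VERSION_RE = re.compile(r"(\d+)\.(\d+)([a-z]*)")


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    stage: str = ""

    @classmethod
    def parse(cls, text):
        """Build a Version from a string such as '1.10a'."""
        match = _VERSION_RE.fullmatch(text)
        if match is None:
            raise ValueError(f"invalid version string: {text!r}")
        major, minor, stage = match.groups()
        return cls(int(major), int(minor), stage)


# A hypothetical v'...' prefix could do no more than hand its string
# to an ordinary function like this one.
v = Version.parse

assert v("1.10a") == Version(major=1, minor=10, stage="a")
assert isinstance(v("1.10a"), Version)
```

The type, its methods and the parser all live in the Version class; the prefix would contribute only the call.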


[...]
> > for rather insignificant gains, the saving of two parentheses. 
> 
> Two bytes doesn't sound like a lot. I mean, it is quite little on the grand
> scale of things. However, I don't think the simple byte-count is a proper
> measure here. There could be benefits to readability even if it was 0 or
> negative byte difference.

"There could be..." lots of things, but the onus is on you to prove that 
there actually *are* such benefits.

 
> I believe a good way to think about this is the following: if the feature was 
> already implemented, would people want to use it, and would it improve
> readability of their code?

I answered that in my previous post.

I would prefer an explicit, clear, self-documenting function call 
Version() over a terse, unclear syntax that looks like a string but 
isn't. I don't think that v'1.10a' is clearer or more readable than 
Version('1.10a'). It is *shorter*, but that's it.

The bottom line is, so long as this proposal is for nothing more than 
mere syntactic sugar allowing you to drop the parentheses from certain 
function calls (those that take a single string argument), the benefit 
is tiny, and the added complexity and opportunity for abuse and 
confusion is large.


> As a practical example, consider function `pandas.read_csv()`. The
> documentation for its `sep` parameter says "In addition, separators longer
> than 1 character and different from ``'\s+'`` will be interpreted as regular
> expressions ...". In this case they wanted the `sep` parameter to handle both
> simple separators, and the regular expression separators. However, as there
> is no syntax to create a "regular expression string", they ended up with this
> dubious heuristic based on the length of the string...

I can't help pandas' poor API, and I doubt that your proposal would 
have prevented it either.



> Ideally, they should have said that `sep` could be either
> a string or a regexp-object, but the barrier to write 
> 
>     from re import compile as rx
>     rx('...')
> 
> is just impossibly high for a typical user.

Think about what you are saying about the sophisticated data 
scientists who are typical pandas users:

- they can write "import pandas"

- but not "import re" or "from re import compile as rx"

- they will be able to import your rx'...' string prefix from
  wherever it comes from (perhaps "from re import rx"?)

- and are capable of writing regular expressions using your 
  custom rx'...' syntax

- but adding parentheses is beyond them: rx('...').

I cannot take this argument about sophisticated regex-users who are 
defeated by function call syntax seriously.
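
For illustration, an API that accepts "either a string or a regexp-object" needs nothing beyond an isinstance check. The function split_fields below is hypothetical and is not how pandas.read_csv() is implemented:

```python
import re
from re import compile as rx


def split_fields(line, sep=","):
    """Split line on sep, which may be a plain string or a compiled regex."""
    if isinstance(sep, re.Pattern):
        return sep.split(line)
    return line.split(sep)


# Plain strings are always literal separators, whatever their length ...
assert split_fields("a,b,c") == ["a", "b", "c"]
assert split_fields("a::b::c", sep="::") == ["a", "b", "c"]

# ... and a regular expression is requested explicitly, parentheses and all.
assert split_fields("a  b \t c", sep=rx(r"\s+")) == ["a", "b", "c"]
```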




-- 
Steven
