On Tue, Aug 27, 2019 at 08:22:22AM -0000, [email protected] wrote:
> The string (or number) prefixes add new power to the language
I don't think they do. It's just syntactic sugar for a function call.
There's nothing that czt'...' will do that czt('...') can't already do.
If you have a proposal that allows custom string prefixes to do
something that a function call cannot do, I've missed it.
> If a certain feature can potentially be misused shouldn't deter us
> from adding it, if the benefits are significant.
Very true, but so far I see nothing in this proposal that suggests that
the benefits are more significant than avoiding having to type a pair of
parentheses. Every benefit I have seen applies equally to the function
call version, but without the added complexity to the language of
allowing custom string prefixes.
> And the benefits in terms of readability can be significant.
I don't think they will be. I think they will encourage cryptic
one-character function names disguised as prefixes:
    v'...' instead of Version(...)
    x'...' instead of re.compile(...)
to take two examples from your proposal. At least this is somewhat
better:
    sql'...'
but that leaves the ambiguity of not knowing whether that's a chained
function call s(q(l(...))) or a single sql(...).
I believe it will also encourage inefficient and cryptic string parsing
instead of clearer use of separate arguments. Your earlier example:
    frac'123/4567'
The Fraction constructor already accepts such strings, and it is
occasionally handy for parsing user-input. But using it to parse string
literals gives slow, inefficient code for little or no benefit:
    [steve@ando cpython]$ ./python -m timeit -s 'from fractions import Fraction' 'Fraction(123, 4567)'
    20000 loops, best of 5: 18.9 usec per loop
    [steve@ando cpython]$ ./python -m timeit -s 'from fractions import Fraction' 'Fraction("123/4567")'
    5000 loops, best of 5: 52.9 usec per loop
Unless you can suggest a way to parse arbitrary strings in arbitrary
ways at compile-time, these custom string prefixes are probably doomed
to be slow and inefficient.
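For anyone who wants to repeat the comparison without a shell, here is a minimal sketch using the timeit module directly. The loop count of 10,000 is an arbitrary choice for illustration, and absolute timings will differ by machine and Python version; only the relative cost of the two spellings is of interest.

```python
import timeit
from fractions import Fraction

# Both spellings build the same value; only the construction path differs.
assert Fraction(123, 4567) == Fraction("123/4567")

setup = "from fractions import Fraction"
loops = 10_000  # arbitrary loop count, chosen only for this illustration

# Total seconds for `loops` constructions of each kind.
t_ints = timeit.timeit("Fraction(123, 4567)", setup=setup, number=loops)
t_str = timeit.timeit('Fraction("123/4567")', setup=setup, number=loops)

print(f"two ints:   {t_ints / loops * 1e6:.2f} usec per call")
print(f"one string: {t_str / loops * 1e6:.2f} usec per call")
```

The string form has to be matched and split at runtime on every call, which is where the extra time goes.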
The best thing I can say about this is that at least frac'123/4567'
would probably be easy to understand, since the / syntax for fractions
is familiar to most people from school. But the same cannot be said for
other custom prefixes:
    cf'[0; 37, 7, 1, 2, 5]'
Perhaps you can guess the meaning of that cf-string. Perhaps you can't.
A hint might point you in the right direction:
    assert cf'[0; 37, 7, 1, 2, 5]' == Fraction(123, 4567)
(By the way, the semi-colon is meaningful and not a typo.)
To the degree that custom string prefixes will encourage cryptic one- and
two-letter names, I think that this will hurt readability and clarity of
code. But if the reader has the domain knowledge to recognise what "cf"
stands for, this may be no worse than (say) "re" (regular expression).
In conventional code, we might call the cf function like this:
    cf([0, 37, 7, 1, 2, 5])  # Single list argument.
    cf(0, 37, 7, 1, 2, 5)    # *args version.
Either way works for me. But it is your argument that replacing the
parentheses with quote marks is "more readable":
    cf([0, 37, 7, 1, 2, 5])
    cf'[0; 37, 7, 1, 2, 5]'
not just a little bit more readable, but enough to make up for the
inefficiency of having to write your own parser, deal with errors,
compile a string literal, parse it at runtime, and only then call the
actual cf constructor and return a cf object.
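To make the extra work concrete, here is a rough sketch of both routes. It assumes "cf" means a simple continued fraction and uses the *args calling convention; the names cf and cf_from_string are purely illustrative, and cf_from_string is only a guess at the kind of runtime parser that would have to sit behind a cf'...' prefix.

```python
from fractions import Fraction


def cf(*terms):
    """Evaluate a simple continued fraction [a0; a1, a2, ...] given as ints."""
    if not terms:
        raise ValueError("need at least one term")
    # Work from the innermost term outwards: a0 + 1/(a1 + 1/(a2 + ...)).
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


def cf_from_string(text):
    """Hypothetical parser: turn '[a0; a1, a2, ...]' into a cf() call."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a continued fraction: {text!r}")
    # The semicolon separates the integer part from the remaining terms.
    head, _, tail = body[1:-1].partition(";")
    terms = [int(head)] + [int(t) for t in tail.split(",") if t.strip()]
    return cf(*terms)


assert cf(0, 37, 7, 1, 2, 5) == Fraction(123, 4567)
assert cf_from_string("[0; 37, 7, 1, 2, 5]") == Fraction(123, 4567)
```

The second function exists only to undo the string quoting and then call the first, and it has to do so again on every evaluation.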
Even if I accepted your claim that swapping (...) for '...' was more
readable, I am skeptical that the additional work and runtime
inefficiency would be worth the supposed benefit.
I don't wish to say that parsing strings to extract information is
always an anti-pattern:
http://cyrille.martraire.com/2010/01/the-string-obsession-anti-pattern/
after all we often need to process data coming from config files or
other user-input, where we have no choice but to accept a string.
But parsing string *literals* usually is an anti-pattern, especially
when there is a trivial transformation from the string to the
constructor arguments, e.g. 123/4567 --> Fraction(123, 4567).
[...]
> Exactly. You look at string "1.10a" and you know it must be a version string,
> because you're a human, you're smart. The compiler is not a human, it has no
> idea. To the Python interpreter it's just a PyUnicode object of length 5. It's
> meaningless. But when you combine this string with a prefix into a single
> object, it gains power. It can have methods or special behaviors. It can have
> a type, different from `str`, that can be inspected when passing this object
> to another function.
Everything you say there applies to ordinary function call syntax too:
    Version('1.10a')
can have methods, special behaviours, a type different from str, etc.
Not one of those benefits comes from *custom string prefixes*. They all
come from the use of a custom type.
In fact, we can be more explicit and clear with the constructor:
    Version(major=1, minor=10, stage='a')
There is nothing magic about this v-string prefix. You still have to
write a Version class with a version-string parser. The compiler can't
help you, because it has no knowledge of the format of version strings.
All the compiler can do is pass the string '1.10a' to the function v().
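A minimal sketch of what that looks like in practice, assuming a deliberately simplified "major.minor" format with an optional lowercase stage suffix (real version schemes are richer). The regular expression, the parse classmethod and the alias v are all assumptions made for the illustration, not an existing API.

```python
import re
from dataclasses import dataclass

# Assumed toy format: digits, a dot, digits, then optional lowercase letters.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)([a-z]*)")


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    stage: str = ""

    @classmethod
    def parse(cls, text):
        """Build a Version from a string such as '1.10a'."""
        m = _VERSION_RE.fullmatch(text)
        if m is None:
            raise ValueError(f"invalid version string: {text!r}")
        return cls(int(m.group(1)), int(m.group(2)), m.group(3))


# The short name a v'...' prefix would stand for is just an ordinary callable.
v = Version.parse

assert v("1.10a") == Version(major=1, minor=10, stage="a")
assert isinstance(v("1.10a"), Version) and not isinstance(v("1.10a"), str)
```

The distinct type, the attributes and the parsing all live in the class; whatever spelling invokes it only has to hand over the string.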
[...]
> > for rather insignificant gains, the saving of two parentheses.
>
> Two bytes doesn't sound like a lot. I mean, it is quite little on the grand
> scale of things. However, I don't think the simple byte-count is a proper
> measure here. There could be benefits to readability even if it was 0 or
> negative byte difference.
"There could be..." lots of things, but the onus is on you to prove that
there actually *are* such benefits.
> I believe a good way to think about this is the following: if the feature was
> already implemented, would people want to use it, and would it improve
> readability of their code?
I answered that in my previous post.
I would prefer an explicit, clear, self-documenting function call
Version() over a terse, unclear syntax that looks like a string but
isn't. I don't think that v'1.10a' is clearer or more readable than
Version('1.10a'). It is *shorter*, but that's it.
The bottom line is, so long as this proposal is for nothing more than
mere syntactic sugar allowing you to drop the parentheses from certain
function calls (those that take a single string argument), the benefit
is tiny, and the added complexity and opportunity for abuse and
confusion is large.
> As a practical example, consider function `pandas.read_csv()`. The
> documentation for its `sep` parameter says "In addition, separators longer
> than 1 character and different from ``'\s+'`` will be interpreted as regular
> expressions ...". In this case they wanted the `sep` parameter to handle both
> simple separators, and the regular expression separators. However, as there
> is no syntax to create a "regular expression string", they ended up with this
> dubious heuristic based on the length of the string...
I can't help pandas' poor API, and I doubt that your proposal would
have prevented it either.
> Ideally, they should have said that `sep` could be either
> a string or a regexp-object, but the barrier to write
>
> from re import compile as rx
> rx('...')
>
> is just impossibly high for a typical user.
Think about what you are saying about the sophisticated data
scientists who are typical pandas users:
- they can write "import pandas"
- but not "import re" or "from re import compile as rx"
- they will be able to import your rx'...' string prefix from
  wherever it comes from (perhaps "from re import rx"?)
- and are capable of writing regular expressions using your
  custom rx'...' syntax
- but adding parentheses is beyond them: rx('...').
I cannot take this argument about sophisticated regex-users who are
defeated by function call syntax seriously.
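For what it's worth, an API that accepts "either a string or a regexp-object" needs nothing beyond an isinstance check on the caller's argument. The sketch below is not pandas code: split_fields is a made-up helper, and rx is simply the alias from the quoted message.

```python
import re

rx = re.compile  # same effect as "from re import compile as rx"


def split_fields(line, sep=","):
    """Split a line on a literal separator or on a compiled regex."""
    if isinstance(sep, re.Pattern):
        # Caller passed rx('...'): treat it as a regular expression.
        return sep.split(line)
    # Plain string: always a literal separator, whatever its length.
    return line.split(sep)


assert split_fields("a,b,c") == ["a", "b", "c"]
assert split_fields("a::b::c", "::") == ["a", "b", "c"]
assert split_fields("a  b \t c", rx(r"\s+")) == ["a", "b", "c"]
```

With the type carrying the distinction, no length-based heuristic is required, and no new syntax either.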
--
Steven