[Python-ideas] Re: A string function idea

Steven D'Aprano Tue, 29 Mar 2022 16:06:50 -0700

On Tue, Mar 29, 2022 at 11:00:41AM +0300, Serhiy Storchaka wrote:
> 28.03.22 15:13, StrikerOmega wrote:
> >And I want to grab some kind of value from it.
> 
> There is a powerful tool designed for solving such problems. It is 
> called regular expressions.
> 
> >sample.grab(start="fruit:", end="\n")
> > >> 'apple'
> 
> re.search(r'fruit:(.*?)\n', sample)[1]


Now do grab(start="*", end=".").

Of course you know how to do it, but a naive solution:

    re.search(r'*(.*?).', sample)[1]

will fail. So now we have to learn about escaping characters in order to 
do a simple find-and-extract. And you need to memorise what characters 
have to be escaped, and if your start and end parameters are expressions 
or parameters rather than literals, the complexity goes up a lot:

    # Untested, so probably wrong.
    re.search(re.escape(start) + "(.*?)" + re.escape(end), sample)[1]

and we both know that many people won't bother with the escapes until 
they get bitten by bugs in their production code. And even then, regexes 
are a leading source of serious software vulnerabilities.

https://cwe.mitre.org/data/definitions/185.html
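To make the escaping problem concrete, here is a small sketch. The sample string and the helper name regex_grab are made up for illustration and are not part of any proposal:

```python
import re

sample = "rating: *excellent. more text"  # made-up sample string

# The naive pattern is not even a valid regex: "*" has nothing to repeat,
# so compiling it raises re.error before any searching happens.
try:
    re.search(r'*(.*?).', sample)
except re.error as err:
    print("naive pattern failed:", err)


def regex_grab(text, start, end):
    """Hypothetical helper: escape both delimiters before building the pattern."""
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    # Still raises TypeError if there is no match, because re.search returns None.
    return re.search(pattern, text)[1]


print(regex_grab(sample, "*", "."))  # excellent
```

Note that "." in the captured group still does not match newlines unless re.DOTALL is passed, which is one more detail to remember.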

Yes, regular expressions can be used. We know that regexes can be used 
to solve most problems, for some definition of "solve". Including 
finding prime numbers:

https://iluxonchik.github.io/regular-expression-check-if-number-is-prime/

A method can raise a useful, self-explanatory error message on failure. 
Your regex raises "TypeError: 'NoneType' object is not subscriptable".
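As a rough sketch of what a more helpful failure could look like (the name grab_checked and the wording of the messages are my own assumptions, not a proposed API):

```python
def grab_checked(text, start, end):
    """Hypothetical grab variant that says which marker was missing."""
    a = text.find(start)
    if a == -1:
        raise ValueError(f"start marker {start!r} not found")
    a += len(start)
    b = text.find(end, a)
    if b == -1:
        raise ValueError(
            f"end marker {end!r} not found after start marker {start!r}"
        )
    return text[a:b]


sample = 'Hello world fruit: apple\n'
print(repr(grab_checked(sample, "fruit:", "\n")))  # ' apple'

try:
    grab_checked(sample, "vegetable:", "\n")
except ValueError as err:
    print(err)  # start marker 'vegetable:' not found
```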

A method can be written to parse nested brackets correctly. A regular 
expression cannot.
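For illustration, a minimal depth-counting sketch of the nested-bracket case. The name grab_nested is hypothetical, and it assumes single-character delimiters that differ from each other:

```python
import re


def grab_nested(text, open_ch="(", close_ch=")"):
    """Hypothetical helper: contents of the first balanced bracket group."""
    start = text.index(open_ch)  # ValueError if there is no opening bracket
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
    raise ValueError(f"unbalanced {open_ch!r} starting at index {start}")


expr = "f(a, g(b, c), d) + h(e)"
print(grab_nested(expr))                  # a, g(b, c), d
print(re.search(r'\((.*?)\)', expr)[1])   # a, g(b, c   <- stops at the first ")"
```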

And then regexes are significantly slower:

>>> from timeit import Timer
>>> sample = 'Hello world fruit: apple\n'
>>> setup = "from __main__ import grab, sample; import re"
>>> t_grab = Timer("grab(sample, 'fruit', '\\n')", setup=setup)
>>> t_regex = Timer("re.search(r'fruit:(.*?)\\n', sample)[1]", setup=setup)
>>> min(t_grab.repeat())
0.47571489959955215
>>> min(t_regex.repeat())
0.8434272557497025

Here's the version of grab I used:

def grab(text, start, end):
    a = text.index(start)
    b = text.index(end, a+len(start))
    return text[a+len(start):b]
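A quick check of this function against the grab(start="*", end=".") challenge from earlier, using made-up strings. Nothing needs escaping because str.index treats its argument as plain text:

```python
def grab(text, start, end):
    # Same function as above, repeated so the snippet runs on its own.
    a = text.index(start)
    b = text.index(end, a + len(start))
    return text[a + len(start):b]


print(repr(grab('Hello world fruit: apple\n', 'fruit:', '\n')))  # ' apple'
print(grab("note *important. rest", "*", "."))                   # important

# On failure, str.index raises ValueError rather than returning None.
try:
    grab("no markers here", "*", ".")
except ValueError as err:
    print("ValueError:", err)
```

The leading space in ' apple' is kept because the function returns everything between the two markers unchanged; callers who want it gone can call .strip() on the result.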

I have no strong opinion on whether this simple function should be built 
into the string class, but I do have a strong opinion about re-writing 
it into a slower, more fragile, harder to understand, less user-friendly 
regex.

Don't make me quote Jamie Zawinski again.


-- 
Steve