On Tue, Mar 29, 2022 at 11:00:41AM +0300, Serhiy Storchaka wrote:
> 28.03.22 15:13, StrikerOmega wrote:
> >And I want to grab some kind of value from it.
>
> There is a powerful tool designed for solving such problems. It is
> called regular expressions.
>
> >sample.grab(start="fruit:", end="\n")
> > >> 'apple'
>
> re.search(r'fruit:(.*?)\n', sample)[1]

Now do grab(start="*", end=".").

Of course you know how to do it, but a naive solution:

    re.search(r'*(.*?).', sample)[1]

will fail. So now we have to learn about escaping characters in order to
do a simple find-and-extract. And you need to memorise what characters
have to be escaped, and if your start and end parameters are expressions
or variables rather than literals, the complexity goes up a lot:

    # Untested, so probably wrong.
    re.search(re.escape(start) + "(.*?)" + re.escape(end), sample)[1]

and we both know that many people won't bother with the escapes until
they get bitten by bugs in their production code. And even then, regexes
are a leading source of serious software vulnerabilities.

https://cwe.mitre.org/data/definitions/185.html
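
To make the escaping pitfall concrete, here is a minimal sketch. The helper name regex_grab and the sample string are hypothetical, not from the thread; the helper only wraps the re.escape approach so it can be compared with the naive pattern.

```python
import re

sample = "before *needle. after"

# The naive pattern is not even a valid regex: a leading "*" has
# nothing to repeat, so compiling it raises re.error.
try:
    re.search(r'*(.*?).', sample)
    naive_error = None
except re.error as exc:
    naive_error = str(exc)


def regex_grab(text, start, end):
    """Hypothetical helper: escape both delimiters so they match literally."""
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"no text found between {start!r} and {end!r}")
    return match[1]


result = regex_grab(sample, "*", ".")
print(naive_error)
print(result)
```

Even in this small form the caller has to remember re.escape, the lazy quantifier, and the None check, which is the overhead being argued against.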

Yes, regular expressions can be used. We know that regexes can be used
to solve most problems, for some definition of "solve". Including
finding prime numbers:

https://iluxonchik.github.io/regular-expression-check-if-number-is-prime/

A method can raise a useful, self-explanatory error message on failure.
Your regex raises "TypeError: 'NoneType' object is not subscriptable".
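
As a sketch of what such an error message could look like, here is a hypothetical variant of grab next to the regex one-liner's failure. The name grab_checked and the message wording are assumptions for illustration, not part of the proposal.

```python
import re


def grab_checked(text, start, end):
    """Hypothetical grab variant that reports which delimiter was missing."""
    a = text.find(start)
    if a == -1:
        raise ValueError(f"start delimiter {start!r} not found")
    a += len(start)
    b = text.find(end, a)
    if b == -1:
        raise ValueError(f"end delimiter {end!r} not found after {start!r}")
    return text[a:b]


sample = "Hello world fruit: apple\n"

# The method-style helper says what went wrong.
try:
    grab_checked(sample, "vegetable:", "\n")
except ValueError as exc:
    grab_message = str(exc)

# The regex one-liner fails on the subscript, not on the search itself.
try:
    re.search(r"vegetable:(.*?)\n", sample)[1]
except TypeError as exc:
    regex_message = str(exc)

print(grab_message)
print(regex_message)
```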

A method can be written to parse nested brackets correctly. A regular
expression cannot.
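
A minimal sketch of the nested-bracket case, assuming a single bracket pair and no quoting or escaping rules. The function name grab_nested is hypothetical. It tracks nesting depth with a counter, which, as far as I know, the standard re module cannot express for arbitrary depth.

```python
import re


def grab_nested(text, open_ch="(", close_ch=")"):
    """Return the contents of the first balanced bracket group in text."""
    start = text.index(open_ch)
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                # Found the bracket that closes the first opening one.
                return text[start + 1:i]
    raise ValueError(f"unbalanced {open_ch!r} starting at index {start}")


sample = "f(a, g(b, c), d) + h(e)"

nested = grab_nested(sample)
# A lazy match stops at the first closing bracket, a greedy one at the last.
lazy = re.search(r"\((.*?)\)", sample)[1]
greedy = re.search(r"\((.*)\)", sample)[1]

print(nested)
print(lazy)
print(greedy)
```

Neither the lazy nor the greedy pattern returns the balanced group here; only the depth counter does.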

And then regexes are significantly slower (roughly 1.8x in this
measurement):

    >>> from timeit import Timer
    >>> sample = 'Hello world fruit: apple\n'
    >>> setup = "from __main__ import grab, sample; import re"
    >>> t_grab = Timer("grab(sample, 'fruit', '\\n')", setup=setup)
    >>> t_regex = Timer("re.search(r'fruit:(.*?)\\n', sample)[1]", setup=setup)
    >>> min(t_grab.repeat())
    0.47571489959955215
    >>> min(t_regex.repeat())
    0.8434272557497025

Here's the version of grab I used:

    def grab(text, start, end):
        a = text.index(start)
        b = text.index(end, a+len(start))
        return text[a+len(start):b]

I have no strong opinion on whether this simple function should be built
into the string class, but I do have a strong opinion about re-writing
it into a slower, more fragile, harder to understand, less user-friendly
regex.

Don't make me quote Jamie Zawinski again.

--
Steve
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/HB34UPEP7B3S5O76KEL3B5GN5TB4ODCJ/