On 06/08/2021 at 02:57, Jach Feng wrote:
On Thursday, August 5, 2021 at 11:29:15 PM [UTC+8], ast wrote:
On 05/08/2021 at 17:11, ast wrote:
On 05/08/2021 at 11:40, Jach Feng wrote:
import regex
# regex is more powerful than re
text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'
regex.findall(r'ch \d++(?!\.)', text)
['ch 4', 'ch 56']
## ++ means "possessive": no backtracking is allowed
Can someone explain where the difference comes from? I just can't figure it out :-(
+, * and ? are greedy, meaning they try to match as many characters
as possible. But if the whole match fails, they give back
characters one at a time and try the rest of the match again.
That's backtracking.
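A minimal illustration of greedy backtracking with the standard re module (the pattern and the input "123" are made up for this example, not taken from the thread):

```python
import re

# \d+ is greedy: it first grabs all of "123". The literal 3 then has nothing
# left to match, so the engine backtracks and \d+ gives back one digit.
m = re.match(r'(\d+)3', '123')
print(m.group(0))  # prints: 123  (the whole match)
print(m.group(1))  # prints: 12   (what \d+ kept after backtracking)
```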
With ++, backtracking is not allowed. This works with the regex module,
but it is not implemented in the re module.
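For re versions without possessive quantifiers (they, along with atomic groups (?>...), were added to re in Python 3.11), a commonly used workaround, not part of the original thread, emulates an atomic group with a lookahead plus a backreference. This sketch relies on the fact that Python's re does not backtrack into a lookahead once it has succeeded, and it reproduces the regex-module result using only re:

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# (?=(\d+)) captures the longest run of digits inside a lookahead. re never
# backtracks into a lookahead that has already succeeded, so \1 then consumes
# exactly that run and cannot give digits back: together they behave like the
# possessive \d++ (or the atomic group (?>\d+)).
pattern = re.compile(r'ch (?=(\d+))\1(?!\.)')

# findall() would return only the captured group, so collect group(0) instead.
result = [m.group(0) for m in pattern.finditer(text)]
print(result)  # prints: ['ch 4', 'ch 56']
```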
With string = "ch 23." and pattern = r"ch \d+(?!\.)":
On the first try, \d+ matches 23,
but the whole match fails because the next character is "." and a "." is not
allowed there ((?!\.)).
So a backtrack happens:
\d+ now matches only 2,
and the whole match succeeds because the next character, 3, is not a ".".
But this is not what we want.
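The backtrack described above can be observed with re by capturing what \d+ keeps; the sample text is the one from the original post:

```python
import re

text = 'ch 1. is\nch 23. is\nch 4 is\nch 56 is\n'

# Greedy \d+ first takes "23", (?!\.) fails on ".", so \d+ backs off to "2".
m = re.match(r'ch (\d+)(?!\.)', 'ch 23.')
print(m.group(0))  # prints: ch 2
print(m.group(1))  # prints: 2

# On the whole text, that backtrack leaks a spurious 'ch 2' into the results.
all_matches = re.findall(r'ch \d+(?!\.)', text)
print(all_matches)  # prints: ['ch 2', 'ch 4', 'ch 56']
```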
With ++ there is no backtracking, so there is no match:
"ch 23." is rejected,
which is what we wanted.
Using only re, the best way is probably:
re.findall(r"ch \d+(?![.0-9])", text)
['ch 4', 'ch 56']
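Why this re-only pattern works: the lookahead now forbids a following digit as well as a dot, so giving back the 3 no longer produces a match. A quick check with re (the second input string, "ch 23 is", is made up for illustration):

```python
import re

pattern = r'ch \d+(?![.0-9])'

# "23" is followed by ".", and backing off to "2" leaves "3" (a digit) next,
# so every attempt is rejected by the lookahead.
rejected = re.search(pattern, 'ch 23.')
print(rejected)  # prints: None

# Without a trailing dot the full number is accepted.
accepted = re.search(pattern, 'ch 23 is')
print(accepted.group())  # prints: ch 23
```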
--
https://mail.python.org/mailman/listinfo/python-list