On 2017-02-26 17:15, michael.gauthier....@gmail.com wrote:
Hi MRAB,

Thanks for taking time to look at my problem!

I tried your solution:

r"\d{2}\s?(?=(?:years old\s?|yo\s?|yr old\s?|y o\s?|yrs old\s?|year old\s?)(?!son|daughter|kid|child))"

but unfortunately it does not seem to work. I also tried adding the negative
lookahead after each of the alternatives, but that does not work either, so
the problem does not seem to be that the negative lookahead applies only to
the last alternative... : (

Also, \d{2} will only match two single digits, and won't match the last two 
digits of 101, so at least this is fine! : )

Any other idea to improve that code? I'm starting to get desperate...

Thanks again for your help anyways, I really appreciate it! ; )

Ah, OK. I see what the problem is. (I should've marked it as "untested". :-()

It matches r"yo\s?" against "yo " (the r"\s?" consumes the space) and then the "son" alternative against "son", which succeeds, but that's a _negative_ lookahead, so it _fails_, so it backtracks.

It retries the r"\s?", which now matches an empty string (doesn't consume the space), and then the "son" alternative against " son", which fails, but that's a _negative_ lookahead, so it _succeeds_.

And the regex as a whole matches.
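
A quick way to see that backtracking in action is to run the first pattern on a short sample. This is only a sketch: the sample string "my 12 yo son" is made up for illustration and is not from the original data.

```python
import re

# The first (untested) pattern: \s? sits inside each alternative,
# before the negative lookahead.
first_try = re.compile(
    r"\d{2}\s?(?=(?:years old\s?|yo\s?|yr old\s?|y o\s?|yrs old\s?|year old\s?)"
    r"(?!son|daughter|kid|child))"
)

# Hypothetical sample text, chosen so that "son" follows "yo ".
m = first_try.search("my 12 yo son")

# The r"\s?" after "yo" backtracks to an empty match, the negative
# lookahead then sees " son" (leading space), and the pattern matches.
print(repr(m.group()))  # '12 '
```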

Ideally I'd want to use a possessive quantifier or atomic group, but they aren't supported by the re module, so the workaround is to move the check for whitespace:

r"\d{2}\s?(?=(?:years old|yo|yr old|y o|yrs old|year old)(?!\s?son|\s?daughter|\s?kid|\s?child))"
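
Here is a small check of that workaround, again only as a sketch. The sample strings are invented for illustration, and this covers just a couple of cases rather than the real data.

```python
import re

# Workaround: the optional whitespace is checked inside the negative
# lookahead, so there is nothing for the alternatives to backtrack over.
fixed = re.compile(
    r"\d{2}\s?(?=(?:years old|yo|yr old|y o|yrs old|year old)"
    r"(?!\s?son|\s?daughter|\s?kid|\s?child))"
)

# Hypothetical sample strings.
rejected = fixed.search("my 12 yo son")        # age of the son: no match wanted
accepted = fixed.search("I am 12 yo and tall")  # age of the writer: match wanted

print(rejected)                # None
print(repr(accepted.group()))  # '12 '

# Caveat: r"\s?" allows at most one whitespace character, so two
# spaces before "son" still slip through with this pattern.
two_spaces = fixed.search("my 12 yo  son")
print(two_spaces is not None)  # True
```

If runs of whitespace can occur in the input, using r"\s*" instead of r"\s?" inside the negative lookahead should cover that case. Also, from Python 3.11 onwards the re module does accept possessive quantifiers and atomic groups, so on a newer interpreter something like r"yo\s?+" would be another option (untested here).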
