michael.gauthier....@gmail.com wrote:

> Hi MRAB,
>
> Thanks for taking time to look at my problem!
>
> I tried your solution:
>
> r"\d{2}\s?(?=(?:years old\s?|yo\s?|yr old\s?|y o\s?|yrs old\s?|year old\s?)(?!son|daughter|kid|child))"
>
> but unfortunately it does seem not work. Also, I tried adding the negative
> lookaheads after every one of the alternatives, but it does not work
> either, so the problem does not seem to be that the negative lookahead
> applies only to the last proposition... : (
>
> Also, \d{2} will only match two single digits, and won't match the last
> two digits of 101, so at least this is fine! : )
>
> Any other idea to improve that code? I'm starting to get desperate...
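As far as I can tell, the quoted pattern fails because every whitespace match in it is optional. When the negative lookahead rejects "daughter", the engine backtracks, lets the trailing \s? match nothing, and retries the negative lookahead at the space, where none of the child words start. The sketch below reproduces that; STRICT is a hypothetical variant of my own (not MRAB's suggestion) that moves the whitespace inside the negative lookahead and adds a \b.

```python
import re

# The pattern from the quoted message, unchanged.
ORIGINAL = re.compile(
    r"\d{2}\s?(?=(?:years old\s?|yo\s?|yr old\s?|y o\s?|yrs old\s?|year old\s?)"
    r"(?!son|daughter|kid|child))"
)

# Hypothetical variant: the optional whitespace lives inside the negative
# lookahead, so backtracking can no longer step around the child word.
STRICT = re.compile(
    r"\d{2}\s?(?=(?:years old|yo|yr old|y o|yrs old|year old)"
    r"(?!\s*(?:son|daughter|kid|child)\b))"
)

text = "30 years old daughter"
print(ORIGINAL.search(text))  # a match object: the age is still accepted
print(STRICT.search(text))    # None: the age is rejected
```

This only illustrates the backtracking issue; the single pattern is still hard to read and to extend.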
If your code becomes too complex to manage, break it into simpler parts.
In this case you can use two simple regular expressions:

>>> import re
>>> age = re.compile(r"\d+")
>>> child = re.compile(r"\s+kid")
>>> text = "42 bar baz foo 12 kid"
>>> for candidate in age.finditer(text):
...     if child.match(text, candidate.end()):
...         print("Kid's age:", candidate.group())
...     else:
...         print("Author's age:", candidate.group())
...
Author's age: 42
Kid's age: 12

Applying that idea (and the principle of breaking everything into
dead-easy parts) to your problem:

$ cat demo.py
import re


def longest_first(text):
    # Longest alternatives first, so "years old" is tried before "years".
    return sorted(text.splitlines(), key=len, reverse=True)


YEARS = longest_first("""\
year
years
year old
years old
yo
ys o
""")

CHILDREN = longest_first("""\
son
daughter
kid
child
""")

YEARS_RE = r"\b(?P<age>\d+) ({})".format("|".join(YEARS))
re_years = re.compile(YEARS_RE)

CHILD_RE = r" ({})\b".format("|".join(CHILDREN))
re_child = re.compile(CHILD_RE)


def followed_by_child(candidate):
    # Try the child pattern exactly where the age match ended.
    return re_child.match(candidate.string, candidate.end())


CORPUS = """\
jester, 42 years old, 20 years kidding
12 years kid
engineer, 30 years
engineer, 30 years old daughter
""".splitlines()

for text in CORPUS:
    print(text)
    for m in re_years.finditer(text):
        age = m.group("age")
        if followed_by_child(m):
            print(" rejected:", age)
        else:
            print(" accepted:", age)

$ python3 demo.py
jester, 42 years old, 20 years kidding
 accepted: 42
 accepted: 20
12 years kid
 rejected: 12
engineer, 30 years
 accepted: 30
engineer, 30 years old daughter
 rejected: 30
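One detail of demo.py that is easy to overlook is longest_first(). Regex alternation tries the alternatives left to right and takes the first that matches, so with "year" listed before "years old" the age match would end too early and the child check would look at the wrong position. A small sketch of that effect follows; author_ages() and the shortened word list are hypothetical names of mine, not part of demo.py.

```python
import re

# Shortened, deliberately unsorted word list (assumption for illustration).
YEARS = ["year", "years", "year old", "years old", "yo"]
CHILD = re.compile(r" (?:daughter|child|son|kid)\b")


def author_ages(text, year_words):
    """Return the ages in text that are not directly followed by a child word."""
    years = re.compile(r"\b(?P<age>\d+) (?:{})".format("|".join(year_words)))
    return [
        m.group("age")
        for m in years.finditer(text)
        # Check for a child word exactly where the age match ended.
        if not CHILD.match(text, m.end())
    ]


text = "engineer, 30 years old daughter"
unsorted_result = author_ages(text, YEARS)
sorted_result = author_ages(text, sorted(YEARS, key=len, reverse=True))
print(unsorted_result)  # ['30']: "year" matched first, so " daughter" was never seen
print(sorted_result)    # []: "years old" matched, then " daughter" rejected the age
```

Sorting by length is a cheap way to get longest-match behaviour without having to order the alternatives by hand.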