On Jun 26, 1:20 am, Kirk <[EMAIL PROTECTED]> wrote:
> Hi All,
> the following regular expression matching seems to enter in a infinite
> loop:
>
> ################
> import re
> text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
> re.findall('[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$', text)
> #################
> [expletives deleted]
>
> I've python 2.5.2 on Ubuntu 8.04.
> any idea?

Several problems:

(1) lose the vertical bars (as advised by others); inside [...] a "|" is
a literal character, not alternation
(2) ALWAYS use a raw string for regexes; your \s and \- happen to survive
in a normal string because Python leaves unrecognised escapes alone, but
something like \b would silently turn into a backspace
(3) why are you using findall on a pattern that ends in "$"?
(4) using non-verbose regexes of that length means you haven't got a
petrol drum's hope in hell of understanding what's going on
(5) too many nested variable-length patterns; when the match fails the
engine has to try every way of splitting the text between them, which
takes a finite (but very long) time to evaluate
(6) as remarked by others, you haven't said what you are trying to do;
what it actually is doing doesn't look sensible (see below).

Following code is after fixing problems 1,2,3,4:

C:\junk>type infinitere.py
import re
text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
regex0 = r"""
    [^A-Z0-9]* # match leading space
    (
        (?:
            [0-9]* # match nothing
            [A-Z]+ # match "MSX"
            [0-9a-z\-]* # match nothing
        )+ # match "MSX"
        \s* # match " "
        [a-z]* # match nothing
        \s* # match nothing
        (?:
            [0-9]*
            [A-Z]+
            [0-9a-z\-]*
            \s*
        )* # match "INTERNATIONAL HOLDINGS ITALIA "
    )
    ([^A-Z]*) # match "srl (di seguito "
    """
regex1 = regex0 + "$"
for rxno, rx in enumerate([regex0, regex1]):
    mobj = re.compile(rx, re.VERBOSE).match(text)
    if mobj:
        print rxno, mobj.groups()
    else:
        print rxno, "failed"

C:\junk>infinitere.py
0 ('MSX INTERNATIONAL HOLDINGS ITALIA ', 'srl (di seguito ')
### taking a long time, interrupted

HTH,
John
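
P.S. A minimal sketch of point (5), in case it helps to see the blow-up in isolation. It is not the original pattern: I have cut it down to just the first inner group followed by "$", and the names PATTERN and time_failed_match are made up for this illustration. The subject is a run of capitals followed by "!", so "$" can never succeed and the engine has to try every way of splitting the run between iterations of the group. The time for the failing match should climb steeply as the run gets longer; actual timings will depend on your machine and Python version, so keep n small.

```python
import re
from timeit import default_timer

# Reduced form of the first inner group from the original pattern, plus "$".
# [A-Z]+ nested inside (...)+ lets a run of capitals be divided between
# iterations of the group in a great many different ways.
PATTERN = re.compile(r'(?:[0-9]*[A-Z]+[0-9a-z\-]*)+$')


def time_failed_match(n):
    """Return (match object or None, seconds) for n capitals followed by '!'."""
    subject = 'A' * n + '!'   # the '!' stops "$" from ever matching
    start = default_timer()
    mobj = PATTERN.match(subject)
    return mobj, default_timer() - start


# Small values only; the work grows very quickly with n.
for n in (10, 14, 18):
    mobj, elapsed = time_failed_match(n)
    print("%2d capitals: match=%r, %.4f s" % (n, mobj, elapsed))
```

The same pattern returns at once on a subject it can match (e.g. 'MSX'); it is only the failing case that forces the engine through all the alternatives.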