On 25 Jun 2008 15:20:04 GMT, Kirk <[EMAIL PROTECTED]> wrote:
> Hi All,
> the following regular expression matching seems to enter in a infinite
> loop:
>
> ################
> import re
> text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
> re.findall('[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$', text)
> #################
>
> No problem with perl with the same expression:
>
> #################
> $s = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una ';
> $s =~ /[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$/;
> print $1;
> #################
>
> I've python 2.5.2 on Ubuntu 8.04.
> any idea?

If it will help some smarter person identify the problem, it can be
simplified to this:

re.findall('[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$', "XXXXXXXXXXXXXXXXX (X")

This doesn't actually hang, it just takes a long time. The time taken
increases quickly as the chain of X's gets longer.

HTH
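
For anyone who wants to watch the growth rather than take it on faith, here is
a rough timing sketch. It assumes Python 3 (for time.perf_counter), and the
helper name time_findall and the chosen lengths are made up for illustration;
they are not from the original post. The likely culprit is the nested
quantifier in (?:0*X+0*)+, which lets the engine split a run of X's into
groups in a great many ways before giving up, which is usually called
catastrophic backtracking. Exact timings will depend on the Python version and
machine.

```python
import re
import time

# Simplified pattern from the post above, written as a raw string.
PATTERN = re.compile(r'[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$')


def time_findall(n):
    """Return (matches, seconds) for a run of n X's followed by ' (X'."""
    subject = "X" * n + " (X"
    start = time.perf_counter()
    matches = PATTERN.findall(subject)
    return matches, time.perf_counter() - start


if __name__ == "__main__":
    # Lengths kept small on purpose; larger runs can take a very long time.
    for n in (6, 8, 10, 12, 14):
        matches, seconds = time_findall(n)
        print("%2d X's: %.4f s  %r" % (n, seconds, matches))
```

If the timings roughly double with each extra X, that points at the nested
repetition rather than at a genuine infinite loop.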