On Wednesday 25 June 2008 18:40:08, cirfu wrote:
> On 25 June, 17:20, Kirk <[EMAIL PROTECTED]> wrote:
> > Hi All,
> > the following regular expression matching seems to enter in a infinite
> > loop:
> >
> > ################
> > import re
> > text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
> > re.findall('[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$', text)
> > #################
> >
> > No problem with perl with the same expression:
> >
> > #################
> > $s = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una ';
> > $s =~ /[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$/;
> > print $1;
> > #################
> >
> > I've python 2.5.2 on Ubuntu 8.04.
> > any idea?
> > Thanks!
> >
> > --
> > Kirk
>
> what are you trying to do?

That is indeed the right question. Whatever the implementation or language, something like this may happen to work, but I doubt you'll find anyone who can tell you whether it *should* work or not; my brain-embedded parser goes into an infinite loop on it too...

That said, "[0-9|a-z|\-]" is strange in itself: a pipe (|) between square brackets is the literal character '|', so there is no reason for it to appear twice.

Very complicated regexps are always evil, and a two- or three-stage filtering is likely to do the job with good, or at least better, readability.

But once more, what are you trying to do? It is not even clear that regexp matching is the best tool for it.

--
_____________

Maric Michaud
--
http://mail.python.org/mailman/listinfo/python-list
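
To make the two remarks above concrete, here is a minimal sketch in Python 3 (it uses re.fullmatch, which does not exist in Python 2.5). The goal is a guess, since the original poster never stated one: it assumes the intent is to pull out runs of consecutive upper-case words. The names UPPER_TOKEN and upper_runs are hypothetical, as is the decision to split on whitespace and parentheses.

```python
import re
from itertools import groupby

# Inside square brackets '|' is an ordinary character, not alternation,
# so the class from the original pattern also matches a literal pipe.
pipe_matches = re.fullmatch(r'[0-9|a-z|\-]', '|') is not None
print(pipe_matches)

# Hypothetical per-token test (an assumption about the intent): optional
# digits, at least one capital letter, then digits, lower-case letters or
# hyphens. The '|' has been dropped from the class on purpose.
UPPER_TOKEN = re.compile(r'[0-9]*[A-Z]+[0-9a-z-]*')

def upper_runs(text):
    """Return runs of consecutive tokens that look like upper-case words."""
    # Stage 1: split into tokens on whitespace and parentheses.
    tokens = re.findall(r'[^\s()]+', text)
    runs = []
    # Stage 2: classify each token on its own.
    # Stage 3: group neighbouring tokens that share the same classification.
    for is_upper, group in groupby(
            tokens, key=lambda t: UPPER_TOKEN.fullmatch(t) is not None):
        if is_upper:
            runs.append(' '.join(group))
    return runs

text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
result = upper_runs(text)
print(result)
```

Each stage uses a small pattern with no nested quantifiers, so each piece can be read and tested separately. If the real goal is different, only the token test and the grouping rule should need to change.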