jwaixs wrote:
> arg... I've lost 1.5 hours of my precious time to try letting re work
> correcty. There's really not a single good re tutorial or documentation
> I could found! There are only reference, and if you don't know how a
> module work you won't learn it from a reference!
>
> This is the problem:
>
> >>> import re
> >>> str = "blabla<python>Re modules sucks!</python>blabla"
> >>> re.search("(<python>)(/python>)", str).group()
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'NoneType' object has no attribute 'group'
>
> the only thing I want are the number of places blabla, Re modules
> sucks! and blabla are.

Others gave you advice on how to deal with regexes. I'm going to add that regexes aren't the way to go for this: use HTMLParser instead.

With your regex, you won't be able to correctly handle input like

    <foo>some text</foo><foo>some other text</foo>

as you will get the whole string, not just the first match. You can alter the so-called longest-match behaviour (for example with a non-greedy quantifier such as .*?), but then nested input like

    <foo>some outer text <foo>some inner text</foo> some more outer text</foo>

won't work, because the match now stops at the first closing tag.

Try not to use regexps for this. Or at least tokenize the text first and then sweep over the tokens, collecting the data you need yourself (but that's basically rewriting the HTML parsers that are already out there).

Diez
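
P.S. To make the two failure modes concrete, here is a minimal sketch rather than a definitive solution. It assumes Python 3, where the parser lives in html.parser (the older module was simply called HTMLParser). The class name TextByDepth and its chunks attribute are made up for illustration; only HTMLParser and its handle_* hooks come from the standard library.

```python
import re
from html.parser import HTMLParser  # Python 3 location; an assumption about your setup

# The two inputs discussed above.
flat = "<foo>some text</foo><foo>some other text</foo>"
nested = "<foo>some outer text <foo>some inner text</foo> some more outer text</foo>"

# Greedy: .* runs on to the LAST </foo>, swallowing both elements.
greedy = re.search(r"<foo>(.*)</foo>", flat).group(1)

# Non-greedy: .*? stops at the FIRST </foo>, cutting the outer element short.
lazy = re.search(r"<foo>(.*?)</foo>", nested).group(1)


class TextByDepth(HTMLParser):
    """Hypothetical helper: record every text chunk with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open tags
        self.chunks = []  # list of (depth, text) tuples

    def handle_starttag(self, tag, attrs):
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        self.chunks.append((self.depth, data))


parser = TextByDepth()
parser.feed("blabla" + nested + "blabla")
parser.close()

print(repr(greedy))
print(repr(lazy))
for depth, text in parser.chunks:
    print(depth, repr(text))
```

The parser keeps track of which tags are open, so the text outside the tags, the outer text, and the inner text each come out as separate chunks. That bookkeeping is exactly what a single regex cannot do for nested input.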