lothar wrote:
> give an re to find every innermost "table" element:
>
> innertabdoc = """
> <table border="0" cellspacing="0" cellpadding="0">
> <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
> <tr><td> <a>n</a>
> </td></tr>
> </table>
> </td></tr>
> </table>
> <table border="0" cellspacing="0" cellpadding="0">
> <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
> <tr><td> </td> <td>
> <table border="0" cellspacing="0" cellpadding="0">
> <tr><td> <p>y</p> <td> z</td>
> </td></tr>
> </table>
> </td></tr>
> </table>
> </td></tr>
> <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
> <tr><td>
> </td></tr>
> </table>
> </td></tr>
> </table>
> """

REs are regular expressions, not parsers. There are problems for which no RE solution exists (I'm not implying that yours is one of them). In any case, complex text processing is better done with tools built for the job. Here, HTMLParser seems like a reasonable choice.

Regards,
Georg
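
P.S. To make the HTMLParser suggestion concrete, here is a minimal sketch, not a definitive implementation. It assumes Python 3's html.parser module, and the names innermost_tables and _InnermostTableFinder are made up for illustration. The idea: keep a stack of open "table" elements, flag the parent whenever a nested table starts, and when a table closes without having been flagged, slice its text out of the source.

```python
from html.parser import HTMLParser


class _InnermostTableFinder(HTMLParser):
    """Hypothetical helper: collects every <table> that contains no other <table>."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        # Absolute offset at which each line starts, so that the
        # (line, column) pairs from getpos() can be turned into indexes.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.open_tables = []  # stack of [start_offset, has_nested_table]
        self.innermost = []    # source text of each innermost table

    def _abs_pos(self):
        # Inside a tag handler, getpos() points at the "<" of that tag.
        lineno, col = self.getpos()
        return self.line_starts[lineno - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        if self.open_tables:
            # The enclosing table now has a child, so it is not innermost.
            self.open_tables[-1][1] = True
        self.open_tables.append([self._abs_pos(), False])

    def handle_endtag(self, tag):
        if tag != "table" or not self.open_tables:
            return
        start, has_nested = self.open_tables.pop()
        if not has_nested:
            # Extend the slice to the ">" that closes this end tag.
            end = self.source.index(">", self._abs_pos()) + 1
            self.innermost.append(self.source[start:end])


def innermost_tables(source):
    """Return the source text of every innermost <table> element, in order."""
    finder = _InnermostTableFinder(source)
    finder.feed(source)
    finder.close()
    return finder.innermost
```

Calling innermost_tables(innertabdoc) on the string from your post should give three fragments: the table around <a>n</a>, the one around <p>y</p>, and the last, empty one. Note that this relies on the "table" tags themselves being balanced; the stray <td> in your sample does no harm because only table tags are tracked.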