lothar wrote:
> give an re to find every innermost "table" element:
> 
> innertabdoc = """
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td> <a>n</a>
>   </td></tr>
> </table>
>   </td></tr>
> </table>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td> </td> <td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td> <p>y</p> <td> z</td>
>   </td></tr>
> </table>
>   </td></tr>
> </table>
>   </td></tr>
>   <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td>
>   </td></tr>
> </table>
>   </td></tr>
> </table>
> """

Regular expressions are pattern matchers, not parsers. There are
problems for which there is no RE solution (I'm not implying that
this is the case in your example).
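
For this particular input a pattern does seem possible. The following
is only a rough sketch: it assumes the markup contains no comments,
scripts or attribute values with "<table" in them, and the names
`sample` and `INNERMOST_TABLE` are mine. `sample` is a shortened
stand-in for the quoted document, not the original.

```python
import re

# Shortened stand-in for the quoted document (an assumption, not the original).
sample = """
<table border="0">
  <tr><td>
<table border="0">
  <tr><td> <a>n</a> </td></tr>
</table>
  </td></tr>
</table>
<table border="0">
  <tr><td> <p>y</p> </td></tr>
</table>
"""

# A <table ...> start tag, then any run of characters in which no further
# <table start tag begins, up to the first </table>. A table that contains
# another table can therefore never match as a whole.
INNERMOST_TABLE = re.compile(
    r"<table\b[^>]*>(?:(?!<table\b).)*?</table\s*>",
    re.DOTALL | re.IGNORECASE,
)

matches = INNERMOST_TABLE.findall(sample)
for match in matches:
    print(match)
    print("-----")
```

The negative lookahead is what keeps outer tables out of the result.
It does nothing about broken or unusual markup, which is where a
pattern like this tends to fail.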

In any case, complex text processing is better done with tools
designed for it. Here, HTMLParser seems like a reasonable choice.
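
A minimal sketch of how that might look, assuming Python 3, where the
class lives in the standard `html.parser` module. The class name
`InnermostTableFinder` and its attributes are made up for this
example. It keeps a stack of open tables and marks a table as "not
innermost" as soon as another table opens inside it.

```python
from html.parser import HTMLParser

innertabdoc = """
<table border="0" cellspacing="0" cellpadding="0">
  <tr><td>
<table border="0" cellspacing="0" cellpadding="0">
  <tr><td> <a>n</a>
  </td></tr>
</table>
  </td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="0">
  <tr><td>
<table border="0" cellspacing="0" cellpadding="0">
  <tr><td> </td> <td>
<table border="0" cellspacing="0" cellpadding="0">
  <tr><td> <p>y</p> <td> z</td>
  </td></tr>
</table>
  </td></tr>
</table>
  </td></tr>
  <tr><td>
<table border="0" cellspacing="0" cellpadding="0">
  <tr><td>
  </td></tr>
</table>
  </td></tr>
</table>
"""


class InnermostTableFinder(HTMLParser):
    """Collect the source text of each <table> that has no <table> inside it."""

    def __init__(self, text):
        super().__init__()
        self.text = text
        # Absolute offset at which each line starts; getpos() only gives
        # (line number, column), so this is needed to slice the source.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]
        self.open_tables = []  # one [start_offset, has_nested_table] per open table
        self.innermost = []    # source text of every innermost table
        self.feed(text)
        self.close()

    def _offset(self):
        lineno, column = self.getpos()  # lineno is 1-based, column is 0-based
        return self.line_starts[lineno - 1] + column

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.open_tables:
                # The enclosing table has a child table, so it is not innermost.
                self.open_tables[-1][1] = True
            self.open_tables.append([self._offset(), False])

    def handle_endtag(self, tag):
        if tag == "table" and self.open_tables:
            start, has_nested = self.open_tables.pop()
            if not has_nested:
                # getpos() points at the "<" of the end tag; include up to ">".
                end = self.text.index(">", self._offset()) + 1
                self.innermost.append(self.text[start:end])


finder = InnermostTableFinder(innertabdoc)
print(len(finder.innermost), "innermost tables found")
for table in finder.innermost:
    print(table)
    print("-----")
```

On the quoted document this should report three tables. The offset
bookkeeping is only there to return the original source text; if the
line numbers from `getpos()` are enough, it can be dropped.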

Best regards,
Georg
-- 
http://mail.python.org/mailman/listinfo/python-list
