Re: re module non-greedy matches broken

Georg Brandl Tue, 05 Apr 2005 10:10:06 -0700

lothar wrote:
> give an re to find every innermost "table" element:
> 
> innertabdoc = """
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td> <a>n</a>
>   </td></tr>
> </table>
>   </td></tr>
> </table>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td> </td> <td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td> <p>y</p> <td> z</td>
>   </td></tr>
> </table>
>   </td></tr>
> </table>
>   </td></tr>
>   <tr><td>
> <table border="0" cellspacing="0" cellpadding="0">
>   <tr><td>
>   </td></tr>
> </table>
>   </td></tr>
> </table>
> """


Regular expressions are not parsers. Some problems, such as matching
arbitrarily nested structures, have no RE solution (I'm not implying
that this is the case in your example).

In any case, complex text processing is better done with tools
designed for the job. For HTML like this, HTMLParser seems like a
reasonable choice.
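
To illustrate the HTMLParser suggestion, here is a minimal sketch, not a
definitive implementation. It assumes Python 3, where the module lives at
html.parser (it was the top-level HTMLParser module in Python 2). The names
InnermostTableFinder and find_innermost_tables are made up for this example.
It reports the start and end line of every "table" element that contains no
other "table", and it assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser


class InnermostTableFinder(HTMLParser):
    """Record (start_line, end_line) for each <table> with no nested <table>."""

    def __init__(self):
        super().__init__()
        # One [start_line, has_nested_table] entry per currently open <table>.
        self._open = []
        # (start_line, end_line) pairs, appended as each innermost table closes.
        self.innermost = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._open:
                # The enclosing table now has a child, so it is not innermost.
                self._open[-1][1] = True
            self._open.append([self.getpos()[0], False])

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            start_line, has_nested = self._open.pop()
            if not has_nested:
                self.innermost.append((start_line, self.getpos()[0]))


def find_innermost_tables(html):
    """Return line ranges (1-based) of the innermost <table> elements."""
    finder = InnermostTableFinder()
    finder.feed(html)
    finder.close()
    return finder.innermost
```

Calling find_innermost_tables(innertabdoc) would give one line range per
innermost table, which can then be used to slice the original text. The
parser keeps an explicit stack of open tables, which is the nesting
bookkeeping that a plain RE has no natural way to express.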

Regards,
Georg
-- 
http://mail.python.org/mailman/listinfo/python-list
