In article <[EMAIL PROTECTED]>,
 Cédric Lucantis <[EMAIL PROTECTED]> wrote:

> On Thursday 26 June 2008 15:53:06 oyster, you wrote:
> > that is, there is no TABLE tag between a TABLE, for example
> > <table >something without table tag</table>
> > what is the RE pattern? thanks
> >
> > the following is not right
> > <table.*?>[^table]*?</table>
> 
> The construct [abc] does not match a whole word but only one char, so  
> [^table] means "any char which is not t, a, b, l or e".
> 
> Anyway the inside table word won't match your pattern, as there are '<' 
> and '>' in it, and these chars have to be escaped when used as simple text.
> So this should work:
> 
> re.compile(r'<table(|[ ].*)>.*</table>')
>                     ^ this is to avoid matching a tag name starting with
>                       table (like <table_ext>)

Doesn't work - for example it matches '<table></table><table></table>'
(and in fact if the html contains any number of tables it's going
to match the string starting at the start of the first table and
ending at the end of the last one.)
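
To make the failure concrete, here is a small sketch. The first pattern is
the one quoted above; the second, which I've called no_nested, is only my
own suggestion and not something from the earlier posts. It uses a negative
lookahead so the body can never run across another opening or closing table
tag. It assumes the closing tag is written exactly as </table>, and it will
still be fooled by things like table tags inside comments, so a real HTML
parser is probably the safer route for anything serious.

```python
import re

# The pattern quoted above: the trailing ".*" is greedy, so it runs on
# to the LAST "</table>" in the string.
greedy = re.compile(r'<table(|[ ].*)>.*</table>')

html = '<table></table><table></table>'
greedy_match = greedy.search(html).group(0)
print(greedy_match)   # the whole string, i.e. both tables at once

# Hypothetical alternative (my assumption, not from the thread):
# "(?:(?!</?table\b).)*" consumes one character at a time, but only
# while that position does not start "<table" or "</table", so the
# body cannot contain another table tag. "\b" keeps "<table_ext>"
# from being treated as a table tag.
no_nested = re.compile(
    r'<table\b[^>]*>(?:(?!</?table\b).)*</table>',
    re.IGNORECASE | re.DOTALL,
)

print(no_nested.findall(html))

# With nested tables only the innermost one matches, which is what
# the original question asked for.
nested = '<table>a<table>b</table>c</table>'
print(no_nested.findall(nested))
```

On the two-table string the second pattern finds each table separately
instead of one match spanning both.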

-- 
David C. Ullrich