On Tue, Nov 18, 2014 at 12:22 PM, mimic...@gmail.com <mimic...@gmail.com> wrote:
> I am trying to extract a table (<table class="xxxx"><tr><td>...... until
> </table>) and its content from an HTML file.
>
> In the file I have something like this:
>
> <div id="product" class="product">
> <table border="0" cellspacing="0" cellpadding="0" class="prodc"
> title="Product ">
> .
> .
> .
> </table>
> </div>
>
> There could be more than one table in the file. However, I am only interested
> in the table within <div id="product" class="product"> </div>.
>
> /^.*<div id="product" class="product">.+?(<table border="0".+?\s+<\/table>)\s*<\/div>.*$/ims
>
> The above and the various variations I tried do not match.
>
> I am able to match this easily using sed, but I need to try it in Perl.
>
> This sed pipeline works just fine:
>
> sed -n '/<div id="product" class="product">/,/<\/table>/p' thelo826.html \
> | sed -n '/<table border.*/,/<\/table>/p' | sed -e 's/class=".*"//g'
>
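For comparison, the quoted sed pipeline can be mirrored almost stage for stage in Perl with the range ("flip-flop") operator, /START/ .. /END/ in scalar context, which is true from a line matching START through the next line matching END, much like sed's /START/,/END/ address. What follows is a minimal sketch, not a tested drop-in: the subroutine name extract_product_table and the sample HTML are hypothetical, made up for illustration, and like the sed version it assumes each tag of interest starts on its own line.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Mirrors the quoted sed pipeline stage by stage:
    #   sed -n '/<div id="product" class="product">/,/<\/table>/p'
    #   | sed -n '/<table border.*/,/<\/table>/p'
    #   | sed -e 's/class=".*"//g'
    # A flip-flop only updates its state when it is evaluated, so the second
    # one sees only the lines that passed the first, just as in the pipeline.
    # (extract_product_table is a hypothetical name used for illustration.)
    sub extract_product_table {
        my ($fh) = @_;
        my @kept;
        while (my $line = <$fh>) {
            # Stage 1: from the product div through the first </table>.
            next unless ($line =~ m{<div id="product" class="product">}
                         .. $line =~ m{</table>});
            # Stage 2: from "<table border" through </table>.
            next unless ($line =~ m{<table border} .. $line =~ m{</table>});
            # Stage 3: greedy removal, exactly like the sed original.
            $line =~ s/class=".*"//g;
            push @kept, $line;
        }
        return @kept;
    }

    # Hypothetical sample input modelled on the markup in the question.
    my $html = <<'HTML';
    <table border="0" class="other"><tr><td>ignore me</td></tr></table>
    <div id="product" class="product">
    <table border="0" cellspacing="0" cellpadding="0" class="prodc"
    title="Product ">
    <tr><td>widget</td></tr>
    </table>
    </div>
    HTML

    # Read the sample through an in-memory filehandle.
    open my $fh, '<', \$html or die "cannot open in-memory file: $!";
    my @table = extract_product_table($fh);
    close $fh;
    print @table;

The first table is skipped because it comes before the product div. Note that flip-flop state belongs to the operator itself, not to the call. If one input ends inside a range, the next call to the subroutine starts out "in range".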
If you're positive the HTML is consistently formatted (machine-generated, for instance, and you're the generator), you could do something along these lines:

    # /s lets . match newlines, /i ignores case, /x ignores the
    # whitespace used here for layout.
    my $regex = qr{
        .*? <div .*? id="product" .*? class="product" .*? >
        .*? ( <table .*? border="0" .*? </table> )
        .*? </div>
    }six;

    {
        local $/;                  # slurp mode: read the whole file at once
        my $content = <DATA>;      # substitute your lexical filehandle
        while ( $content =~ /$regex/g ) {
            print "table=$1\n";
        }
    }

-- 
Charles DeRykus

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/