Re: Match HTML <div> ...... </dv> string over multiple

Octavian Rasnita Tue, 18 Nov 2014 21:46:39 -0800

Hi,

Don't use regular expressions to match HTML. Use an HTML parser instead, for example:


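use HTML::TreeBuilder;

# $html_content must already hold the HTML source as a string
my $tree = HTML::TreeBuilder->new_from_content($html_content);

# look_down returns the first match in scalar context, or undef if none
my $div = $tree->look_down( _tag => 'div', id => 'product', class => 'product' )
    or die "No product div found\n";

my $table = $div->look_down( _tag => 'table', class => 'prodc' )
    or die "No prodc table found\n";

# Here you can get the table components, like:

my @tr = $table->look_down( _tag => 'tr' );

for my $tr ( @tr ) {
    my @td = $tr->look_down( _tag => 'td' );
    # Print the text of the first cell of each row, skipping rows without <td>
    print $td[0]->as_text, "\n" if @td;
}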
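
If what you need is the table markup itself (roughly what your sed pipeline prints), something along these lines might work. This is only a sketch: the sample HTML is made up to resemble the snippet in your message, and I have not run it against your real file.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input, modelled on the snippet in the question.
my $html_content = <<'HTML';
<html><body>
<table class="other"><tr><td>ignore me</td></tr></table>
<div id="product" class="product">
<table border="0" cellspacing="0" cellpadding="0" class="prodc" title="Product ">
<tr><td>Widget</td><td>9.99</td></tr>
<tr><td>Gadget</td><td>19.99</td></tr>
</table>
</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html_content);

my $div = $tree->look_down( _tag => 'div', id => 'product' )
    or die "No product div found\n";
my $table = $div->look_down( _tag => 'table', class => 'prodc' )
    or die "No prodc table found\n";

# Remove the class attribute, similar in spirit to the last sed command.
$table->attr( 'class', undef );

# Serialize the table element and everything inside it.
my $table_html = $table->as_HTML( undef, '  ', {} );
print $table_html;

# Collect the text of every cell, one array reference per row.
my @rows;
for my $tr ( $table->look_down( _tag => 'tr' ) ) {
    push @rows, [ map { $_->as_text } $tr->look_down( _tag => 'td' ) ];
}

$tree->delete;    # free the tree when done
```

The other table in the sample is never touched, because the search starts from the product div rather than from the whole document.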

You can do much more with HTML::TreeBuilder, including far more complex 
searches for HTML elements.
Read:
perldoc HTML::TreeBuilder
perldoc HTML::Element
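
As a small illustration of the more complex searches, look_down also accepts a regex as an attribute value and a code reference as a criterion. A minimal sketch, with made-up sample HTML (the "prodc-archive" class is just an assumption for the example):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input.
my $html_content = <<'HTML';
<html><body>
<div id="product" class="product">
<table class="prodc">
<tr><td>Widget</td><td>9.99</td></tr>
<tr><td>Gadget</td><td></td></tr>
</table>
</div>
<table class="prodc-archive"><tr><td>Old</td></tr></table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html_content);

# A regex criterion is matched against the attribute value.
my @prod_tables = $tree->look_down( _tag => 'table', class => qr/^prodc/ );

# A code reference criterion receives the element as its first argument
# and returns true for elements that should be kept.
my @filled_cells = $tree->look_down(
    _tag => 'td',
    sub { $_[0]->as_text =~ /\S/ },
);

my $table_count = scalar @prod_tables;
my $cell_count  = scalar @filled_cells;
printf "%d tables, %d non-empty cells\n", $table_count, $cell_count;

$tree->delete;    # free the tree when done
```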

--Octavian

  ----- Original Message ----- 
  From: mimic...@gmail.com 
  To: beginners@perl.org 
  Sent: Tuesday, November 18, 2014 10:22 PM
  Subject: Match HTML <div> ...... </dv> string over multiple


  I am trying to extract a table (<table class="xxxx"><tr><td>...... until 
</table>) and its content from an HTML file.


  With the file I have something like this


  <div id="product" class="product">

  <table border="0" cellspacing="0" cellpadding="0" class="prodc" 
title="Product ">

  .
  .
  .
  </table>
  </div>


  There could be more than one table in the file. However, I am only interested 
in the table within <div id="product" class="product"> </div>.


  /^.*<div id="product" class="product">.+?(<table 
border="0".+?\s+<\/table>)\s*<\/div>.*$/ims


  The above and the various variations I tried do not match.


  I am able to easily match this using sed, however I need to try using perl.


  This sed works just fine:


  sed -n '/<div id="product" class="product">/,/<\/table>/p' thelo826.html |sed 
-n '/<table border.*/,/<\/table>/p'| sed -e 's/class=".*"//g'


  Thanks


  Mimi
