Re: Match HTML ...... string over multiple

Charles DeRykus Tue, 18 Nov 2014 22:31:38 -0800

On Tue, Nov 18, 2014 at 12:22 PM, mimic...@gmail.com <mimic...@gmail.com> wrote:
> I am trying to extract a table (<table class="xxxx"><tr><td>...... until
> </table>) and its content from an HTML file.
>
> With the file I have something like this
>
> <div id="product" class="product">
> <table border="0" cellspacing="0" cellpadding="0" class="prodc"
> title="Product ">
> .
> .
> .
> </table>
> </div>
>
> There could be more than one table in the file. However, I am only interested
> in the table within <div id="product" class="product"> </div>.
>
> /^.*<div id="product" class="product">.+?(<table
> border="0".+?\s+<\/table>)\s*<\/div>.*$/ims
>
> The above and various variations I tried do not match.
>
> I am able to easily match this using sed, however I need to try using perl.
>
> This sed works just fine:
>
> sed -n '/<div id="product" class="product">/,/<\/table>/p' thelo826.html
> |sed -n '/<table border.*/,/<\/table>/p'| sed -e 's/class=".*"//g'
>


If you're positive the HTML is consistently formatted
(machine-generated, for instance, and you're the generator), you could
do something along these lines:

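use strict;
use warnings;

# /s: "." also matches newlines; /i: ignore case; /x: whitespace in the pattern is ignored.
# [^>]* keeps each attribute match inside its own tag.
my $regex = qr{ <div   \b [^>]*? id="product" [^>]*? class="product" [^>]* >
                .*? ( <table \b [^>]*? border="0" [^>]* >
                .*?  </table> )    .*? </div>
              }six;

{ local $/;                       # slurp mode: read the whole file in one go
    my $content = <DATA>;         # substitute your lexical filehandle
    while ( $content =~ /$regex/g ) {
       print "table=$1\n";
    }
}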
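
For anyone who wants to try this without a file at hand, here is a
self-contained sketch of the same idea. The helper name
extract_product_tables and the sample HTML (including the decoy table
outside the div) are made up for illustration; the s/// step is only an
approximation of the last sed command in the question, which drops the
class attribute. It assumes the same consistently formatted input and is
not a general HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every
# border="0" table found inside <div id="product" class="product">.
sub extract_product_tables {
    my ($html) = @_;
    my $regex = qr{
        <div \b [^>]*? id="product" [^>]*? class="product" [^>]* >
        .*?                                    # anything up to the table
        ( <table \b [^>]*? border="0" [^>]* >  # opening tag, attributes may wrap lines
          .*? </table> )                       # shortest run to the closing tag
        .*? </div>
    }six;

    my @tables;
    while ( $html =~ /$regex/g ) {
        my $table = $1;
        $table =~ s/\s+class="[^"]*"//g;       # mirrors the sed step that drops class
        push @tables, $table;
    }
    return @tables;
}

# Made-up sample input with a decoy table outside the product div.
my $sample = <<'HTML';
<table border="0" class="other"><tr><td>skip me</td></tr></table>
<div id="product" class="product">
<table border="0" cellspacing="0" cellpadding="0" class="prodc"
title="Product ">
<tr><td>widget</td></tr>
</table>
</div>
HTML

print "table=$_\n" for extract_product_tables($sample);
```

The decoy table is skipped because the pattern only starts matching at
the product div, and the class attribute is removed after the capture so
the main regex stays readable.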

-- 
Charles DeRykus

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
