Re: Rather complex regular expression for the preg_match_all function

david Thu, 23 Jan 2003 11:46:02 -0800

Andreas Sheriff wrote:

> 
> I don't want to find <p> tags with a complete structure.
> ex: <p>This is a &lt;p&gt; tag with a complete structure</p>
> 
> Instead, I want to find the <p> tag with no closing tag, up to the next
> <p> tag or a closing tag of any type that doesn't have an opening tag
> after the initial <p> found (not including this orphaned closing tag)
>


A regular expression is probably not worth the time. Have you tried
HTML::Parser yet? For example:

#!/usr/bin/perl -w
use strict;

use HTML::Parser;

my $text = <<HTML;
<html><head><title>Test HTML</title></head>
<body>
<p><font>This tag is ok<br></font></p>
<font><p>I want to find this p tag<br>
and up to the next opening p tag<br>
<b>test<br></b></font>
<p>I want to find this one too
</body>
HTML

my $p_tag = 0;
my @buff = ();

my $html = HTML::Parser->new(api_version => 3,
                                text_h  => [\&text,'dtext'],
                                start_h => [\&open_tag, 'tagname'],
                                end_h   => [\&close_tag,'tagname']);
$html->parse($text);
$html->eof;

print @buff if(@buff);

sub text{

        my $text = shift;

        #-- skip whitespace-only text (a regex, just for fun :-)
        push(@buff,"<p>$text") if($p_tag && $text =~ /\w/);
}

sub open_tag{

        return unless(shift eq 'p');

        if($p_tag){
                print @buff;
                @buff = ();
        }

        $p_tag = 1;
}

sub close_tag{
        my $tag = shift;
        return unless($tag eq 'p' && $p_tag);

        #-- a properly closed <p>: throw its text away
        @buff = ();
        $p_tag = 0;
}

__END__

prints:

<p>I want to find this p tag<p>
and up to the next opening p tag<p>test<p>I want to find this one too
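
If you would rather get the text back as a list than print it, the same
idea can be wrapped in a sub. This is only a rough sketch: the name
unclosed_p_text and the variable names are made up for illustration, and
it assumes HTML::Parser (api_version 3) is installed. It also turns on
unbroken_text so each run of text arrives as one piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser;

# Hypothetical helper (not part of HTML::Parser): returns the text chunks
# found inside <p> tags that were never closed with </p>.
sub unclosed_p_text {
    my ($source) = @_;

    my $in_p  = 0;     # true while inside a <p> that has not been closed
    my @chunk = ();    # text seen since the current <p> opened
    my @found = ();    # text belonging to unclosed <p> tags

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my $tag = shift;
            return unless $tag eq 'p';
            # a new <p> while one is still open: the old one had no </p>
            push @found, @chunk if $in_p;
            @chunk = ();
            $in_p  = 1;
        }, 'tagname' ],
        end_h => [ sub {
            my $tag = shift;
            return unless $tag eq 'p' && $in_p;
            # a properly closed <p>: discard its text
            @chunk = ();
            $in_p  = 0;
        }, 'tagname' ],
        text_h => [ sub {
            my $text = shift;
            # keep only text that contains a word character
            push @chunk, $text if $in_p && $text =~ /\w/;
        }, 'dtext' ],
    );

    # deliver each run of text in a single text event
    $parser->unbroken_text(1);

    $parser->parse($source);
    $parser->eof;

    # whatever is left belongs to a <p> still open at end of input
    push @found, @chunk if $in_p;
    return @found;
}

my @text = unclosed_p_text('<p>ok</p>stray<p>first<br>second<p>last');
print join('|', @text), "\n";
```

Text that sits outside any <p> (like "stray" above) is skipped, since the
flag is reset when a </p> is seen.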

Second time I have recommended HTML::Parser in a day :-)

david
