Dan Muey wrote:

> 
> #Q) Before I kill the head section or body tags below how do I grab these
> #parts of it? 1 - my $title = ???? IE the text between title tags
> #2 - get body tag attributes my $body_attributes = ???? IE in this example
> #it'd be 'bodytag=attributes'
> 

This script grabs the title text, the body text, and the body tag's attributes:

#!/usr/bin/perl -w
use strict;

use HTML::Parser;

my $text = <<HTML;
<html><head>
<title> HI Title </title>
heaD STUFF
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT>

i DON'T WANT THIS SCRIPT EITHER

</SCRIPT>

</BODY>
</HTMl>
HTML

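# flags: true while the parser is inside <body> / <title>
my $body = 0;
my $title = 0;
# collected pieces: body attributes and text, title text
my @body;
my @title;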

my $html = HTML::Parser->new(api_version => 3,
                                text_h => [\&text,'dtext'],
                                start_h => [\&open_tag, 'tagname,attr'],
                                end_h   => [\&close_tag, 'tagname']);
$html->ignore_elements(qw(script));
$html->parse($text);
$html->eof;

print "TITLE @title\n";
print "BODY @body\n";

sub text{

        my $text = shift;

        return unless($text =~ /\w/);

        if($title){
                push(@title,$text);
        }elsif($body){
                push(@body,$text);
        }
}

sub open_tag{

        my $tagname = shift;
        my $attr    = shift;

        $title = 1 if($tagname eq 'title');

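        if($tagname eq 'body'){
                $body = 1;
                # one name=value pair per attribute, also correct for several attributes
                push(@body,join(' ',map {"$_=$attr->{$_}"} sort keys %{$attr}));
        }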
}

sub close_tag{

        my $tagname = shift;

        $title = 0 if($tagname eq 'title');
        $body  = 0 if($tagname eq 'body');
}

__END__

prints:

TITLE  HI Title
BODY bodytag=attributes
hI HERE'S CONTENT i WANT

there are many ways of doing the same thing.

> 
> It automatically prints the modified version of $text without any print
> statement. Q) Why is that?

No, it doesn't print anything automatically. The script above only produces
output because it has explicit print statements.

> Q) How can I save the new version of $text to a new variable instead of
> automatically printing it to the screen? ( so I can remove empty lines and
> have my way with it ) Q) I wanted any comments removed too but I didn't do
> anything special to it and they are gone anyway, are comments removed
> automatically then?

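Just remove the print statements and store the values however you want;
@title and @body already hold everything the handlers collected.
The parser doesn't strip comments itself. It reports them as a separate
comment event (comment_h), and since the script above registers no handler
for that event, comments never end up in @body.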
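
A minimal sketch of keeping the result in variables instead of printing it.
To stay self-contained it assumes @body already holds what the handlers in
the script above collect for the sample document (the attribute string
first, then the text chunks); $body_attributes, @chunks and $body_text are
names made up for this example.

```perl
#!/usr/bin/perl -w
use strict;

# Assumption: this is what @body contains after parsing the sample document.
my @body = ("bodytag=attributes", "\nhI HERE'S CONTENT i WANT\n");

# Split the attribute string from the text chunks instead of printing them.
my ($body_attributes, @chunks) = @body;
my $body_text = join('', @chunks);

# Drop empty and whitespace-only lines.
$body_text = join("\n", grep { /\S/ } split(/\n/, $body_text));

print "ATTR [$body_attributes]\n";
print "TEXT [$body_text]\n";
```

From here $body_text is an ordinary string, so any further cleanup can be
done with the usual regex and string functions.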

david
