Using the excellent example in an earlier post from David:
RE: Removing HTML Tags

I came up with this slightly modified version, based on that post and some CPAN
documentation, and it works.
It just raised a few more questions.
Basically, I'm trying to grab the body contents without comments or script content.

So far this module is really cool and handy!!

#!/usr/bin/perl

use strict;
use warnings;
use HTML::Parser;

my $text = <<HTML;

<html><head>
<title> HI Title </title>
heaD STUFF
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT>

i DON'T WANT THIS SCRIPT EITHER

</SCRIPT>

</BODY>
</HTMl>

HTML

my $html = HTML::Parser->new(
                api_version => 3,
                text_h      => [sub{ print shift;}, 'dtext'],
                start_h     => [sub{ print shift;}, 'text'],
                end_h       => [sub{ print shift;}, 'text']);

# Q) Before I strip the head section and the body tags below, how do I grab these parts?
#       1 - my $title = ????            i.e. the text between the title tags
#       2 - my $body_attributes = ????  i.e. the body tag attributes; in this example
#           that would be 'bodytag=attributes'

$html->ignore_elements(qw(head script));
$html->ignore_tags(qw(html body));

$html->parse($text);
$html->eof;

####
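
For the two questions in the script comments, here is a minimal sketch of one possible approach, not a tested recipe for every document. It assumes a second, separate parser object (called $probe here, a name made up for this example) that only watches for the title and body tags, plus a trimmed copy of the sample HTML. The 'tagname', 'attr' and 'dtext' argspecs are the ones described in the HTML::Parser documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Trimmed copy of the sample document (assumption: same shape as $text above).
my $text = <<'HTML';
<html><head>
<title> HI Title </title>
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
</BODY>
</HTMl>
HTML

my $title     = '';   # text found between <title> and </title>
my %body_attr = ();   # attributes of the <body> tag
my $in_title  = 0;    # true while the parser is inside <title>

# $probe is a hypothetical helper parser used only to look things up;
# it does no filtering and prints nothing.
my $probe = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag, $attr) = @_;               # lowercased tag name, attribute hashref
        $in_title  = 1      if $tag eq 'title';
        %body_attr = %$attr if $tag eq 'body';
    }, 'tagname, attr' ],
    end_h  => [ sub { $in_title = 0 if $_[0] eq 'title' }, 'tagname' ],
    text_h => [ sub { $title .= $_[0] if $in_title }, 'dtext' ],
);

$probe->parse($text);
$probe->eof;

$title =~ s/^\s+|\s+$//g;   # trim surrounding whitespace

print "title: $title\n";
print "body attribute: $_=$body_attr{$_}\n" for sort keys %body_attr;
```

This parses the text a second time, which seems acceptable for small pages. The same state flag could probably be folded into the main parser instead, as long as head is not passed to ignore_elements, since ignored elements generate no events.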

It automatically prints the modified version of $text without any separate print statement.
Q) Why is that?
Q) How can I save the new version of $text to a new variable instead of automatically
printing it to the screen (so I can remove empty lines and have my way with it)?
Q) I wanted any comments removed too, but I didn't do anything special for them and they
are gone anyway. Are comments removed automatically, then?

OUTPUT ::
(dmuey@q42(~):21)$ ./html.pl 



hI HERE'S CONTENT i WANT






(dmuey@q42(~):22)$ 
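
On the printing and saving questions, a hedged sketch of what seems to be going on: the three handlers passed to new() each call print, so the output appears while parse() runs, one event at a time. Replacing print with an append to a variable should collect the same text instead. The names $body and $collect are made up for this example, and the sample HTML is a trimmed copy of the one above. As far as I can tell from the HTML::Parser documentation, the comments vanish because no comment_h or default_h handler is registered, so comment events are simply not passed to anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Trimmed copy of the sample document (assumption: same shape as $text above).
my $text = <<'HTML';
<html><head>
<title> HI Title </title>
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT>
i DON'T WANT THIS SCRIPT EITHER
</SCRIPT>
</BODY>
</HTMl>
HTML

my $body    = '';                       # collects what used to be printed
my $collect = sub { $body .= shift };   # append the event text instead of printing it

my $parser = HTML::Parser->new(
    api_version => 3,
    text_h      => [ $collect, 'dtext' ],
    start_h     => [ $collect, 'text' ],
    end_h       => [ $collect, 'text' ],
);
# No comment_h or default_h is registered here, so comment events reach no handler.

$parser->ignore_elements(qw(head script));   # drop these elements and their content
$parser->ignore_tags(qw(html body));         # drop only the tags, keep the content
$parser->parse($text);
$parser->eof;

$body =~ s/^\s*\n//mg;   # remove empty and whitespace-only lines

print $body;             # printing is now an explicit, separate step
```

With the text in $body, any further cleanup can happen before (or instead of) printing.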


Thanks

Dan
