Using the excellent example in an earlier post from David (RE: Removing HTML Tags),
I came up with this slightly modified version, based on that post and some CPAN documentation, and it works. It just brought up a few more questions. Basically I'm just trying to grab the body contents without comments or script stuff. So far this module is really cool and handy!!

#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

my $text = <<HTML;
<html><head>
<title> HI Title </title>
heaD STUFF
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT> i DON'T WANT THIS SCRIPT EITHER </SCRIPT>
</BODY>
</HTMl>
HTML

# Each handler prints the piece of the document it is handed.
my $html = HTML::Parser->new(
    api_version => 3,
    text_h      => [sub { print shift; }, 'dtext'],
    start_h     => [sub { print shift; }, 'text'],
    end_h       => [sub { print shift; }, 'text'],
);

# Q) Before I kill the head section or body tags below, how do I grab these parts?
#  1 - my $title = ????            i.e. the text between the title tags
#  2 - my $body_attributes = ????  i.e. the body tag attributes; in this example
#                                  it'd be 'bodytag=attributes'

$html->ignore_elements(qw(head script));
$html->ignore_tags(qw(html body));
$html->parse($text);
$html->eof;

####

It automatically prints the modified version of $text without any separate print statement.

Q) Why is that?

Q) How can I save the new version of $text to a new variable instead of automatically printing it to the screen? (So I can remove empty lines and have my way with it.)

Q) I wanted any comments removed too, but I didn't do anything special for them and they are gone anyway. Are comments removed automatically, then?

OUTPUT ::

(dmuey@q42(~):21)$ ./html.pl

hI HERE'S CONTENT i WANT

(dmuey@q42(~):22)$

Thanks

Dan