Dan Muey wrote:
>
> #Q) Before I kill the head section or body tags below how do I grab these
> #parts of it? 1 - my $title = ???? IE the text between title tags
> #2 - get body tag attributes my $body_attributes = ???? IE in this example
> #it'd be 'bodytag=attributes'
>
the following grabs the title and the body text and attributes:

#!/usr/bin/perl -w
use strict;

use HTML::Parser;

my $text = <<HTML;
<html><head>
<title> HI Title </title>
heaD STUFF
</head>
<body bodytag=attributes>
hI HERE'S CONTENT i WANT
<!-- i WANT TO STRIP COMMENTS OUT -->
<SCRIPT> i DON'T WANT THIS SCRIPT EITHER </SCRIPT>
</BODY>
</HTMl>
HTML

# flags: are we currently inside <title> or <body>?
my $body  = 0;
my $title = 0;

# collected pieces
my @body;
my @title;

my $html = HTML::Parser->new(api_version => 3,
                             text_h  => [\&text, 'dtext'],
                             start_h => [\&open_tag, 'tagname,attr'],
                             end_h   => [\&close_tag, 'tagname']);

# skip <script> elements and everything inside them
$html->ignore_elements(qw(script));

$html->parse($text);
$html->eof;

print "TITLE @title\n";
print "BODY @body\n";

# keep text only while inside <title> or <body>; skip whitespace-only chunks
sub text{
    my $text = shift;
    return unless($text =~ /\w/);
    if($title){
        push(@title,$text);
    }elsif($body){
        push(@body,$text);
    }
}

sub open_tag{
    my $tagname = shift;
    my $attr = shift;
    $title = 1 if($tagname eq 'title');
    if($tagname eq 'body'){
        $body = 1;
        # one "name=value" entry per attribute (a plain join('=',%{$attr})
        # only gives a sensible result when there is a single attribute)
        push(@body, map { "$_=$attr->{$_}" } sort keys %{$attr});
    }
}

sub close_tag{
    my $tagname = shift;
    $title = 0 if($tagname eq 'title');
    $body = 0 if($tagname eq 'body');
}

__END__

prints:

TITLE HI Title
BODY bodytag=attributes hI HERE'S CONTENT i WANT

there are many ways of doing the same thing.

>
> It automatically prints the modified version of $text without any print
> statement. Q) Why is that?

no. it doesn't print it automatically. i have a print statement that prints it out.

> Q) How can I save the new version of $text to a new variable instead of
> automatically printing it to the screen? ( so I can remove empty lines and
> have my way with it ) Q) I wanted any comments removed too but I didn't do
> anything special to it and they are gone anyway, are comments removed
> automatically then?

just remove the print statement and store it as you want. comments are not removed by default, i don't think.

david
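
As a rough illustration of "store it as you want", here is a minimal sketch that returns the pieces instead of printing them. The helper name extract_parts, its return shape, and the trimming and blank-line removal at the end are assumptions made for this example, not part of the script above. It relies only on the same HTML::Parser handlers and ignore_elements call already used there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser;

# Hypothetical helper (not from the original post): returns the title text,
# a hash ref of the body tag attributes, and the body text.
sub extract_parts {
    my ($html) = @_;

    my %in;                                  # which element we are inside
    my ($title, $body_text) = ('', '');
    my %body_attr;

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            $in{$tag} = 1 if $tag eq 'title' || $tag eq 'body';
            %body_attr = %$attr if $tag eq 'body';
        }, 'tagname,attr' ],
        end_h => [ sub {
            my ($tag) = @_;
            $in{$tag} = 0 if $tag eq 'title' || $tag eq 'body';
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            if    ($in{title}) { $title     .= $text }
            elsif ($in{body})  { $body_text .= $text }
        }, 'dtext' ],
    );
    $p->ignore_elements(qw(script));         # drop <script> content
    $p->parse($html);
    $p->eof;

    # Assumed clean-up: trim both strings, then drop empty lines in the body.
    for ($title, $body_text) {
        s/^\s+//;
        s/\s+$//;
    }
    $body_text = join "\n", grep { /\S/ } split /\n/, $body_text;

    return ($title, \%body_attr, $body_text);
}

my ($title, $attr, $body) = extract_parts(
    "<html><head><title> HI Title </title></head>\n"
  . "<body bodytag=attributes>\nhI HERE'S CONTENT\n<!-- c -->\n</body></html>\n"
);

print "title: $title\n";
print "attr:  $_=$attr->{$_}\n" for sort keys %$attr;
print "body:  $body\n";
```

Since no comment handler is registered in this sketch, comment text never reaches any of the collected variables.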