On Wednesday, Sep 3, 2003, at 03:32 US/Pacific, Sara wrote: [..]
What I want to do is to remove/delete HTML code from the text file from a certain tag up to a certain tag.
For example, I want to completely delete the code that comes between <head> and </head> (including any style tags, embedded JavaScript, etc.).
Any ideas?
I would recommend that you look into HTML-Tree <http://search.cpan.org/author/SBURKE/HTML-Tree-3.17/>, since I have found it a lovely way to do most anything you would want to do when deconstructing the tree structure of an HTML document.
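As a rough illustration of the HTML-Tree route, here is a minimal sketch using HTML::TreeBuilder from that distribution. The helper name strip_head and the sample input are made up for the example, and the module has to be installed from CPAN; treat it as one possible approach rather than the definitive one.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # part of the HTML-Tree distribution (CPAN)

# Hypothetical helper: return $html with its <head> element removed.
sub strip_head {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    # Detach and destroy every <head> element, including its
    # <style>, <script> and <title> children.
    $_->delete for $tree->find_by_tag_name('head');

    # '<>&' limits entity encoding to those three characters.
    my $out = $tree->as_HTML('<>&');
    $tree->delete;    # release the tree explicitly
    return $out;
}

# Sample input, assumed for illustration only.
print strip_head(
    '<html><head><title>t</title></head><body><p>hi</p></body></html>'
), "\n";
```

Because the parser builds a real tree, this does not care how the tags are spread across lines, but note that as_HTML re-serializes the whole document, so whitespace and tag case may differ from the input.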
I'm not too sure you really want to blitz EVERYTHING in the 'head' section...
but you might try say:
    while ( my $line = <INFO1> ) {
        if ( $line =~ /<head>/ ) {
            #
            # remove everything after it. print if we have something
            #
            $line =~ s/<head>.*//;
            print $line unless ($line =~ /^\s*$/);
            #
            # spin until we see the closing tag - assumes well formedness
            #
            do {
                $line = <INFO1>;
            } until ( $line =~ /<\/head>/ ) ;
            #
            # strip everything before the closing tag
            #
            $line =~ s/.*<\/head>//;
            next if ($line =~ /^\s*$/); # get new line if blank.
        }
        print $line ;
    }

but this assumes that the start and stop tags do not have something else on the same line with them, e.g.

    </head><body text="#000000" bgcolor="#FFFFFF">....

ciao
drieux
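One caveat with reading line by line: if <head> and </head> ever share a line, the first substitution eats the closing tag and the inner loop never finds it. A possible workaround, sketched here under the assumption that the file is small enough to hold in memory, is to work on the whole text as one string. The helper name strip_head_block and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything from <head ...> through </head>.
sub strip_head_block {
    my ($html) = @_;
    # /s lets . match newlines, /i ignores tag case, and .*? stops
    # at the first closing tag. \b keeps <header> from matching.
    $html =~ s{<head\b[^>]*>.*?</head\s*>}{}si;
    return $html;
}

# Sample input, assumed for illustration only.
my $sample = "<html><head>\n<title>t</title>\n</head>"
           . "<body text=\"#000000\">hi</body></html>\n";
print strip_head_block($sample);
```

With the INFO1 filehandle from the loop above, the text could be slurped with something like my $html = do { local $/; <INFO1> }; before calling the helper. This is still pattern matching rather than parsing, so a literal "</head>" inside a script or comment would end the match early.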