> Or maybe I misunderstood the question

Or maybe I did :)

> HTML::TokeParser::Simple

I agree... but only if you are looking for a strong, permanent solution. The regex way is good for quick and dirty HTML work.

Sara, if you need to keep the <head> tags, then you could use this modified version...

# untested
$text = "...";
$text =~ s|(<head>).*?(</head>)|$1$2|s;

...Or if you wanted to keep the <title> tag...

# untested
# the <title> element needs its own capture group so that $2 holds it
$text = "...";
$text =~ s|(<head>).*?(<title>.*?</title>).*?(</head>)|$1$2$3|s;

Rob

-----Original Message-----
From: Wiggins d'Anconia [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 04, 2003 8:48 PM
To: 'Sara'
Cc: beginperl
Subject: Re: Stripping HTML from a text file.

Won't this remove *everything* between the given tags? Or maybe I misunderstood the question; I thought she wanted to remove the "code" from all of the contents between two tags?

Because of the complexity and variety of HTML code, the number of different tags, etc., I would suggest using an HTML parsing module for this task. HTML::TokeParser::Simple has worked very well for me in the past. There are a number of examples available. If this is what you want and you get stuck on the module, then come back with questions.

There are also the base modules, such as HTML::Parser, that the one previously mentioned builds on; for others, check CPAN.

http://danconia.org

Hanson, Rob wrote:
> A simple regex will do the trick...
>
> # untested
> $text = "...";
> $text =~ s|<head>.*?</head>||s;
>
> Or something more generic...
>
> # untested
> $tag = "head";
> $text =~ s|<$tag[^>]*?>.*?</$tag>||s;
>
> This second one also allows for possible attributes in the start tag. You
> may need more than this if the HTML isn't well formed, or if there are extra
> spaces in your tags.
>
> If you want something for the command line you could do this...
>
> (Note: for *nix, needs modification for Win [untested])
> perl -e '$x=join("",<>);$x=~s|<head>.*?</head>||s;print $x' myfile.html > newfile.html
>
> Rob
>
>
> -----Original Message-----
> From: Sara [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 03, 2003 6:32 AM
> To: beginperl
> Subject: Stripping HTML from a text file.
>
>
> I have a couple of text files with html code in them.. e.g.
>
> ---------- Text File --------------
> <html>
> <head>
> <title>This is Test File</title>
> </head>
> <body>
> <font size=2 face=arial>This is the test file contents<br>
> <p>
> blah blah blah.........
> </body>
> </html>
>
> -----------------------------------------
>
> What I want to do is to remove/delete HTML code from the text file from a
> certain tag up to a certain tag.
>
> For example; I want to delete the code completely that comes in between
> <head> and </head> (including any style tags and embedded javascripts etc)
>
> Any ideas?
>
> Thanks in advance.
>
> Sara.
>

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
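
For anyone finding this thread later, the regex suggestions above can be pulled together into one small script. This is only a minimal sketch under the same assumptions as the replies (reasonably well-formed HTML, no nested elements of the same name); the helper names strip_element and keep_title_only are made up for illustration and do not come from any module. For anything messier, the parser-module advice in the thread still applies.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every <$tag ...>...</$tag> element, contents included.
# /s lets "." match newlines, /i ignores tag case, /g handles repeated elements.
sub strip_element {
    my ($html, $tag) = @_;
    $html =~ s|<\Q$tag\E\b[^>]*>.*?</\Q$tag\E\s*>||gis;
    return $html;
}

# Hypothetical helper: empty the <head> element but keep its <title>, if any.
sub keep_title_only {
    my ($html) = @_;
    $html =~ s{(<head\b[^>]*>)(.*?)(</head\s*>)}{
        my ($open, $inner, $close) = ($1, $2, $3);
        my ($title) = $inner =~ m|(<title\b[^>]*>.*?</title\s*>)|is;
        $open . (defined $title ? $title : '') . $close;
    }eis;
    return $html;
}

# Sample input modelled on the file in the original question.
my $html = <<'END_HTML';
<html>
<head>
<title>This is Test File</title>
<script>var x = 1;</script>
</head>
<body>
<p>blah blah blah</p>
</body>
</html>
END_HTML

print strip_element($html, 'head');
print "----\n";
print keep_title_only($html);
```

The second helper looks for the title inside the captured head contents instead of relying on a single long pattern, so it also works when the head has no <title> at all.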