On Sat, 24 Jan 2004, Marcelo wrote:

> Which regular expression would you use to remove the <title> and
> </title> from a line like this one:
>
> <title>Here goes a webpage's title</title>
>
> Thanks a lot in advance.

Did you want that _exact_ input? I.e. always <title>...</title>? If so,
that's rather easy:

    $line =~ s/<title>(.*)<\/title>/$1/;

Now, if you want the more general form of <any_tag>...</any_tag>, that is,
removing paired HTML tags, that's more difficult. Luckily, there is an
example in "Programming Perl, 3rd Edition" on page 184 which is close:

    $line =~ s/<(.*?)>(.*?)<\/\1>/$2/;

In sort-of English, this says: Match a <, then everything up to the next >,
calling the text between them $1 (or \1 inside the pattern). Now, match as
little as possible up to the closing tag and call it $2. The closing tag is
a <, followed by a / (written \/ because / is the s/// delimiter), followed
by what you matched first (\1), followed by a >. Now, replace all of that
with $2.

A problem with this pattern is that a single substitution removes only one
pair of tags, so it would not work as you would want it to with input such
as:

    <title><B>Title</B></title>

One pass strips the outer <title> and </title>, but leaves the <B> and
</B>. Of course, if your desire is to remove all paired HTML tags, then put
this in a loop until it no longer matches.

HTH,

--
Maranatha!
John McKown
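
P.S. Here is a minimal sketch of the "loop until it no longer matches" idea. It writes the general pattern as s/<(.*?)>(.*?)<\/\1>/$2/, with the slash escaped and only the tag name captured so that \1 can be reused in the closing tag. The helper name strip_paired_tags is my own invention for illustration, and the sketch assumes simple tags without attributes (something like <a href="..."> would not pair up with </a> here), so treat it as a starting point rather than a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any library): remove every simple
# paired tag by repeating the substitution until it stops matching.
sub strip_paired_tags {
    my ($line) = @_;
    # $1 is the tag name, $2 the enclosed text; each pass removes one pair.
    1 while $line =~ s/<(.*?)>(.*?)<\/\1>/$2/;
    return $line;
}

# The exact-input case: only <title>...</title> is removed.
my $simple = "<title>Here goes a webpage's title</title>";
(my $only_title = $simple) =~ s/<title>(.*?)<\/title>/$1/;
print "$only_title\n";    # prints: Here goes a webpage's title

# The nested case: two passes are needed, one per pair of tags.
my $nested = "<title><B>Title</B></title>";
print strip_paired_tags($nested), "\n";    # prints: Title
```

Each successful substitution makes the string shorter, so the loop always terminates.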