> I have strings like the following one:
> my $s="The <b>L</b>ibrary<b> of <font color...";
>
> I want to truncate the string, to become
> "The <b>L</b>ibrary<b> of ..."
> (that is remove 'unterminated' html tags - tags that open but
> there is no '>' at the end, and add "..." if necessary)
>
> By using the following:
> $s=~s/<(?!.*?>)//;
> I only get a removal of the non-matching '<':
> "The <b>L</b>ibrary<b> of font color...";

In an 'unterminated' tag, you would find either the end of the string
or another '<' before any '>', right? How about:

    $s =~ s/<[^>]*(?:<|$)//;

This replaces a '<', followed by zero or more non-'>' characters,
followed by either another '<' or the end of the string (the '$'),
with nothing. The '?:' makes the parentheses non-capturing, so the
regex engine does not have to save what they match into $1.

Two caveats: when the match ends at another '<', that '<' is removed
as well (a lookahead, (?=<|$), would leave it in place), and the
substitution only deletes the broken tag; it does not add the "...".

Hope that helps,
-dave
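
A minimal sketch of a variant that also produces the "..." asked for in the question. It assumes the unterminated tag is always the last thing in the string, which fits the truncated input shown above. The helper name truncate_open_tag is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread):
# replace a trailing unterminated tag with "...".
sub truncate_open_tag {
    my ($s) = @_;
    # '<', then any characters that are neither '<' nor '>',
    # then the very end of the string (\z). A complete tag such as
    # <b> never matches, because its '>' stops the character class
    # before \z is reached.
    $s =~ s/<[^<>]*\z/.../;
    return $s;
}

my $s = "The <b>L</b>ibrary<b> of <font color...";
print truncate_open_tag($s), "\n";
# prints: The <b>L</b>ibrary<b> of ...
```

A string with no dangling tag at the end is returned unchanged, so the "..." only appears when something was actually cut off.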