> I have strings like the following one:
> my $s="The <b>L</b>ibrary<b> of <font color...";
> 
> I want to truncate the string, to become
> "The <b>L</b>ibrary<b> of ..."
> (that is remove 'unterminated' html tags - tags that open but there is no
> '>' at the end, and add "..." if necessary)
> 
> By using the following:
> $s=~s/<(?!.*?>)//;
> I only get a removal of the non-matching '<':
> "The <b>L</b>ibrary<b> of font color...";

In an 'unterminated' tag, you would find either the end of the string or
a '<', right? How about:

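$s =~ s/<[^<>]*(?=<|$)//;

This replaces (begin regex) a '<' followed by zero or more characters
that are neither '<' nor '>', as long as what comes next is either
another '<' or the end of the string (represented by the '$') (end regex)
with nothing. The '(?=...)' is a lookahead: it checks for the next '<'
without making it part of the match, so the tag that follows is left
intact instead of losing its opening '<'. Note that this only strips the
broken tag; it does not put the "..." back.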
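
If you also want the "..." back on the end, here is a rough sketch of how
the pieces could fit together. The sub name strip_unterminated is my own
invention, not anything standard, and I am assuming "if necessary" means
"only when the string does not already end in ...". It uses a lookahead
so the following '<' is checked but not removed, and \z for the very end
of the string.

```perl
use strict;
use warnings;

# Hypothetical helper (name and behaviour are assumptions for illustration):
# strip unterminated tags, then make sure the result ends in "...".
sub strip_unterminated {
    my ($s) = @_;

    # Remove a '<' plus any following characters that are neither '<' nor
    # '>', but only when the next thing is another '<' or the end of the
    # string. The lookahead (?=...) does not consume that next '<'.
    $s =~ s/<[^<>]*(?=<|\z)//g;

    # Assumption: append "..." only if the string does not already end with it.
    $s .= '...' unless $s =~ /\.\.\.\z/;

    return $s;
}

# Prints: The <b>L</b>ibrary<b> of ...
print strip_unterminated('The <b>L</b>ibrary<b> of <font color...'), "\n";
```

The /g is there so that more than one broken tag gets removed; drop it if
you only ever expect one at the end of the string.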

Hope that helps,

 -dave


