[EMAIL PROTECTED] (David Gray) wrote in news:000001c1d435$f2306e60$[EMAIL PROTECTED]:
>> >> I have strings like the following one:
>> >> my $s="The <b>L</b>ibrary<b> of <font color...";
>> >>
>> >> I want to truncate the string, to become
>> >> "The <b>L</b>ibrary<b> of ..."
>> >> (that is remove 'unterminated' html tags - tags that open but
>> >> there is no
>> >> '>' at the end, and add "..." if necessary) ...
>> >
>> > In an 'unterminated' tag, you would find either the end of the
>> > string or a '<', right? How about:
>> >
>> > $s =~ s/<[^>]*(?:<|$)//;
>> >
>> >
>> > Hope that helps,
>> >
>> > -dave
>> >
>>
>> Thank you for your reply,
>>
>> what about:
>>
>> $s=~s/<[^>]*(?!.*?>)//;
>
> Ok, well if that works, great :) do you understand what you're doing
> there? (zero-width negative look-ahead assertion) You're looking for '<'
> followed by zero or more non-'>' characters, followed by a sequence that
> isn't '.*?>'.
>
> You can remove the negative character class ( [^>]* ) if you want,
> that's a bit redundant if you're going to use the negative look-ahead.
>
> -dave
>
>

I understand that by using:

$s=~s/<[^>]*(?!.*?>)//

the negative character class ( [^>]* ) seems redundant...

But, strange, If I remove it (then we go to the one I wrote on my first message):

$s=~s/<(?!.*?>)//

it only removes the last orphan "<" character without its content.

It seems that (?!.*?>) works only as a conditional, but the pattern to be replaced is just the '<' character.

So it seems that ( [^>]* ) is needed.

Thank you
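
For anyone who wants to try this, here is a minimal sketch that runs the three substitutions from this thread side by side. It assumes a sample string without the literal trailing dots. The helper apply_regex and the variable names are made up for illustration. The last substitution, anchored with \z, is a variant that was not posted in the thread.

```perl
use strict;
use warnings;

# Sample input, assumed for illustration: the string from the thread
# without the literal trailing dots.
my $input = 'The <b>L</b>ibrary<b> of <font color';

# Hypothetical helper: apply one substitution to a copy of the input.
sub apply_regex {
    my ($re) = @_;
    (my $copy = $input) =~ s/$re//;
    return $copy;
}

# 1. '<', then non-'>' characters, then another '<' or the end of the string.
my $with_alternation = apply_regex(qr/<[^>]*(?:<|$)/);

# 2. '<', then non-'>' characters, then a position with no '>' after it.
my $with_lookahead = apply_regex(qr/<[^>]*(?!.*?>)/);

# 3. Only '<' is consumed; the look-ahead tests what follows
#    but contributes no characters to the match.
my $lookahead_only = apply_regex(qr/<(?!.*?>)/);

# Variant not from the thread: anchor at the very end of the string with \z
# and put "..." in place of whatever was removed.
(my $truncated = $input) =~ s/<[^>]*\z/.../;

print "1: [$with_alternation]\n";   # 1: [The <b>L</b>ibrary<b> of ]
print "2: [$with_lookahead]\n";     # 2: [The <b>L</b>ibrary<b> of ]
print "3: [$lookahead_only]\n";     # 3: [The <b>L</b>ibrary<b> of font color]
print "4: [$truncated]\n";          # 4: [The <b>L</b>ibrary<b> of ...]
```

Substitution 3 leaves "font color" behind because a look-ahead is zero-width: it decides whether the match is allowed, but s/// only replaces the characters the pattern actually consumed, which here is the single '<'. In substitution 2, [^>]* is the part that consumes the rest of the unterminated tag, so it cannot be dropped.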