RE: remove part of a string - unterminated tags

Gfoo Tue, 26 Mar 2002 01:30:18 -0800

[EMAIL PROTECTED] (David Gray) wrote in 
news:000001c1d435$f2306e60$[EMAIL PROTECTED]:


>> >> I have strings like the following one:
>> >> my $s="The <b>L</b>ibrary<b> of <font color...";
>> >> 
>> >> I want to truncate the string, to become
>> >> "The <b>L</b>ibrary<b> of ..."
>> >> (that is remove 'unterminated' html tags - tags that open but
>> >> there is no 
>> >> '>' at the end, and add "..." if necessary) ...
>> > 
>> > In an 'unterminated' tag, you would find either the end of  the
>> > string or a '<', right? How about:
>> > 
>> > $s =~ s/<[^>]*(?:<|$)//;
>> > 
>> > 
>> > Hope that helps,
>> > 
>> >  -dave
>> >  
>> 
>> Thank you for your reply,
>> 
>> what about:
>> 
>> $s=~s/<[^>]*(?!.*?>)//;
> 
> Ok, well if that works, great :) do you understand what you're doing
> there? (zero-width negative look-ahead assertion) You're looking for '<'
> followed by zero or more non-'>' characters, followed by a sequence that
> isn't '.*?>'.
> 
> You can remove the negative character class ( [^>]* ) if you want,
> that's a bit redundant if you're going to use the negative look-ahead.
> 
>  -dave
> 
> 
> 

I understand that in:

$s=~s/<[^>]*(?!.*?>)// 

the negative character class ( [^>]* ) seems redundant... 

But, strangely, if I remove it (which gives the version I wrote in my first
message):

$s=~s/<(?!.*?>)// 

it only removes the last orphan "<" character, without its content.

It seems that (?!.*?>) works only as a condition: it is zero-width, so it
consumes nothing, and the text being replaced is just the '<' character.
So it seems that ( [^>]* ) is needed to consume the rest of the tag.
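
A minimal sketch to check this, using the sample string from my first
message. The variable names are only for illustration, and the third
substitution (anchoring at the end of the string and putting "..." back) is
my own assumption about a simpler alternative, not something suggested
earlier in the thread:

```perl
use strict;
use warnings;

# Sample string from the original question; it ends in an unterminated tag.
my $s = 'The <b>L</b>ibrary<b> of <font color...';

# With [^>]*: the class consumes the rest of the orphan tag, and the
# look-ahead only checks that no '>' follows.
(my $with_class = $s) =~ s/<[^>]*(?!.*?>)//;

# Without [^>]*: only '<' is consumed, because the look-ahead is zero-width.
(my $without_class = $s) =~ s/<(?!.*?>)//;

# Assumed alternative: match an unclosed tag that runs to the end of the
# string and replace it with "...".
(my $anchored = $s) =~ s/<[^>]*$/.../;

print "$with_class\n";      # The <b>L</b>ibrary<b> of
print "$without_class\n";   # The <b>L</b>ibrary<b> of font color...
print "$anchored\n";        # The <b>L</b>ibrary<b> of ...
```

The second result keeps "font color..." because the substitution replaces
only what the pattern consumed, and the look-ahead consumes nothing.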

Thank you


