Frank Lichtenheld said:
> Attached a patch that would encode HTML special chars in the short
> description (see #181872 for a similar discussion on long
> descriptions) both in all_packages and in the packages pages.
> 
> The lines handling & are commented out. See the corresponding
> discussion in the bug mentioned above.
> 
> Greetings,
>       Frank

I would suggest (as I have irritatingly done elsewhere) that rather
than using fixed regexes, you use the SGML::ISO8859::str2sgml() or
HTML::Entities::encode_entities() functions on $short_desc and $long_desc.

Also, IMO it is better to fix a '&' only when it doesn't already look
like the start of an entity; your proposed fix in #181872 will match all
the &#...; entities but miss nearly all of the named ones!  You can easily
import the entity hash %char2entity from HTML::Entities and do the fixing
yourself, or just do a convoluted bit of passing the string back and forth
between the encode_entities() and decode_entities() functions to get it
consistent.

For example:

[harp:~]$ perl -e 'use HTML::Entities; $foo = "&foo blah &amp; <url>\n"; print 
$foo, encode_entities($foo), encode_entities(decode_entities($foo)), 
decode_entities(encode_entities(decode_entities($foo)));'

... will yield the following:

&foo blah &amp; <url>                   [original string, yuck]
&amp;foo blah &amp;amp; &lt;url&gt;     [no good!]
&amp;foo blah &amp; &lt;url&gt;         [aha, better: original &amp; is preserved]
&foo blah & <url>                       [gives us this when decoded, good!]


This way, if a package description already has some encoded entities in
it (e.g. &amp; in a URL) as well as unencoded characters (e.g. '<'), you
first run it through an entity decoder and then run the output through an
entity encoder, so nothing ends up encoded twice.

e.g.

  use HTML::Entities ();
  my $short_desc = $package{$_}{'short-desc'};
  $short_desc = HTML::Entities::decode_entities($short_desc);
  $short_desc = HTML::Entities::encode_entities($short_desc);

or:

  use HTML::Entities ();        # near the top of the script somewhere
  ...
  # Decode first, then encode, so existing entities are not double-encoded.
  $all_package .= "\n     <dd>" .
        HTML::Entities::encode_entities(
        HTML::Entities::decode_entities(
        $package{$_}{'short-desc'} ) ) . "\n";

Then again for $long_desc ...
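
If the round trip feels too convoluted, the "only fix a '&' that doesn't
look like an entity" option mentioned above could look roughly like the
sketch below. It is only a sketch: the sub name escape_html_loosely is made
up for illustration and is not part of pages.pl, and it assumes that
anything shaped like &name; or &#123; or &#x1F; is a real entity, without
checking the name against the HTML::Entities tables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of pages.pl): escape '<', '>' and any '&'
# that does not already start something shaped like an entity.
sub escape_html_loosely {
    my ($text) = @_;
    # Leave "&name;", "&#123;" and "&#x1F;" alone; escape every other '&'.
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_html_loosely("&foo blah &amp; <url>\n");
```

The catch is that a bogus name such as &foo; would be passed through
untouched, which the decode-then-encode approach handles better.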

Think on it anyway, it seems good to me but maybe you have some other
thoughts.


> Index: htmlscripts/pages.pl
> ===================================================================
> RCS file: /cvs/webwml/packages/htmlscripts/pages.pl,v
> retrieving revision 1.10
> diff -u -IMD5 -r1.10 pages.pl
> --- htmlscripts/pages.pl      24 Mar 2003 15:05:57 -0000      1.10
> +++ htmlscripts/pages.pl      29 Mar 2003 15:23:42 -0000
> @@ -113,7 +113,11 @@
>               if ($distrib =~ /(contrib|non-free|non-us|security)/o) {
>                       $all_package .= " [<font color=\"red\">$distrib</font>]\n";
>               }
> -             $all_package .= "\n     <dd>".$package{$_}{'short-desc'}."\n";
> +             my $short_desc = $package{$_}{'short-desc'};
> +#            $short_desc =~ s/&/\&amp\;/go;
> +             $short_desc =~ s/</\&lt\;/go;
> +             $short_desc =~ s/>/\&gt\;/go;
> +             $all_package .= "\n     <dd>".$short_desc."\n";
>       }
>       $all_package .= "</dl>\n";
>       $all_package .= trailer('../..');
> @@ -161,6 +165,9 @@
>               }
>               $short_desc = $package{$pack}{'short-desc'};
>               $long_desc = $package{$pack}{'long-desc'};
> +#            $short_desc =~ s/\&/\&amp\;/go;
> +             $short_desc =~ s/</\&lt\;/go;
> +             $short_desc =~ s/>/\&gt\;/go;
>               $long_desc =~ s,<((URL:)?http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
>               $long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;
>               $long_desc =~ s/\A //o;
> 

Andrew.

-- 
Andrew Shugg <[EMAIL PROTECTED]>                   http://www.neep.com.au/

"Just remember, Mr Fawlty, there's always someone worse off than yourself."
"Is there?  Well I'd like to meet him.  I could do with a good laugh."
