Frank Lichtenheld said:
> Attached a patch that would encode HTML special chars in the short
> description (see #181872 for a similar discussion on long
> descriptions) both in all_packages and in the packages pages.
>
> The lines handling & are commented out. See the corresponding
> discussion in the bug mentioned above.
>
> Greetings,
> Frank

I would suggest (as I have irritatingly done elsewhere) that rather than
using fixed regexes, you use the SGML::ISO8859::str2sgml() or
HTML::Entities::encode_entities() functions on $short_desc and $long_desc.

Also, IMO it is better to fix only an '&' that doesn't already look like
part of an entity; your proposed fix in 181872 will match all the &#...;
entities but miss nearly all of the named ones! You can easily import the
entity hash %char2entity from HTML::Entities and do the fixing yourself, or
just do a convoluted bit of passing the string back and forth between the
decode_entities() and encode_entities() functions to get it consistent.

For example:

  [harp:~]$ perl -e 'use HTML::Entities; $foo = "&foo blah &amp; <url>\n";
      print $foo, encode_entities($foo),
      encode_entities(decode_entities($foo)),
      decode_entities(encode_entities(decode_entities($foo)));'

... will yield the following:

  &foo blah &amp; <url>                [original string, yuck]
  &amp;foo blah &amp;amp; &lt;url&gt;  [no good!]
  &amp;foo blah &amp; &lt;url&gt;      [ahah, better, original &amp; is preserved]
  &foo blah & <url>                    [gives us this when decoded, good!]

This way, if a package description has some encoded entities in it already
(e.g. &amp; in a URL) as well as unencoded things (e.g. '<'), you would
first run it through an SGML entity decoder, and then run the output
through an SGML entity encoder, e.g.:

    use HTML::Entities ();
    my $short_desc = $package{$_}{'short-desc'};
    $short_desc = HTML::Entities::decode_entities($short_desc);
    $short_desc = HTML::Entities::encode_entities($short_desc);

or:

    use HTML::Entities (); # up the top of the script somewhere
    ...
    $all_package .= "\n <dd>" .
        HTML::Entities::encode_entities(
            HTML::Entities::decode_entities(
                $package{$_}{'short-desc'} ) ) . "\n";

Then again for $long_desc ...

Think on it anyway; it seems good to me, but maybe you have some other
thoughts.

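If it helps, the decode-then-encode idea could be wrapped in one small helper so the short and long descriptions go through the same code path. This is only a rough sketch: normalize_desc() is a hypothetical name of mine and is not something that exists in pages.pl, and it assumes HTML::Entities (from the HTML-Parser distribution) is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical helper (not part of pages.pl): first decode any entities
# that are already present, then encode the result, so every special
# character ends up escaped exactly once.
sub normalize_desc {
    my ($text) = @_;
    return '' unless defined $text;

    # Called in scalar context, both functions return a new string
    # rather than modifying their argument in place.
    my $decoded = HTML::Entities::decode_entities($text);
    return HTML::Entities::encode_entities($decoded);
}

# Same sample string as in the one-liner above.
print normalize_desc("&foo blah &amp; <url>"), "\n";
# prints: &amp;foo blah &amp; &lt;url&gt;
```

The nice property is that running it twice gives the same result as running it once, so it should not matter whether a description was already partly escaped. The catch is that a description which really wants to show the literal text "&amp;" would come out showing "&", which may or may not matter for package descriptions.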
> Index: htmlscripts/pages.pl
> ===================================================================
> RCS file: /cvs/webwml/packages/htmlscripts/pages.pl,v
> retrieving revision 1.10
> diff -u -IMD5 -r1.10 pages.pl
> --- htmlscripts/pages.pl 24 Mar 2003 15:05:57 -0000 1.10
> +++ htmlscripts/pages.pl 29 Mar 2003 15:23:42 -0000
> @@ -113,7 +113,11 @@
>          if ($distrib =~ /(contrib|non-free|non-us|security)/o) {
>              $all_package .= " [<font color=\"red\">$distrib</font>]\n";
>          }
> -        $all_package .= "\n <dd>".$package{$_}{'short-desc'}."\n";
> +        my $short_desc = $package{$_}{'short-desc'};
> +#       $short_desc =~ s/&/\&amp\;/go;
> +        $short_desc =~ s/</\&lt\;/go;
> +        $short_desc =~ s/>/\&gt\;/go;
> +        $all_package .= "\n <dd>".$short_desc."\n";
>      }
>      $all_package .= "</dl>\n";
>      $all_package .= trailer('../..');
> @@ -161,6 +165,9 @@
>      }
>      $short_desc = $package{$pack}{'short-desc'};
>      $long_desc = $package{$pack}{'long-desc'};
> +#   $short_desc =~ s/\&/\&amp\;/go;
> +    $short_desc =~ s/</\&lt\;/go;
> +    $short_desc =~ s/>/\&gt\;/go;
>      $long_desc =~ s,<((URL:)?http://[\S~-]+?/?)>,\&lt\;$1\&gt\;,go;
>      $long_desc =~ s,(http://[\S~-]+?/?)((\&gt\;)?[)]?[']?[.\,]?(\s|$)),<a href=\"$1\">$1</a>$2,go;
>      $long_desc =~ s/\A //o;
>

Andrew.

-- 
Andrew Shugg <[EMAIL PROTECTED]>                   http://www.neep.com.au/

"Just remember, Mr Fawlty, there's always someone worse off than yourself."
"Is there?  Well I'd like to meet him.  I could do with a good laugh."