First, it is silly to complain about names that are perfectly valid.
It is just as silly to deal with each occurrence one by one as it comes up.
Since the pages are all UTF-8 now, there is no need to keep such numeric
character references; this patch restores the original characters in their place.

patch below:
Index: english/international/l10n/scripts/gen-files.pl
===================================================================
--- english/international/l10n/scripts/gen-files.pl     (revision 232)
+++ english/international/l10n/scripts/gen-files.pl     (working copy)
@@ -3,6 +3,7 @@
 use strict;
 use File::Path;
 use Getopt::Long;
+use Encode qw(encode);
 
 use lib ($0 =~ m|(.*)/|, $1 or ".") ."/../../../../Perl";
 
@@ -117,8 +118,7 @@
         $name =~ s/\s*<.*//;
         $name =~ s/&(?!#)/&amp;/g;
         $name =~ s/=\?.*?\?=//g;
-        # BREAK PERMITTED HERE (U+0082) is not allowed in HTML 4.01.
-        $name =~ s/(?:&#0*130;|&#x0*82;|\N{U+0082})//ig;
+        $name =~ s/&#(\d+);/encode("UTF-8",chr($1))/ge;
         $name = 'DDTP' if $name eq 'Debian Description Translation Project';
         $name = '' if $name =~ m/\@/;
         return $name;
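
For illustration, here is a minimal standalone sketch of what the added
substitution does. The helper name decode_numeric_refs and the sample name are
made up for this example and are not part of gen-files.pl; the sketch also
assumes that $name is handled as a byte string, as the encode() call in the
patch suggests.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not in gen-files.pl): replace each decimal numeric
# character reference such as "&#225;" with the UTF-8 bytes of the
# character it names. Same substitution as the "+" line in the patch.
sub decode_numeric_refs {
    my ($name) = @_;
    $name =~ s/&#(\d+);/encode("UTF-8", chr($1))/ge;
    return $name;
}

# Made-up sample input: 345, 237 and 225 are the decimal code points
# of U+0159, U+00ED and U+00E1.
my $bytes = decode_numeric_refs('Ji&#345;&#237; Nov&#225;k');

# $bytes now holds raw UTF-8 bytes, ready to be written to a UTF-8 page.
print "$bytes\n";
```

Note that the pattern only matches decimal references; a hexadecimal reference
such as "&#xE9;" passes through unchanged.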


-- 
victory
no need to CC me :-)
