Laura asked for my help on this issue. What I found is that setting the environment variable SP_CHARSET_FIXED to 1 makes the onsgmls program use the Unicode 2.0 character set, as the referenced web page says. However, it then uses only the first 65536 characters (the iso10646-ucs-2 character set), so character number 128513 (U+1F601) triggers the error because it is outside that range. For that character to be accepted, SP_CHARSET_FIXED must be unset when the validate script checks HTML files. XML files, however, still need SP_CHARSET_FIXED set. So I suggest something like this (patch attached):

    if ($xhtml{$htmlLevel}) {
        $ENV{'SGML_CATALOG_FILES'} = $xhtmlCatalog;
        $ENV{'SP_CHARSET_FIXED'} = 1;
        $ENV{'SP_ENCODING'} = 'xml';
    } else {
        $ENV{'SGML_CATALOG_FILES'} = $htmlCatalog;
        if (defined $charset) {
            $ENV{'SP_BCTF'} = $charset;
        } else {
            $ENV{'SP_BCTF'} = "utf-8";
        }
    }

That also changes the default character set for HTML from ISO-8859-1 to UTF-8 because the former is not a valid BCTF option. It appears the validate script only uses that default if there is not a character set defined in the HTML file itself and there is no character set option passed to the script.

I didn't set up the whole web site build on my machine to test if this change has any negative effects on pages other than en_GB.it.html, so it needs broader testing.

diff --git a/scripts/validate b/scripts/validate
index 7d20f1c..a41c1cb 100755
--- a/scripts/validate
+++ b/scripts/validate
@@ -364,16 +364,16 @@ foreach $file (@files) {
     # environment accordingly.
     if ($xhtml{$htmlLevel}) {
         $ENV{'SGML_CATALOG_FILES'} = $xhtmlCatalog;
+        $ENV{'SP_CHARSET_FIXED'} = 1;
         $ENV{'SP_ENCODING'} = 'xml';
     } else {
         $ENV{'SGML_CATALOG_FILES'} = $htmlCatalog;
         if (defined $charset) {
-            $ENV{'SP_ENCODING'} = $charset;
+            $ENV{'SP_BCTF'} = $charset;
         } else {
-            $ENV{'SP_ENCODING'} = "ISO-8859-1";
+            $ENV{'SP_BCTF'} = "utf-8";
         }
     }
-    $ENV{'SP_CHARSET_FIXED'} = 1;
 
     if ($verbose) {
         if ($file eq '-') {
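
For anyone who wants to see the range problem in isolation, here is a small stand-alone Perl sketch. It is not part of the validate script or the patch; the helper name fits_in_ucs2 and the sample code points are made up for illustration. It only checks whether a code point fits in the 65536-character range that onsgmls uses when SP_CHARSET_FIXED is set.

```perl
use strict;
use warnings;

# Highest code point in the 65536-character (UCS-2) range.
my $UCS2_MAX = 0xFFFF;

# Hypothetical helper: returns 1 if the code point fits in UCS-2, else 0.
sub fits_in_ucs2 {
    my ($code_point) = @_;
    return $code_point <= $UCS2_MAX ? 1 : 0;
}

# 233 is an ordinary accented letter; 128513 is the character from this report.
for my $cp (233, 128513) {
    printf "U+%04X (%d): %s\n", $cp, $cp,
        fits_in_ucs2($cp) ? "within UCS-2" : "outside UCS-2";
}
```

The second code point is reported as outside the range, which matches the error seen with SP_CHARSET_FIXED set.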