Some XHTML pages declared as using UTF-8 are erroneously taken to be non-UTF-8.
The syntax for encoding declarations allows the encoding name to be quoted with either ' or ". Further, the spec says that XML processors should match encoding names case-insensitively. See <https://www.w3.org/TR/2008/REC-xml-20081126/#NT-EncodingDecl>.

diff --git a/WWW/Library/Implementation/SGML.c b/WWW/Library/Implementation/SGML.c
index 2534606e..b8cb2cbf 100644
--- a/WWW/Library/Implementation/SGML.c
+++ b/WWW/Library/Implementation/SGML.c
@@ -903,9 +903,9 @@ static void handle_processing_instruction(HTStream *me)

     if (t != 0) {
         t += 9;
-        if (*t == '"')
+        if (*t == '"' || *t == '\'')
             ++t;
-        flag = !StrNCmp(t, "utf-8", 5);
+        flag = !strncasecomp(t, "utf-8", 5);
     }
     if (flag) {
         CTRACE((tfp, "...Use UTF-8 for XML\n"));