L.H. Silli wrote:
> Xerces report: https://issues.apache.org/jira/browse/XERCESC-1967
>
Your bug report, titled "Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the charset parameter of the HTTP content-type: header", is about Xerces-C++ 3.1.1. Our product uses Xerces Java 2.9.1, and that parser does not ignore the BOM in any of the test cases attached to this email.

None of the test cases has an XML declaration (i.e. no <?xml version="1.0" encoding="XXX"?>). utf16.xml starts with a UTF-16 BOM, and utf8_BOM.xml starts with a UTF-8 BOM. utf8.xml has neither a BOM nor an XML declaration, so the parser falls back to the default encoding, UTF-8.
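The detection rules described above (a BOM decides the encoding; with neither a BOM nor an encoding declaration, UTF-8 is used) can be sketched in a few lines of Java. This is a simplified illustration of the byte-order-mark cases in Appendix F of the XML 1.0 specification, not Xerces's actual code. The class name `BomSniffer` is hypothetical, it only recognises UTF-8 and UTF-16 BOMs (a UTF-32 little-endian BOM, FF FE 00 00, would be misread as UTF-16LE), and it assumes the document has no XML declaration, as in the test cases.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Simplified illustration of the byte-order-mark cases in XML 1.0 Appendix F.
// Assumptions: only UTF-8 and UTF-16 BOMs are recognised, and the document has
// no XML declaration, so anything without a BOM is decoded as UTF-8.
public class BomSniffer {

    /** The charset to decode with and the number of BOM bytes to skip. */
    public static final class Detection {
        public final Charset charset;
        public final int bomLength;

        Detection(Charset charset, int bomLength) {
            this.charset = charset;
            this.bomLength = bomLength;
        }
    }

    public static Detection detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return new Detection(StandardCharsets.UTF_8, 3);    // UTF-8 BOM (like utf8_BOM.xml)
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return new Detection(StandardCharsets.UTF_16BE, 2); // UTF-16 big-endian BOM
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return new Detection(StandardCharsets.UTF_16LE, 2); // UTF-16 little-endian BOM
        }
        // No BOM and (by assumption) no XML declaration: fall back to UTF-8 (like utf8.xml).
        return new Detection(StandardCharsets.UTF_8, 0);
    }

    /** Decodes a whole document, consuming the BOM instead of returning it as U+FEFF. */
    public static String decode(byte[] doc) {
        Detection d = detect(doc);
        return new String(doc, d.bomLength, doc.length - d.bomLength, d.charset);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8Bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        Detection d = detect(utf8Bom);
        System.out.println(d.charset + ", BOM bytes: " + d.bomLength + ", text: " + decode(utf8Bom));
    }
}
```

The key point for the bug report is in `decode`: the BOM is consumed as an encoding signature rather than silently dropped before detection or passed through as document content.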
<!-- initially, the default namespace is "books" -->
<book xml:lang="fr" xmlns="urn:loc.gov:books" xmlns:ns="urn:w3-org-ns:HTML" xmlns:isbn="urn:ISBN:0-395-36341-6" xmlns:foo="foo" xmlns:bk="urn:loc.gov:books">
  <title>Cheaper by the Dozen - Oh le bel été que voilà !</title>
  <isbn:number>1568491379</isbn:number>
  <foo:number>1568491379</foo:number>
  <notes bk:type="comment">
    <!-- make HTML the default namespace for some commentary -->
    <ns:p>This is a <ns:i>funny</ns:i> book! <footnote xmlns="">1.<text xmlns="urn:loc.gov:books">xxx</text></footnote>
    </ns:p>
  </notes>
  <isbn:type>bar</isbn:type>
</book>
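The same comparison can be reproduced without the attachment files by encoding a cut-down version of the test document three ways in memory and parsing each with JAXP. This sketch assumes the JDK's built-in parser (derived from Xerces-J, but not the same build as Xerces Java 2.9.1; with the xercesImpl jar on the classpath, `DocumentBuilderFactory.newInstance()` would normally pick Xerces instead). The class and method names are hypothetical, and like the attached files the document has no XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Parses one small document in the three encodings of the test cases and
// reads back the <title> text. Class and method names are hypothetical.
public class BomParseCheck {

    static final String NS = "urn:loc.gov:books";
    // Unicode escapes (e-acute, a-grave) keep this source file pure ASCII.
    static final String TITLE = "Cheaper by the Dozen - Oh le bel \u00e9t\u00e9 que voil\u00e0 !";
    // Like the attached test cases, this document has no XML declaration.
    static final String XML = "<book xmlns=\"" + NS + "\"><title>" + TITLE + "</title></book>";

    // Equivalent of utf8.xml: no BOM, so the parser must fall back to UTF-8.
    static byte[] utf8() {
        return XML.getBytes(StandardCharsets.UTF_8);
    }

    // Equivalent of utf8_BOM.xml: EF BB BF followed by the UTF-8 bytes.
    static byte[] utf8WithBom() {
        byte[] body = utf8();
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Equivalent of utf16.xml: Java's "UTF-16" encoder writes a big-endian BOM (FE FF).
    static byte[] utf16WithBom() {
        return XML.getBytes(StandardCharsets.UTF_16);
    }

    static String parseTitle(byte[] doc) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Pass raw bytes (not a Reader) so the parser performs its own encoding detection.
            Document d = factory.newDocumentBuilder().parse(new ByteArrayInputStream(doc));
            return d.getElementsByTagNameNS(NS, "title").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("utf8:      " + parseTitle(utf8()).equals(TITLE));
        System.out.println("utf8 BOM:  " + parseTitle(utf8WithBom()).equals(TITLE));
        System.out.println("utf16 BOM: " + parseTitle(utf16WithBom()).equals(TITLE));
    }
}
```

If a parser handed the BOM to the application as a U+FEFF character instead of treating it as an encoding signature, the BOM variants would fail to parse at all, because only whitespace, comments and processing instructions may precede the root element.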