Now that you have identified this as an XML problem, there is a helpful wiki page on valid XML characters; it even mentions my specific character.[1] The primary source seems to be the W3C spec.[2]
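
For illustration, stripping could look roughly like the sketch below. It is in Python rather than elisp, and it assumes the exported file is XML 1.0, whose Char production allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. XML 1.1, which the links below cover, defines a different range. The function name is made up for this example.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Assumption: the target document is XML 1.0, not XML 1.1.
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Hypothetical helper: drop characters that XML 1.0 does not allow."""
    return _INVALID_XML10.sub("", text)


if __name__ == "__main__":
    # A form feed (U+000C) is used here only as an example of a
    # disallowed control character.
    sample = "page one\x0cpage two"
    print(repr(strip_invalid_xml_chars(sample)))
```

Tab, newline and carriage return are kept, since XML 1.0 allows them; every other C0 control character is dropped.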

Rasmus <ras...@gmx.us> writes:

> torys.ander...@gmail.com (Tory S. Anderson) writes:
>
>> From a user perspective just stripping the characters seems best to
>> me, but finding out what the characters seems obnoxious.
>
> But maybe there is a valid way to represent such characters in XML? At
> the very least entities must be replaced before stripping these...
>
>> Neither a quick search nor skimming the ODT doc specification[1][2] seem
>> to give any insight into a set of illegal characters. Does elisp have
>> anything similar to Java's "isWhitespace"[3] that could be used to check
>> character features?
>
> It's an XML thing. When I tried to open the contents.xml with Firefox it
> also says broken XML. But I also don't know which are the characters that
> are not supported by XML.
>
> —Rasmus

Footnotes:
[1] https://en.wikipedia.org/wiki/Valid_characters_in_XML#XML_1.1
[2] http://www.w3.org/TR/xml11/#charsets