Title:
RFE: Let the automatic one-way conversion of named character entities
*also* apply to documents without a DTD
Alternative title of this letter:
RFE: Please take advantage of XXE’s destructive treatment of
named character entities.
Issue:
XXE does not work seamlessly with documents that lack a DTD defining
named character entities if the document contains named character
references.
Example:
1. When trying to open the following document, which is well-formed
except for its undeclared entity reference,
<html xmlns="http://www.w3.org/1999/xhtml">&hellip;</html>
2. XXE shows an error message stating that “hellip” is undeclared.
3. If you click the «OK» button, the file opens in XML source view,
but the HTML namespace is not “bestowed” unto the document, which
means that XXE does not allow you to work with the document in
semantic WYSIWYG mode.
Consequences of current behavior:
Your workflow is broken.
Workarounds:
1. Alternative: You go through steps 1 to 3 in the example, apply
some conversion tool to the source code, and reopen the file.
2. Alternative: You fiddle with the other editor you use, if you
have access to it and if that editor lets you prefer
directly typed characters over named character entities.
3. Alternative: You attach a DTD to the files you work with. The
“HTML MathML entity set” would do the job:
https://www.w3.org/TR/xml-entity-names/#htmlmathml
4. XXE itself should have a (better) strategy for this!
In particular, for HTML documents, I do not get why XXE
treats undeclared named character entities so differently
from declared ones!
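For workaround 1, the "conversion tool" could be as small as the following Python sketch. It is an illustration under stated assumptions, not an XXE feature: `expand_named_refs` is a hypothetical name, and it relies on the standard-library table `html.entities.html5`. It leaves XML's five predefined entities alone, writes markup-significant results (such as `&LT;`) as numeric references so the output stays well-formed, and returns the converted names so a caller could print the kind of warning proposed below. Being regex-based, it would also rewrite references inside comments and CDATA sections.

```python
import re
from html.entities import html5

# Hypothetical conversion tool, not part of XXE.
# Entities that XML predefines; these stay as references.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
# Characters that must not appear raw where a reference used to be.
MARKUP_SIGNIFICANT = set("<>&\"'")
# A named character reference: "&", a name, ";".
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_refs(source: str) -> tuple[str, list[str]]:
    """Replace HTML named references with the characters they denote.

    Returns the converted text and the names that were converted.
    """
    converted: list[str] = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)  # unknown name: leave it untouched
        converted.append(name)
        if any(c in MARKUP_SIGNIFICANT for c in chars):
            # e.g. "&LT;" -> "<": use a numeric reference instead,
            # which XML accepts without any declaration.
            return "".join(f"&#{ord(c)};" for c in chars)
        return chars

    return NAMED_REF.sub(replace, source), converted
```

For example, `expand_named_refs('<p>x&hellip;</p>')` yields the text `<p>x…</p>` together with the list `["hellip"]`.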
What should happen instead?
Proposal:
XXE should open such documents “normally”, in the specified
namespace, perhaps with a warning stating that named character
entities have been converted to directly typed characters.
Justifications:
Even for documents *with* a DTD that defines named character
entities, XXE simply converts those entities to directly typed
characters, without a warning.[*] This "destructive" behavior with regard to
named character entities seems to contradict what the XXE documentation
states about the XML source view, see
<https://www.xmlmind.com/xmleditor/_distrib/doc/help/xml_source_menu_item.html>,
quote:
]] The source view of a document shows the contents of the save file
which would be created by XXE for this document (same automatic
indentation, same named character entities, etc). [[
That is, with regard to what happens to named character references,
the above is not the case: such references are silently converted
to their directly typed character equivalents.
(Caveat: You, as a user, can make XXE preserve named character
entities, but then you must add “hellip” (and any other named character
entities that you want to keep) to the list of exceptions in XXE’s Save
preferences; only then will &hellip; be saved and displayed in the XML
source view.)
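To make the caveat concrete, here is a minimal Python sketch of what such an exceptions list amounts to on save. It assumes the list works as a character-to-name substitution applied to already-escaped text content; it is not XXE's actual code, and `preserve_named_refs` and its `keep` parameter are hypothetical. Only characters whose names appear in `keep` are written back as references; everything else stays directly typed.

```python
from html.entities import html5

# Hypothetical illustration, not XXE's code: write back only the
# characters whose entity names appear in an exceptions list.
MARKUP_SIGNIFICANT = "<>&\"'"


def preserve_named_refs(text: str, keep: set[str]) -> str:
    """Replace characters with &name; references for names listed in keep."""
    table: dict[int, str] = {}
    # Sorted so the result is deterministic when two kept names denote the
    # same character (for example "hellip" and "mldr").
    for name in sorted(keep):
        chars = html5.get(name + ";")
        if chars is None or len(chars) != 1 or chars in MARKUP_SIGNIFICANT:
            raise ValueError(f"not a usable single-character entity: {name}")
        table[ord(chars)] = f"&{name};"
    return text.translate(table)
```

With `keep={"hellip", "nbsp"}`, an ellipsis and a no-break space come back as `&hellip;` and `&nbsp;`, while an em dash stays a directly typed character.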
However, XXE’s “destructive” behavior is fully in line with
XXE’s “destructive” treatment of source code. For example, XXE does not
care about your nicely indented code, but instead inserts and removes
non-semantic whitespace wherever it wants.
Finally, there is already an option in the Preferences to «Simulate
a DTD» when there is no DTD, so it would certainly make sense, and be
in line with the HTML5 spec, to simulate (at least with a warning)
that named character entities have been declared.
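The effect of such a simulation can be approximated outside XXE by prepending an internal DTD subset that declares every HTML5 named character reference, after which a standard non-validating XML parser accepts the document. The Python sketch below assumes the standard-library `html.entities.html5` table and `xml.etree.ElementTree`; `parse_with_simulated_dtd` and its `root_name` parameter are hypothetical, and this is not how XXE implements «Simulate a DTD».

```python
import xml.etree.ElementTree as ET
from html.entities import html5

# Hypothetical helper, not part of XXE: simulates a DTD that declares
# every HTML5 named character reference.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def entity_literal(chars: str) -> str:
    """Encode replacement text as character references for an <!ENTITY> value."""
    parts = []
    for c in chars:
        if c in "<&":
            # Double-escape, as the XML spec does for lt and amp, so the
            # replacement text is a character reference, not markup.
            parts.append(f"&#38;#{ord(c)};")
        else:
            parts.append(f"&#x{ord(c):X};")
    return "".join(parts)


def simulated_subset() -> str:
    """Build <!ENTITY> declarations for all HTML5 names except XML's own."""
    decls = []
    for key, chars in sorted(html5.items()):
        # html5 also lists some legacy names without ";" (e.g. "amp");
        # use only the ";" forms.
        if not key.endswith(";"):
            continue
        name = key[:-1]
        if name not in XML_PREDEFINED:
            decls.append(f'<!ENTITY {name} "{entity_literal(chars)}">')
    return "\n".join(decls)


def parse_with_simulated_dtd(source: str, root_name: str = "html") -> ET.Element:
    """Parse a DTD-less document as if the HTML entities had been declared."""
    if "<!DOCTYPE" in source:
        raise ValueError("document already has a DOCTYPE")
    doctype = f"<!DOCTYPE {root_name} [\n{simulated_subset()}\n]>\n"
    if source.startswith("<?xml"):
        # The DOCTYPE must follow the XML declaration.
        end = source.index("?>") + 2
        source = source[:end] + "\n" + doctype + source[end:]
    else:
        source = doctype + source
    return ET.fromstring(source)
```

Parsed this way, the example document from the top of this letter keeps its XHTML namespace, and its text content is a directly typed “…”. A warning listing the names used would still require scanning the source before parsing, since the parser expands the references silently.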
The final justification is, of course, to promote
interoperability. There are just too many XML/XHTML/HTML editors and
authors out there that insert various characters as named character
entities. One of the most common issues is, of course, the no-break space
character, which so often occurs in source code as "&nbsp;".
From time to time, there are complaints about XXE’s “destructive”
treatment of source code. My claim is that, with regard to named
character references, XXE would fit better into its common workflows if
it went all out in its “destructive” behavior.
Leif Halvard Silli
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support