On 7/31/22 16:15, Leif H Silli wrote:
Title:
RFE: Let automatic one-way conversion of named character entities
*also* apply to documents without a DTD
Alternative title of this letter:
RFE: Please take advantage of XXE’s destructive treatment of
named character entities.
Issue:
XXE does not work seamlessly with documents that lack a DTD declaring
named character entities, if the document contains named character
references.
Example:
1. When trying to open the following well-formed document:
<html xmlns="http://www.w3.org/1999/xhtml">…</html>
2. XXE shows an error message stating that “hellip” is undeclared.
This error message is correct.
3. If you click the «OK» button, the file opens in XML source view,
but the HTML namespace is not “bestowed” unto the document, which
means that XXE does not allow you to work with the document in
semantic WYSIWYG mode.
That's right.
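For reference, any conforming XML parser rejects such a document in the same way. XML 1.0 predefines only five entities (`amp`, `lt`, `gt`, `quot`, `apos`), and without a DTD every other named reference is undeclared. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document and the `check_well_formed` helper are hypothetical and serve only as an illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document: XHTML namespace, no DOCTYPE,
# and one named character reference (&hellip;) that nothing declares.
DOC = '<html xmlns="http://www.w3.org/1999/xhtml"><p>Wait&hellip;</p></html>'


def check_well_formed(text: str) -> str:
    """Return "ok" if text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        return str(err)


# Reports an "undefined entity" error, like the message XXE shows.
print(check_well_formed(DOC))
```

Replacing `&hellip;` with the directly typed character U+2026 makes the same document parse without error.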
Consequences of current behavior:
Your workflow is broken.
Workarounds:
1. Alternative: You go through steps 1 to 3 in the example, apply
some conversion tool to the source code, and reopen the file.
2. Alternative: You fiddle with the other editor you use, if you
have access to it and if that editor lets you prefer directly
typed text over named character entities.
3. Alternative: You attach a DTD to the files you work with. The
"HTML MathML entity set" would do the job:
https://www.w3.org/TR/xml-entity-names/#htmlmathml
4. XXE itself should have a (better) strategy for this!
In particular, for HTML documents, I do not understand why XXE
treats undeclared named character entities (so) differently
from declared named character references!
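Workaround 1 can be as simple as a small script. The following is a hedged sketch in Python, not an XMLmind tool. It replaces HTML named character references such as `&hellip;` or `&nbsp;` with the characters they denote, using the HTML5 name table in the standard `html.entities` module. The function names are hypothetical. The sketch assumes UTF-8 input and that no CDATA section or comment contains `&name;` text that must stay literal:

```python
import html.entities
import re

# The five entities predefined by XML 1.0. Every XML parser already knows
# them, and expanding &lt; or &amp; would corrupt the markup, so keep them.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named references such as &hellip; or &nbsp;. Numeric references like
# &#8230; or &#x2026; do not match, because a name must start with a letter.
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_refs(text: str) -> str:
    """Replace HTML named character references with the characters they denote.

    Unknown names and the XML predefined entities are left unchanged.
    This is plain text substitution: it does not skip CDATA sections
    or comments.
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        # html.entities.html5 keys include the trailing semicolon, e.g. "hellip;".
        return html.entities.html5.get(name + ";", match.group(0))

    return NAMED_REF.sub(replace, text)


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: write a copy of src with named references expanded.

    Assumes both files are UTF-8 encoded.
    """
    with open(src, encoding="utf-8") as f:
        data = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(expand_named_refs(data))
```

For example, `convert_file("in.xhtml", "out.xhtml")` writes a copy that XXE should then open without the "undeclared" error, provided every reference in it is a known HTML5 name. The sketch deliberately leaves the five XML predefined entities alone, because expanding `&lt;` or `&amp;` would change the markup itself.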
What should happen instead?
Proposal:
XXE should open such documents “normally” – in the specified
namespace, probably/perhaps with a warning stating that named character
entities have been converted to directly typed characters.
I'm sorry but unlike Web browsers which specialize in HTML, XXE, by
design, does not attempt to guess anything. For example, it will not
attempt to "automagically" map "&hellip;" to Unicode character U+2026
(Horizontal Ellipsis).
Justifications:
Even for documents *with* a DTD that define named character
entities, XXE simply – without a warning – converts those entities to
directly typed characters.[*] This "destructive" behavior with regard to
named character entities seems to contradict what the XXE documentation
states about the XML source view, see
<https://www.xmlmind.com/xmleditor/_distrib/doc/help/xml_source_menu_item.html>, quote:
]] The source view of a document shows the contents of the save
file which would be created by XXE for this document (same automatic
indentation, same named character entities, etc). [[
Yet, with regard to named character references, the above is not the
case: such entities are silently converted to their directly typed
character equivalents.
(Caveat: You, as user, can make XXE preserve named character
entities, but then you must add “hellip” (and other named character
entities that you want to keep) to the list of exceptions in XXE’s Save
preferences - only then will &hellip; be saved and be displayed in XML
Source view.)
However, XXE’s “destructive” behavior is fully in style/line with
XXE’s “destructive” treatment of source code. For example, XXE does not
care about your nicely indented code, but instead inserts and removes
non-semantic whitespace wherever it wants.
We are well-aware of this limitation. Always have been.
For example, see the rant by Norman Walsh (a famous XML expert, very
much appreciated by us) about XMLmind XML Editor:
February 23, 2006 (!!!) https://norman.walsh.name/2006/02/23/whitespace
---
Hey, you! Yeah, you! XML editing tool vendor! Lemme ask you something,
why is it that you think you can fuck with the white space in my mixed
content? White space in mixed content is significant. If I put it there,
leave it alone! If I didn't put it there, keep your “helpful” fingers
out of it!
...
I'm talking to you, XMLmind.
---
Finally, there is already an option in the Preferences to «Simulate
a DTD» when there is no DTD, and it would certainly be appropriate,
sensible, and in line with the HTML5 spec to also simulate (at least
with a warning) that named character entities have been declared.
"Simulate a DTD" does not mean guessing. It simply uses the elements and
attributes already found in a schema-less document instance to make it
quicker and easier to add more elements and attributes having the same
names.
The final justification is, of course, to promote interoperability.
There are just too many XML/XHTML/HTML editors and authors out there
that insert various characters as named character entities. One of the
most common cases is the no-break space character, which so often
occurs in source code as "&nbsp;".
That's right.
From time to time, there are complaints about XXE’s “destructive”
treatment of source code. My claim is that, with regard to named
character references, XXE would fit better into common workflows if
it went all out in its “destructive” behavior.
Once again, you are right.
However we currently don't plan to change XXE behavior in this regard in
the near future.
Note that, quite honestly, "XXE’s “destructive” treatment of source
code" is flagged in red as being a possible "deal breaker". See
http://www.xmlmind.com/xmleditor/features.html
Therefore if this limitation is really a problem for an XML author,
she/he must not even attempt to use XXE.
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support