On 7/31/22 16:15, Leif H Silli wrote:
Title:

    RFE: Let automatic one-way conversion of named character entities
         *also* apply to documents without DTD

Alternative title of this letter:

    RFE: Please take advantage of XXE’s destructive treatment of
         named character entities.

Issue:

   XXE fails to work seamlessly with documents that lack a DTD defining named character entities, whenever such a document contains named character references.

Example:

    1. When trying to open the following well-formed document,
       <html xmlns="http://www.w3.org/1999/xhtml">&hellip;</html>
    2. XXE shows an error message stating that “hellip” is undeclared.

This error message is correct.



    3. If you click the «OK» button, the file opens in XML source view,
       but the HTML namespace is not “bestowed” unto the document, which
       means that XXE does not allow you to work with the document in
       semantic WYSIWYG mode.

That's right.
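
The error in step 2 is not specific to XXE. As a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as a stand-in for an XML parser (the helper name parse_error is hypothetical), the same document is rejected because nothing declares "hellip":

```python
import xml.etree.ElementTree as ET

# The document from step 1: no DOCTYPE, so "hellip" is never declared.
DOC = '<html xmlns="http://www.w3.org/1999/xhtml">&hellip;</html>'


def parse_error(source):
    """Return the parser's error message, or None if the source parses."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Reports an "undefined entity" error for the original document.
    print(parse_error(DOC))
    # A numeric character reference needs no declaration and parses fine.
    print(parse_error(DOC.replace("&hellip;", "&#x2026;")))
```

Only the five entities predefined by XML (amp, lt, gt, quot, apos) and numeric character references can be used without a declaration.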



Consequences of current behavior:

    Your workflow is broken.

Workarounds:

    1. Alternative: You go through steps 1 to 3 in the example, apply
       some conversion tool to the source code, and reopen the file.

    2. Alternative: You fiddle with the other editor you use - if you
       have access to it and if that editor allows you to prefer
       directly typed text over named character entities.

    3. Alternative: You attach a DTD to the files you work with. The
       "HTML MathML entity set" would do the job:
       https://www.w3.org/TR/xml-entity-names/#htmlmathml

    4. XXE itself should have a (better) strategy for this!

       In particular, for HTML documents, I do not get why XXE
       treats undeclared named character entities (so) differently
       from declared named character references!
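
Workarounds 1 and 3 can be sketched outside XXE. The following is a rough illustration, not an XXE feature: it assumes Python and the HTML 4 name table in html.entities.name2codepoint (smaller than the HTML MathML entity set linked above), and the function names inline_named_entities and with_internal_subset are hypothetical. The first function replaces named references with the characters themselves; the second prepends a DOCTYPE whose internal subset declares only the names actually used, a cut-down variant of attaching a full DTD.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities predefined by XML itself must stay as references.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &hellip; (not numeric ones like &#160;).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def inline_named_entities(source):
    """Workaround 1: replace known named references with the character."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined and unknown names alone
        return chr(name2codepoint[name])
    return ENTITY_REF.sub(replace, source)


def with_internal_subset(source, root="html"):
    """Workaround 3 (reduced): declare the names used in an internal subset."""
    names = sorted({name for name in ENTITY_REF.findall(source)
                    if name not in XML_PREDEFINED and name in name2codepoint})
    decls = "".join('<!ENTITY %s "&#%d;">' % (name, name2codepoint[name])
                    for name in names)
    return "<!DOCTYPE %s [%s]>%s" % (root, decls, source)


if __name__ == "__main__":
    doc = '<html xmlns="http://www.w3.org/1999/xhtml">&hellip;</html>'
    print(inline_named_entities(doc))
    # With the declaration in place, the standard parser accepts the file.
    print(repr(ET.fromstring(with_internal_subset(doc)).text))
```

Both functions work on raw text, so they would also touch references inside comments or CDATA sections, and with_internal_subset assumes the source has no XML declaration or DOCTYPE of its own.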

What should happen instead?

Proposal:

    XXE should open such documents “normally” – in the specified namespace, perhaps with a warning stating that named character entities have been converted to directly typed characters.

I'm sorry but unlike Web browsers which specialize in HTML, XXE, by design, does not attempt to guess anything. For example, it will not attempt to "automagically" map "&hellip;" to Unicode character U+2026 (Horizontal Ellipsis).



Justifications:

    Even for documents *with* a DTD that defines named character entities, XXE simply – without a warning – converts those entities to directly typed characters.[*] This "destructive" behavior with regard to named character entities seems to contradict what the XXE documentation states about the XML source view, see <https://www.xmlmind.com/xmleditor/_distrib/doc/help/xml_source_menu_item.html>, quote:

    ]] The source view of a document shows the contents of the save file which would be created by XXE for this document (same automatic indentation, same named character entities, etc). [[

    With regard to named character references, the above is not the case: such entities are silently converted to their directly typed character equivalents.

   (Caveat: You, as user, can make XXE preserve named character entities, but then you must add “hellip” (and other named character entities that you want to keep) to the list of exceptions in XXE’s Save preferences - only then will &hellip; be saved and be displayed in XML Source view.)

    However, XXE’s “destructive” behavior is fully in line with XXE’s “destructive” treatment of source code. For example, XXE does not care about your nicely indented code, but instead inserts and removes non-semantic whitespace wherever it wants.

We are well-aware of this limitation. Always have been.

For example, see the rant about XMLmind XML Editor by Norman Walsh (a famous XML expert, very much appreciated by us):

February 23, 2006 (!!!) https://norman.walsh.name/2006/02/23/whitespace
---
Hey, you! Yeah, you! XML editing tool vendor! Lemme ask you something, why is it that you think you can fuck with the white space in my mixed content? White space in mixed content is significant. If I put it there, leave it alone! If I didn't put it there, keep your “helpful” fingers out of it!
...
I'm talking to you, XMLmind.
---





    Finally, there is already an option in the Preferences to «Simulate a DTD» when there is no DTD, and it would certainly be appropriate, make sense, and be in line with the HTML5 spec to (at least with a warning) simulate that named character entities have been declared.

"Simulate a DTD" does not mean guessing. It simply uses the elements and attributes already found in a schema-less document instance to make it quicker and easier to add more elements and attributes having the same names.




    The final justification is of course to promote interoperability. There are just too many XML/XHTML/HTML editors and authors out there that insert various characters as named character entities. One of the most common issues is of course the no-break space character, which so often occurs in source code as "&nbsp;".

That's right.




   From time to time, there are complaints about XXE’s “destructive” treatment of source code. My claim is that, with regard to named character references, XXE would fit better into common workflows if it went all out in its “destructive” behavior.

Once again, you are right.

However, we currently don't plan to change XXE's behavior in this regard in the near future.

Note that, quite honestly, "XXE’s “destructive” treatment of source code" is flagged in red as being a possible "deal breaker". See http://www.xmlmind.com/xmleditor/features.html

Therefore, if this limitation is really a problem for an XML author, she/he must not even attempt to use XXE.




--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
