Re: [XXE] RFE: Let automatic oneway conversion of named character entities also apply to documents without DTD

Hussein Shafie Mon, 01 Aug 2022 07:12:25 -0700

On 7/31/22 16:15, Leif H Silli wrote:

Title:


    RFE: Let automatic oneway conversion of named character entities
         *also* apply to documents without DTD

Alternative title of this letter:

    RFE: Please take advantage of XXE’s destructive treatment of
         named character entities.

Issue:

XXE fails to work seamlessly with documents without a DTD that defines named character entities, when and if the document contains named character references.


Example:

    1. When trying to open the following well-formed document,
       <html xmlns="http://www.w3.org/1999/xhtml">&hellip;</html>
    2. XXE shows an error message stating that “hellip” is undeclared.


This error message is correct.

    3. If you click the «OK» button, the file opens in XML source view,
       but the HTML namespace is not “bestowed” unto the document, which
       means that XXE does not allow you to work with the document in
       semantic WYSIWYG mode.


That's right.
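
The behavior described in the example is not specific to XXE. As a minimal sketch, Python's standard xml.etree.ElementTree parser (used here purely as an illustration, not as anything XXE itself relies on) rejects the same document, and accepts it once "hellip" is declared in an internal DTD subset. The helper name try_parse is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The document from the example: no DTD, so "hellip" is never declared.
without_dtd = f'<html xmlns="{XHTML_NS}">&hellip;</html>'

# The same document with an internal DTD subset that declares the entity.
with_dtd = (
    '<!DOCTYPE html [<!ENTITY hellip "&#x2026;">]>'
    f'<html xmlns="{XHTML_NS}">&hellip;</html>'
)


def try_parse(source):
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(source).text
    except ET.ParseError as err:
        return f"error: {err}"


print(try_parse(without_dtd))  # an "undefined entity" parse error
print(try_parse(with_dtd))     # the single character U+2026
```

Any conforming XML parser is expected to report the undeclared reference, which is why the error message XXE shows is correct.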


Consequences of current behavior:

    Your workflow is broken.

Workarounds:

    1. Alternative: You go through steps 1 to 3 in the example, and apply
       some conversion tool on the source code and reopen the file.

    2. Alternative: You fiddle with the other editor you use - if you
       have access to it and if that editor allows you to prefer
       directly typed text over named character entities.

    3. Alternative: You attach a DTD to the files you work with. The
       "HTML MathML entity set" would do the job:
       https://www.w3.org/TR/xml-entity-names/#htmlmathml

    4. XXE itself should have a (better) strategy for this!

       In particular, for HTML documents, I do not get why XXE
       treats undeclared named character entities (so) differently
       from declared named character references!
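
The "conversion tool" of workaround 1 could be sketched as follows. This is only an illustration under stated assumptions: it uses the HTML 4 entity table from Python's standard html.entities module, leaves the five entities predefined by XML untouched, and works on raw text, so it would also rewrite references inside comments and CDATA sections. The function name expand_named_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML itself must stay as references.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &hellip; (not numeric ones like &#160;).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_entities(source: str) -> str:
    """Replace known HTML named entities with the characters they stand for."""

    def replace(match):
        name = match.group(1)
        # Keep XML's own entities and anything the table does not know.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_REF.sub(replace, source)


converted = expand_named_entities("<p>a&nbsp;b&hellip; &amp; &unknown;</p>")
print(converted)
```

After such a pass the file no longer contains undeclared references from that table, so it can be reopened without attaching a DTD.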

What should happen instead?

Proposal:

XXE should open such documents “normally” – in the specified namespace, probably/perhaps with a warning stating that named character entities have been converted to directly typed characters.

I'm sorry, but unlike Web browsers, which specialize in HTML, XXE, by design, does not attempt to guess anything. For example, it will not attempt to "automagically" map "&hellip;" to Unicode character U+2026 (Horizontal Ellipsis).

Justifications:
Even for documents *with* a DTD that defines named character entities, XXE simply – without a warning – converts those entities to directly typed characters.[*] This "destructive" behavior with regard to named character entities seems to break what the XXE documentation states in its XML source view documentation, see <https://www.xmlmind.com/xmleditor/_distrib/doc/help/xml_source_menu_item.html>, quote:

]] The source view of a document shows the contents of the save file which would be created by XXE for this document (same automatic indentation, same named character entities, etc). [[

Because, with regard to what happens to named character references, the above is not the case - such entities are simply silently converted to their directly typed character equivalent.

(Caveat: You, as user, can make XXE preserve named character entities, but then you must add “hellip” (and other named character entities that you want to keep) to the list of exceptions in XXE’s Save preferences - only then will &hellip; be saved and be displayed in XML Source view.)

However, XXE’s “destructive” behavior is fully in style/line with XXE’s “destructive” treatment of source code. For example, XXE does not care about your nicely indented code, but instead inserts and removes non-semantic whitespace wherever it wants.


We are well-aware of this limitation. Always have been.

For example, see the rant by Norman Walsh (famous XML expert and very much appreciated by us) about XMLmind XML Editor:


February 23, 2006 (!!!) https://norman.walsh.name/2006/02/23/whitespace
---

Hey, you! Yeah, you! XML editing tool vendor! Lemme ask you something, why is it that you think you can fuck with the white space in my mixed content? White space in mixed content is significant. If I put it there, leave it alone! If I didn't put it there, keep your “helpful” fingers out of it!

...
I'm talking to you, XMLmind.
---

Finally, there is already an option in the Preferences to «Simulate a DTD» when there is no DTD, and it would certainly be in place, and make sense, and be in line with the HTML5 spec, to (at least with a warning) simulate that named character entities have been declared.

"Simulate a DTD" does not mean guessing. It simply uses the elements and attributes already found in a schema-less document instance to make it quicker and easier to add more elements and more attributes having the same names.
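
The idea of reusing the names already present in a schema-less document can be sketched as below. This is not XXE's implementation, only a rough illustration in Python of collecting the vocabulary of a document instance; the function name collect_vocabulary and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_vocabulary(source: str) -> dict:
    """Map each element name in the document to the attribute names seen on it."""
    vocabulary = defaultdict(set)
    for element in ET.fromstring(source).iter():
        # Looking up the tag registers the element even if it has no attributes.
        vocabulary[element.tag].update(element.attrib)
    return dict(vocabulary)


sample = (
    '<doc><para id="p1">text</para><para role="note"/>'
    "<list><item/></list></doc>"
)
vocabulary = collect_vocabulary(sample)
for name in sorted(vocabulary):
    print(name, sorted(vocabulary[name]))
```

Nothing here is inferred beyond what the instance already contains, which is the distinction being drawn from guessing at undeclared entities.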

The final justification is of course in order to promote interoperability. There are just too many XML/XHTML/HTML editors and authors out there that insert various characters as named character entities. One of the most common issues is of course the no-break space character, which so often occurs in source code as "&nbsp;".


That's right.

From time to time, there are complaints about XXE’s “destructive” treatment of source code. My claim is that, with regard to named character references, XXE would fit better in its common workflows if it would go all out in its “destructive” behavior.

Once again, you are right.

However, we currently don't plan to change XXE's behavior in this regard in the near future.

Note that, quite honestly, "XXE’s “destructive” treatment of source code" is flagged in red as being a possible "deal breaker". See http://www.xmlmind.com/xmleditor/features.html

Therefore, if this limitation is really a problem for an XML author, she/he must not even attempt to use XXE.





--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
