On 2022-08-01 16:12 Hussein Shafie wrote:
On 7/31/22 16:15, Leif H Silli wrote:
Title:
RFE: Let automatic one-way conversion of named character entities
*also* apply to documents without DTD
… snipped …
We are well-aware of this limitation. Always have been.
For example, see Norman Walsh (famous XML expert and very much
appreciated by us) rants about XMLmind XML Editor:
February 23, 2006 (!!!) https://norman.walsh.name/2006/02/23/whitespace
---
Hey, you! Yeah, you! XML editing tool vendor! Lemme ask you something,
why is it that you think you can fuck with the white space in my mixed
content? White space in mixed content is significant. If I put it
there, leave it alone! If I didn't put it there, keep your “helpful”
fingers out of it!
...
I'm talking to you, XMLmind.
---
It perhaps does not matter very much - your attitude to my proposal
might be the same, either way … However, I would like to point out that
Walsh’s rant does not really hit the same nail that I try to hit …
Because: Please note that I am not begging for any new behavior with
regard to whitespace. Instead, I suggest that you follow the same pattern
for named entities as you already follow for whitespace: Destroy them,
but destroy them in an XML-compatible manner.
For any undeclared entity found in an XHTML document (or in a SVG or
MathML document for that matter - we do not need to be XHTML-specific -
the entities that HTML5 declares, are collected from HTML5, SVG and
MathML, in order to support interoperability between HTML, SVG and
MathML and so on), let XXE check if the names of the found entities
occur on the list that is declared by HTML5. And if they occur on that
list, assume that the entity (or entities) are meant to refer to the
characters declared by HTML5. And replace them with either
XML-compatible character references and/or with directly typed
characters.
Possibly, as well, issue a warning before the user is permitted to save
or edit the freshly opened document any further. (Such a warning would
be more than you currently do for re-arranged whitespace. However, since
entities are supposed to have a meaning defined outside the document,
such a warning would make sense.)
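
To make the proposal concrete, here is a minimal Python sketch of the
conversion described above. It is only an illustration, not how XXE works:
the function name convert_html5_entities, the "declared" parameter (the
entity names a document's own DTD declares, if any) and the simplified
regular expression are all assumptions of mine. The HTML5 list itself comes
from Python's standard html.entities.html5 table. The returned list of
names is what a warning could be built from.

```python
import re
from html.entities import html5  # stdlib table, e.g. "nbsp;" -> "\xa0"

# The five entities every XML parser knows without any DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Simplified pattern for a general entity reference such as &nbsp;
# (assumption: ASCII letters and digits only, which covers the HTML5 names).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def convert_html5_entities(text, declared=frozenset()):
    """Replace undeclared HTML5 named entities with hex character references.

    Returns (new_text, names), where names is the sorted list of entity
    names that were converted and could be reported in a warning.
    """
    converted = set()

    def replace(match):
        name = match.group(1)
        # Leave XML's own entities and anything the document declares alone.
        if name in XML_PREDEFINED or name in declared:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            # Not an HTML5 name: keep it so the parser can still report it.
            return match.group(0)
        converted.add(name)
        # A few HTML5 names expand to two code points, hence the join.
        return "".join(f"&#x{ord(c):X};" for c in chars)

    return ENTITY_REF.sub(replace, text), sorted(converted)


text, names = convert_html5_entities("a&nbsp;b &amp; &eacute; &bogus;")
print(text)   # a&#xA0;b &amp; &#xE9; &bogus;
print(names)  # ['eacute', 'nbsp']
```

The sketch works on raw text, so it would also rewrite references inside
comments and CDATA sections; a real implementation would have to hook into
the parser instead.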
So, to emphasize once again that I suggest copying XXE’s current
behavior with regard to whitespace and applying it to HTML5’s named
character entities as well, here is another way to say it:
Whitespace (according to the XML spec) «consists of one or more space
(#x20) characters, carriage returns, line feeds, or tabs». These four
characters are thus treated as synonyms. So when XXE “destroys” the
whitespace that some other authoring tool or human author added to a
document, it simply means that it chooses the “synonym” character that
fits the best with XXE’s rearrangement plan (which is affected, as well,
by the user configuration of XXE’s whitespace treatment). Typically, it
replaces tabs with spaces, as well as inserting hard return/line-break
wherever necessary.
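
The "synonym" view of whitespace can be sketched in a few lines of Python.
This is not XXE's actual algorithm, only an illustration under my own
assumptions: the function name rewrap and the default width of 72 are
invented, and the sketch only makes sense for content where whitespace is
collapsible (not for xml:space="preserve"). Every run of the four XML
whitespace characters is collapsed to one space, and line breaks are then
re-inserted where the chosen width requires them.

```python
import re
import textwrap

# The four characters the XML spec counts as whitespace.
XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def rewrap(text, width=72):
    """Collapse runs of XML whitespace, then re-break lines at `width`."""
    collapsed = XML_WHITESPACE.sub(" ", text).strip()
    return textwrap.fill(collapsed, width=width)


print(rewrap("a\tb\r\n  c"))  # a b c
```

The words come out unchanged and in the same order; only which whitespace
"synonym" separates them differs.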
Likewise, when it comes to the named character entities defined by HTML5
(but which apply to SVG and MathML just as much as they apply to
HTML), each named character reference is just a synonym for the directly
typed character as well as for its decimal or hexadecimal numeric
character reference. From a (human) author point of view (and perhaps
even from a computer program’s point of view), the use of &nbsp; instead
of the directly typed character (or numeric character reference) can be
highly significant. For instance, for a human, it is easier to spot the
&nbsp; entity than it (usually) is to spot the no-break-character
directly. Hence, there will be some users that would drop into rant
mode, when they discover an XML editor that converts the HTML5-defined
named character entities to their Unicode defined counterparts. It is,
in fact, some kind of destruction of the source code.
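
The synonym claim can be checked with a short Python sketch, taking the
no-break space as the example. The helper name same_character is my own
invention; html.unescape is the standard library function that resolves
HTML5 named references as well as decimal and hexadecimal ones.

```python
import html


def same_character(forms):
    """True if every spelling in `forms` decodes to the same string."""
    return len({html.unescape(form) for form in forms}) == 1


# Named, decimal, hexadecimal and directly typed no-break space.
spellings = ["&nbsp;", "&#160;", "&#xA0;", "\u00a0"]
print(same_character(spellings))         # True
print(same_character(["&nbsp;", " "]))   # False: an ordinary space differs
```

To a parser that knows the HTML5 names, the four spellings are
interchangeable; the difference is only visible to whoever reads the
source.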
And to add a third similarity: For some authors, whitespace is
important. For many others, it isn’t. That is why XXE can get away with
its current behavior. The same would be the case for the treatment of
HTML5 named entities that I suggest.
Btw, I still believe that XXE should respect entity declarations when
they exist, so that one could override the HTML5 named entity
declarations. Yeah, probably the behavior I suggest, should be applied
only to documents which lack named character entity declarations.
Finally, there is already an option in the Preferences to
«Simulate a DTD» when there is no DTD, and it would certainly be in
place, and make sense, and be in line with the HTML5 spec, to (at
least with a warning) simulate that named character entities have been
declared.
"Simulate a DTD" does not mean guessing. It simply uses the elements
and attributes already found in a schema-less document instance to
make it quicker and easier to add more elements and more attributes
having the same names.
Thanks for explaining. Makes sense, now ...
From time to time, there are complaints about XXE’s “destructive”
treatment of source code. My claim is that, with regard to named
character references, XXE would fit better in its common work flows if
it would go all out in its “destructive” behavior.
Once again, you are right.
However we currently don't plan to change XXE behavior in this regard
in the near future.
If so, I guess I need to continue to add DTDs that declare those
entities then ...
Note that, quite honestly, "XXE’s “destructive” treatment of source
code" is flagged in red as being a possible "deal breaker". See
http://www.xmlmind.com/xmleditor/features.html
Therefore if this limitation is really a problem for an XML author,
she/he must not even attempt to use XXE.
Once again, I will reiterate that I do not suggest to stop the
destruction. I instead ask for more destruction … ;-)
Leif Halvard Silli
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support