If I ever learned how "mixed mode" affects whitespace handling in XML (I probably did, 5-6 years ago), then I had at least forgotten it.

Thanks for the advice to modify the DTD. I guess I will do that, if you don't do it. Because:

XMLmind has made a few willful violations of the HTML5 spec, such as permitting the border attribute for the table element. And I think it would be very meaningful, perhaps more meaningful than allowing the border attribute for table, to limit the content model of <body>, <section> and perhaps some other HTML block elements to one where text nodes are not permitted. And why not do the same for the DITA <section> element?

Remember, as well, that the current behavior causes authors to unintentionally commit errors in the form of text nodes as direct children of <body> and <section>.

Leif Halvard Silli


On 31 Dec 2020, at 10:25, Hussein Shafie wrote:

On 12/31/20 1:44 AM, Leif Halvard Silli wrote:
An interesting problem ...

To solve the issue from our point of view, I have invested in an XML minifier. Why couldn't XXE do something similar?

Anyway, just some questions, for understanding and verification of your explanation:
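
For readers wondering what such a minifier might do, here is a minimal sketch in Python. It is not the minifier mentioned above, whose behavior is not described in this thread. It assumes that whitespace-only text is only safe to drop inside elements whose content model forbids text, and the ELEMENT_ONLY set and the minify function are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: elements whose content model does not allow text,
# so whitespace between their children carries no meaning.
ELEMENT_ONLY = {"ol", "ul"}

def minify(root, element_only=ELEMENT_ONLY):
    """Drop whitespace-only text nodes directly inside element-only elements."""
    for elem in root.iter():
        if elem.tag not in element_only:
            continue
        # Text before the first child.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Text after each child (still inside elem).
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    return root

src = "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>"
print(ET.tostring(minify(ET.fromstring(src)), encoding="unicode"))
# <ul><li>one</li><li>two</li></ul>
```

Elements that may contain text, such as <p> or <body> under a mixed content model, are left untouched, because whitespace there may be significant.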

The <ol> and <ul> elements might, per HTML5, contain whitespace and comments and even, I think, template elements and script elements. This in addition to the obligatory <li> elements. Note that I am now talking about the code level and not rendering level.

Such whitespace is not to be rendered, though. And, lo and behold, XXE never renders empty lines inside <ol> or <ul>. So all is good.

How is this different from the <body> element?

The content model of <ol> and <ul> found in our in-house HTML5 W3C XML Schema is:

---
((li | script | template))*
---

That is, TEXT is not allowed. See attached screenshot.

OTOH, the content model of <body> found in our in-house HTML5 W3C XML Schema is:

---
Element body can contain TEXT.
((em | strong | small | s | cite |
  q | dfn | abbr | ruby | data |
  time | code | var | samp | kbd |
  sub | sup | i | b | u |
  mark | bdi | bdo | span | br |
  wbr | mml:math | svg:svg | picture | img |
  iframe | embed | area | label | input |
  button | select | datalist | textarea | output |
  progress | meter | link | script | template |
  [1] | address | p | hr | pre |
  blockquote | ol | ul | menu | dl |
  figure | div | a#2 | ins#2 | del#2 |
  object#2 | video#2 | audio#2 | map#2 | table |
  form | fieldset | details | dialog | noscript#2 |
  slot#2 | canvas#2 | article | section | nav |
  aside | h1 | h2 | h3 | h4 |
  h5 | h6 | hgroup | header | footer |
  main))*
---

You can check this by yourself simply by selecting an element and then choosing menu item "Help|Show Content Model" (http://www.xmlmind.com/xmleditor/_distrib/doc/help/helpMenu.html)




If this really is different, why not switch to the XHTML 1.x behavior? XXE is an editor and not an XHTML renderer. A conscious break with XHTML5, if necessary. Text nodes as direct children of <body> are something to avoid anyhow, especially in the kind of documents for which XXE is an excellent writing tool.

In fact, any time XXE permits me to write something like the following, it is, from my point of view, just an accident and a confusing pain in the ass. Code example:

<body>
<p>Para 1.</p>
Para 2.
<p>Para 3.</p>
</body>
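
A document like the one above can be checked for stray text outside the editor. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. The function name stray_text is hypothetical, and the check assumes the document is well-formed XML (XHTML5 syntax) rather than HTML syntax.

```python
import xml.etree.ElementTree as ET

def stray_text(parent):
    """Return non-whitespace text fragments that are direct children of parent."""
    fragments = []
    # Text before the first child element.
    if parent.text and parent.text.strip():
        fragments.append(parent.text.strip())
    # Text following each child element, still inside parent.
    for child in parent:
        if child.tail and child.tail.strip():
            fragments.append(child.tail.strip())
    return fragments

doc = """<body>
<p>Para 1.</p>
Para 2.
<p>Para 3.</p>
</body>"""

print(stray_text(ET.fromstring(doc)))  # ['Para 2.']
```

Whitespace-only text between the blocks is ignored, so only "Para 2." is reported.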


Note that you can also achieve a similar "mess" with the stock DITA DTD. For example, a <section> may contain TEXT in addition to "blocks".

If it's OK for the schema, then it's allowed by XXE. It's as simple as that.

In order to solve your issue, I would recommend using a customized HTML5 schema (a very simple *strict* subset) rather than our stock HTML5 schema.



--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
