We'll try to implement in the next version of XXE what you call "the
third alternative" in the email below and explain in greater detail in
your following email.
We'll do this even though the XInclude standard completely ignores the lang
attribute. The alternative is to tolerate an inconsistent XInclude
implementation when it is used with (X)HTML documents.
On 5/4/23 00:39, Leif H Silli wrote:
XMLmind XML Editor (XXE) is an - eh - XML editor. But as XHTML is
typically consumed as text/HTML, in practice it is also an HTML editor.
XHTML has always had a 'complicated' relationship to xml:lang. It seems
like everyone really wants to use just lang, or at least just one
attribute (which would have had to be lang). But, either in order to
"look good" or in order to use XML tooling (which is rumoured not to
support lang), it is customary to take on the burden of adding both
lang and xml:lang. This was recommended back in 1998 when XHTML 1 was
released. And it was also recommended by the polyglot HTML draft (even
if, in my heart, I did not want to require both attributes).
Context: as for tooling, we see this in XXE itself. When working with
XInclude, it turns out that @lang is ignored; only xml:lang is
respected. (I might write a separate RFE or BUG about that.)
The consequence is that if, via XInclude, you embed
<section lang="en" id="sect01" />
then the way XXE implements what XInclude calls 'language fixup' causes
the above element to be embedded as if its language property were unknown,
which (as of XXE 10.4) means that XXE adds xml:lang="" to the embedded
element:
<section xml:lang="" id="sect01" />
And, if it does not delete the lang attribute, we even get this:
<section xml:lang="" lang="en" id="sect01" />
This means that XInclude has created an invalid document, since, when
both xml:lang and lang are used, they must agree.
Clearly, this behavior is wrong - on many levels. (A separate bug about
this will probably be written.)
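To make the validity problem concrete, here is a minimal Python sketch (standard library only) that flags elements carrying both lang and xml:lang with values that disagree. The function name lang_conflicts is hypothetical and not part of XXE; the ASCII case-insensitive comparison assumes the HTML rule that both attributes, when present on the same element, must have the same value compared in that way.

```python
import string
import xml.etree.ElementTree as ET

# ElementTree's expanded name for xml:lang (the xml prefix is predefined).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Lowercase only A-Z, as an ASCII case-insensitive comparison requires.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lang_conflicts(root):
    """Return (element, lang, xml:lang) for each element under root (inclusive)
    that carries both attributes with values differing ASCII case-insensitively."""
    conflicts = []
    for elem in root.iter():
        html_lang = elem.get("lang")
        xml_lang = elem.get(XML_LANG)
        if html_lang is None or xml_lang is None:
            continue  # at most one attribute present: nothing to compare
        if html_lang.translate(_ASCII_LOWER) != xml_lang.translate(_ASCII_LOWER):
            conflicts.append((elem, html_lang, xml_lang))
    return conflicts


if __name__ == "__main__":
    # The fixup result described above, next to a consistent pair.
    doc = ET.fromstring(
        '<div><section xml:lang="" lang="en" id="sect01"/>'
        '<p lang="EN" xml:lang="en"/></div>'
    )
    for elem, html_lang, xml_lang in lang_conflicts(doc):
        print(elem.get("id"), repr(html_lang), repr(xml_lang))
    # prints: sect01 'en' ''
```

Only the section element is reported; the p element passes because "EN" and "en" differ only in ASCII case.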
However, this message is not about how XInclude is implemented, but
about making it more convenient to apply xml:lang in XHTML and HTML
documents. As long as xml:lang is needed, applying it should be
simpler.
I see two ways to make it simpler. EITHER add some form of automation:
when someone adds or edits the lang attribute, xml:lang is added
and/or edited automatically, in parallel. OR offer xml:lang in the
default list of attributes to select from. (Clearly I prefer the
automated variant.) AND a third option: decrease the need to use xml:lang.
As of today, when authoring HTML or XHTML documents, the lang
attribute is visible by default in the Attribute editor. Just click
on the attribute name and add the value. For xml:lang, by contrast, you
must either manually type the name of the attribute before you can
select it, or change the defaults (on the fly) so that so-called XML
attributes are also visible. (But this also makes xml:base and xml:space
visible.)
But the third alternative is what I prefer the most: Decrease the need
to use xml:lang. For instance, by changing the implementation of
XInclude so that lang is treated like xml:lang (and/or so that xml:lang
is kept in sync with lang).
On 5/4/23 01:36, Leif H Silli wrote:
The release notes for XXE 10.4 refer to language fixup in XInclude 1.1. I must
start by explaining why we should not give too much heed to what XInclude 1.0
or 1.1 says about language properties.
XInclude needs an update. XInclude 1.1 is a Working Group Note from 2016 [1],
while the most recent Recommendation, XInclude 1.0, is from 2006 [2].
The 2016 Note includes some innovations such as set-xml-id (though it should
probably have had a set-id attribute as well). But when it comes to language
properties, the spec it references, IETF RFC 3066 (published in 2001), was
already outdated when the Note was published: the current best practice for
language tagging, BCP 47, was specified in 2009, seven years before the Note
was finished [3].
The work on HTML5 began around 2006, the year the final version of XInclude 1.0
was published. In HTML4, the lang attribute behaved differently from the
xml:lang attribute. But HTML5, which implements BCP 47, has 'updated' the
specification of lang so that lang and xml:lang work the same (the only
difference being that xml:lang only works when the document is consumed as XML).
So the Working Group Note from 2016 does not pick up all the changes that
happened to HTML and language tagging since 2006. Perhaps that is why XInclude
only talks about xml:lang and not about lang? XInclude seems to have been
created in the spirit of XHTML 1.0, when the attitude was that we would soon
kill text/html. And so, for example, the Language Fixup section of XInclude 1.1
(2016) is identical to the Language Fixup section of XInclude 1.0 (2006).
Instead, we have ended up in a situation where we try to keep HTML as XML and
HTML as text/html in sync as much as possible. In sync, but different.
It therefore no longer makes sense that XInclude only considers xml:lang and
ignores lang.
XMLmind XML Editor version 10.4 is an example of this. Per the release notes
[4], XXE 10.4 “Made the language fixup of the XInclude 1.1 implementation more
conforming to the specification.” (As I mentioned above, the language fixup of
XInclude 1.1 [5] is identical to the language fixup of XInclude 1.0 [6], so,
sorry to say it, the reference to XInclude 1.1 here simply gives the
appearance of being an up-to-date reference.)
So what is the change in 'language fixup' that has been added in XXE 10.4? Here
is an example:
When working with XInclude, it turns out that XXE ignores @lang; only xml:lang
is respected. The consequence is that if, via XInclude, you embed the following
element into another element,
<section lang="en" id="sect01" />
then the way XXE implements 'language fixup' causes the above element to be
embedded as if its language property were unknown, which (as of XXE 10.4) means
that XXE adds xml:lang="" to the embedded element:
<section xml:lang="" id="sect01" />
And, if it does not delete the lang attribute, we even get this:
<section xml:lang="" lang="en" id="sect01" />
(Sometimes @lang is deleted, other times it is not; I am not yet certain when
each happens, but both outcomes are meaningless.)
This means that XXE 10.4’s implementation of XInclude has created an invalid
document, since, when both xml:lang and lang are used, they must agree. It has
also failed to take the lang="en" attribute into account, thus losing
information. Furthermore, if the end result (the document produced by the
inclusion) is meant for text/HTML consumers, those consumers do not understand
the xml:lang attribute at all.
Solution: treat @lang and xml:lang equally. Thus, in the example above, the
result would have been this:
<section lang="en" id="sect01" />
Or (if you want to allow for the fact that not all XInclude processors, if any
at all, handle the lang attribute) this:
<section lang="en" xml:lang="en" id="sect01" />
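The proposed behavior can be sketched in Python as the XInclude fixup rule ("add a language attribute when the included item's language differs from the include parent's") extended so that lang counts the same as xml:lang. This is a minimal sketch, not XXE's code: it assumes the processor already knows the language in scope at the include parent and at the included element's original parent (None meaning unknown), and the names language_fixup, declared_language and add_xml_lang are hypothetical.

```python
import string
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def _same_language(a, b):
    """Compare language tags ASCII case-insensitively; None means 'unknown'."""
    if a is None or b is None:
        return a is b  # equal only if both are unknown
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def declared_language(elem):
    """Language declared on elem itself, treating lang and xml:lang alike."""
    html_lang = elem.get("lang")
    xml_lang = elem.get(XML_LANG)
    if html_lang is not None and xml_lang is not None \
            and not _same_language(html_lang, xml_lang):
        raise ValueError(f"lang={html_lang!r} and xml:lang={xml_lang!r} disagree")
    return xml_lang if xml_lang is not None else html_lang


def language_fixup(included, source_inherited_lang, include_parent_lang,
                   add_xml_lang=True):
    """Hypothetical lang-aware language fixup, modifying `included` in place.

    source_inherited_lang: language in scope at the element's original parent.
    include_parent_lang: language in scope at the include parent.
    add_xml_lang: also write xml:lang, for processors that ignore lang.
    """
    own = declared_language(included)
    item_lang = own if own is not None else source_inherited_lang
    if _same_language(item_lang, include_parent_lang):
        return included  # languages agree: leave the element untouched
    value = "" if item_lang is None else item_lang  # "" marks an unknown language
    included.set("lang", value)
    if add_xml_lang:
        included.set(XML_LANG, value)
    return included


if __name__ == "__main__":
    sec = ET.fromstring('<section lang="en" id="sect01"/>')
    language_fixup(sec, source_inherited_lang=None, include_parent_lang="fr")
    print(ET.tostring(sec, encoding="unicode"))  # both attributes now carry "en"
```

With include_parent_lang="en" the same call leaves <section lang="en" id="sect01" /> unchanged, the first result above; with a differing parent language it yields the lang plus xml:lang form, and add_xml_lang=False emits lang alone.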
[1] https://www.w3.org/TR/xinclude-11/
[2] https://www.w3.org/TR/xinclude/
[3] https://www.ietf.org/rfc/bcp/bcp47.txt
[4] https://xmlmind.com/xmleditor/changes.html#v10.4.0
[5] https://www.w3.org/TR/xinclude-11/#language
[6] https://www.w3.org/TR/xinclude/#language
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support