On 28 May 2017, at 9:32, Hussein Shafie wrote:
On 05/28/2017 01:43 AM, Leif Halvard Silli wrote:
By specifying <assembly xml:lang="fr">, I expected the Table of
Contents heading of the realized document to be generated in French
(when converting to HTML).[1]

But that did not happen: It was generated in English. (For the record: Doing <assembly xml:lang="fr"> */does/* affect the language specified by
xml:lang in the output HTML document.)

I cannot reproduce this:

---
<assembly xml:lang="fr"> */does/* affect the language specified by
xml:lang in the output HTML document
---

I mean, in my tests, <assembly xml:lang="fr"> has no effect whatsoever on the generated document.

Ahem. You are right. My bad. The language was, it seems, picked up from the topic files. When I removed the language from the topic files, I had to specify the language on the <structure> in order to get it set in the output file.

(When I talk about the effect of placing xml:lang on <structure>, I refer to [for XHTML5 output] the language being set on the topmost <section> element, because the XSL stylesheets default to placing xml:lang there. Good practice is to place it on the root element. But that is a separate issue - which I raised, too, on the DocBook XSL mailing list some weeks ago.)

Only when I specified <structure xml:lang="fr">, did the Table of
Contents heading display in French.

Yes, that's right.

XML elements inherit the language. Thus this might be an assembly
processor bug.

It's not that clear.

Disagree, with regard to the specific issue at hand. See below.

The <assembly> is a document in itself: a specification (like a makefile or an Ant build.xml file) having its own title, author and language.

The title, author and language of a specification may be completely unrelated to the title, author and language of the documents generated using this specification.

In my view, this is a somewhat theoretical claim. But even if it is theoretical, it is already covered by the XML 1.0 specification. Though, of course, there can be some legitimate minor issues to clarify in the assembly specification.

When I say that it is theoretical, I base it on the fact that, in most cases, it makes no sense for the realized document to be in another language than the <assembly> element - see explanations below. And I am not at all certain that it is helpful to compare with a makefile or an Ant build.xml. It seems far more relevant to consider whether it makes sense to specify one language for <html> and another for <body> - see below.

That's why I'm not sure that you have reported a bug.

I considered what you say.

But first: unless the child element <structure> has its own xml:lang="foo" specification, it does (so says the XML spec) inherit the language of the parent <assembly> element. Thus setting the language of <assembly> is equal to also setting it on the child elements, such as <structure> - [quoting XML 1.0](https://www.w3.org/TR/xml/#sec-lang-tag):

* «The language specified by xml:lang applies to the element where it is specified (including the values of its attributes), and to all elements in its content unless overridden with another instance of xml:lang.»

Thus, to the extent that it is the language of <structure> that governs the language of the (topmost element of the) realized document, then, in the case of <assembly xml:lang="fr">, it is, per the XML spec, unnecessary to specify xml:lang="fr" on <structure>.

And thus - and again: to the extent that it is the language of <structure> that governs the language of the realized document (and currently that is the way you implement it), it is a bug that the assembly processor does not recognize that the language of <structure> has already been set by <assembly xml:lang="fr">. I don’t spot any loophole for any other interpretation.
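
As a minimal sketch of the inheritance rule quoted above: in the fragment below the language is declared once, on the root element, and <structure> carries no xml:lang of its own. The xml:id values and the href path are hypothetical placeholders, and the fragment is trimmed (no titles, no relationships); it only illustrates where the language is declared, not how any particular assembly processor behaves.

```xml
<assembly xmlns="http://docbook.org/ns/docbook" version="5.1" xml:lang="fr">
  <resources>
    <!-- hypothetical resource; the id and path are placeholders -->
    <resource xml:id="intro" href="topics/intro.xml"/>
  </resources>

  <!-- No xml:lang here. Per XML 1.0, the "fr" declared on <assembly>
       applies to this element, exactly as if it had been written
       <structure xml:lang="fr" ...>. -->
  <structure xml:id="guide" renderas="book">
    <module resourceref="intro"/>
  </structure>
</assembly>
```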

Switching from what applications MUST do over to what authors MAY do (which seems relevant to discuss, in view of what you said above):

The basic rule for language tagging is that you tag the root element with the main or dominant content language of the document (there are also tags for specifying the language as 'multilingual', 'unknown' etc.), and then tag any child element that deviates from the root element's language as an exception. In fact, this rule 'jumps out' (at least to me) as the simple and logical good practice, given how xml:lang (and @lang in HTML5) is defined.

That said: the spec permits us to switch language at whim. Thus, if we look at HTML: it is perfectly legal to do <html lang="en"> and then to do <body lang="fr">. However: while exceptions certainly exist, it almost never makes any sense to specify one language in the root and then another language in the main child element.

And I see no difference with regard to <assembly> versus <structure>. In fact, it could be said to make less sense in a DocBook assembly document than in an HTML document. Why? Because, for HTML, specifying the language on <body> affects most of the public-facing content anyway. And while it is true that <structure> takes care of most of the public-facing content as well, the realized document picks up content not only from <structure> but also from the <relationships> element. And so, if you only specify the language on <structure> (and also: if the assembly processor fails to notice that the language was already declared on <assembly>), you must - as well - specify the language of the <relationships> element.

Example from the assembly document for «DocBook assemblies and topics for the impatient» - if you only specify the language on <structure>, you MUST also specify the language on <relationships> (if you care about spellchecking inside the assembly document, for instance):

    <relationships xml:lang="nn">
      <relationship>
        <association>Sjå også</association>
        <instance linkend="omittitles"/>
        <instance linkend="contentonly"/>
      </relationship>
      <relationship>
        <association>Sjå også</association>
        <instance linkend="filtering"/>
        <instance linkend="output"/>
      </relationship>
    </relationships>

The simple solution (and I hope that it is not my claim, but the facts I have pointed to, that are convincing) is thus that one should declare the language on the <assembly> element. That is the sensible thing to do for the common case: monolingual realized documents without any sudden language switches. And there really isn’t anything special about DocBook assemblies that should make us think otherwise - in this detail.
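
A hedged sketch of that monolingual pattern, reusing the first relationship from the example above: with xml:lang declared once on <assembly>, neither <structure> nor <relationships> needs to repeat it. The href paths and the xml:id on <structure> are assumptions made up for illustration, and I am also assuming that the linkend values point at resource ids.

```xml
<assembly xmlns="http://docbook.org/ns/docbook" version="5.1" xml:lang="nn">
  <resources>
    <!-- hypothetical paths; ids chosen to match the linkends below -->
    <resource xml:id="omittitles" href="topics/omittitles.xml"/>
    <resource xml:id="contentonly" href="topics/contentonly.xml"/>
  </resources>

  <!-- inherits "nn" from <assembly> -->
  <structure xml:id="doc" renderas="book">
    <module resourceref="omittitles"/>
    <module resourceref="contentonly"/>
  </structure>

  <!-- also inherits "nn"; no separate xml:lang is required -->
  <relationships>
    <relationship>
      <association>Sjå også</association>
      <instance linkend="omittitles"/>
      <instance linkend="contentonly"/>
    </relationship>
  </relationships>
</assembly>
```

An element that really is in another language can still be tagged individually, as an exception to the root declaration.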

Also please note that the documentation of DocBook v5.1 assemblies is currently pretty sketchy:

http://tdg.docbook.org/tdg/5.1/ch06.html

We expect to fix issues like the one you have reported once DocBook v5.1 assemblies become better documented. (XMLmind has already filed several bug reports signaling missing information in the documentation of DocBook v5.1 assemblies.)

I can see that it might be helpful for authors and developers if the DocBook assembly spec said something about how language inheritance is meant to work, and about the relationship between the language of <assembly> (and its elements), the language of the realized document, and the language of the output from XSL conversion - there are some nuances worth pointing out there, I suppose.

(For instance, what if you specify contentonly='true' on a module element? This strips the root - or container - element from the 'pulled' topic. What, then, if that container also had xml:lang="foo" set? To what extent does stripping the wrapper element also strip the language information from the pulled content? [My guess is that this would indeed strip the language completely from the pulled content - unless the language was also specified a second time, by placing xml:lang="foo" on the pulled child elements of the container element - but I am not completely certain.])

But most - or at least a fair amount - of what it can be expected to say is probably easily deducible from the XML spec and from the best practices that have long since been defined w.r.t. language tagging.
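
To illustrate the contentonly question, here are two trimmed fragments. The file name, ids and text are hypothetical, and the comments only restate the open question; they do not describe what any processor actually does.

```xml
<!-- Fragment 1: topics/intro.xml (hypothetical).
     The language is declared only on the container element. -->
<topic xmlns="http://docbook.org/ns/docbook" version="5.1"
       xml:id="intro-topic" xml:lang="nn">
  <title>Innleiing</title>
  <!-- If only the children are pulled, does this <para> keep "nn"? -->
  <para>Tekst på nynorsk.</para>
  <!-- Language repeated on the child: this survives regardless. -->
  <para xml:lang="nn">Meir tekst på nynorsk.</para>
</topic>

<!-- Fragment 2: inside <structure> in the assembly; "intro" is a
     hypothetical resource id pointing at the file above. -->
<module xmlns="http://docbook.org/ns/docbook"
        resourceref="intro" contentonly="true"/>
```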

[1] FYI, I worked with the source code of “DocBook Assemblies and Topics
for the Impatient
<http://www.xmlmind.com/tutorials/DocBookAssemblies/index.html>”.
--
leif halvard silli
--
XMLmind XML Editor Support List
xmleditor-support@xmlmind.com
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
