On 03/01/2012 03:23 PM, DM Smith wrote:
In most cases use of the canonical attribute is straightforward, and
the default values will almost always produce the intended result.
However, there will arise truly difficult cases: for example, one may
be encoding an ancient text with annotations of its own. In that case
those notes would be canonical, while any added by the current editor
would not be. In such cases, the practice chosen and its rationale
should be described in the work's documentation.
So, I take this to mean that if I were creating an accurate representation of
the 1611 KJV from scans, everything in that "ancient" text would be
canonical, including introductions, notes, titles, cross-references, and
so forth.
I don't.
If you were producing a critical edition of the KJV titled, say:
KJV Through the Years
then yes, you would be correct, all KJV material with notes would be
canonical (from a purist point of view), and your modern notes about the
'canonical notes' would not be canonical :)
If you wanted to digitize the 1611 KJV, the fact that the work is old
doesn't mean everything in the work is 'canonical'. What defines
'old'? Even a purist must decide on the base work. If the base work is
the KJV and we're adding modern notes, then you'd be correct. If the
base work is the New Testament, and we're merely encoding an old work
by marking up the KJV notes, then I would disagree.
:)
If it is not that way, and it is meant to reflect the underlying
publication, then I think there is a problem with the usage of the
<transChange type="added"> element. In this case these should be marked
canonical="false" as they are not part of the "base" text.
Different concepts.
A transChange relates to translation methodology against an original
text (not what we're calling a 'base text' above).
I took out the example about notes in a Bible translation. Its intent is
that canonical distinguishes what was in the text the translation was
based on from what was not in that base.
The confusion is that it is not at all clear what "current editor" means.
There are many who take the KJV, notes and all, and make changes to it
(say, modernizing the spelling), translate it into another language, and
so on. Since their base is not the Hebrew and Greek but a particular KJV
text, according to this definition the imported notes are now canonical.
But not for us. Our base text is always the study of the Bible, not the
study of a study Bible. Does that make sense?
We would never give our users results from ancient notes when they asked
for results only from canonical text.
Now, certainly-- especially where I work-- I can conceive of users who
might mean 'include ancient notes' when they say they only want
canonical material. But these are not the 99.999% of our users.
But as a module encoder, I'd do it the way the OSIS defaults are
:) Good. You are a purist, but you are also practical, DM! That's one of
the many things I like about you.
, with one exception:
uh oh...
The <div> element.
OK, I think what you say below, in summary, is:
Trojan milestones don't allow schema validators to preserve XML inheritance.
Yes. They don't preserve XML hierarchy, enforce restricted sets of
logical children, or most anything else a schema defines.
But that doesn't mean the specification is wrong just because it
can't be represented purely in schema.
The OSIS documentation speaks about the use of trojan milestones and the
deficiencies that go along with them, but also the overlapping hierarchy
problem they attempt to solve.
Wanna thumb wrestle for it?
-Troy
The canonical attribute is available on all elements.
The following elements don't have canonical:
osis
osisCorpus
teiHeader
work
workPrefix
It has a ‘default’ value so it does not have to be entered by the
encoder if the default value is acceptable.
A bit misleading. Only a few (8) elements actually have a default. Note
that chapter is not among them. And having it on osisText is silly (see below).
Default: true <xs:attribute name="canonical" type="xs:boolean"
use="optional" default="true"/>
osisText
verse
Default: false <xs:attribute name="canonical" type="xs:boolean"
use="optional" default="false"/>
header
div
note
reference
title
titlePage
The value of this attribute is "inherited," that is, once it is set,
any subelement of that element inherits the same setting.
Default: inherited <xs:attribute name="canonical" type="xs:boolean"
use="optional"/>
The rest of the elements.
The examples on the same page are confusing, as they don't fit with the
XML inheritance mechanism. They set an explicit value on a parent
element, which then forces the attribute to be repeated on a child
element that has a default of its own. Having a default value means
that that element never inherits the value.
With inheritance, it should be possible at any point in the document,
using an XML parser, to ask what the value of canonical is.
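Since ElementTree has no notion of inherited attributes, that question has to be answered by hand. Below is a minimal Python sketch of such a query. It assumes the defaults listed above and treats every other element as inheriting from its parent. DEFAULTS and effective_canonical are hypothetical names. The tags are un-namespaced for brevity, although real OSIS files use the OSIS namespace.

```python
import xml.etree.ElementTree as ET

# Schema defaults for canonical, as listed above; any element not listed
# here has no default and inherits from its parent.
DEFAULTS = {
    "osisText": True, "verse": True,
    "header": False, "div": False, "note": False,
    "reference": False, "title": False, "titlePage": False,
}


def parse_bool(value):
    """xs:boolean accepts "true"/"1" and "false"/"0"."""
    return value.strip() in ("true", "1")


def effective_canonical(elem, parents, fallback=True):
    """Return the canonical value in force at elem.

    Walking toward the root: an explicit attribute wins, an element with a
    schema default stops the walk (it never inherits), and anything else
    defers to its parent. `fallback` applies if the root is reached without
    a decision (an assumption, mirroring how fragments are treated).
    """
    node = elem
    while node is not None:
        explicit = node.get("canonical")
        if explicit is not None:
            return parse_bool(explicit)
        if node.tag in DEFAULTS:
            return DEFAULTS[node.tag]
        node = parents.get(node)
    return fallback


doc = ET.fromstring(
    '<osisText><div type="book">'
    '<p><verse osisID="Gen.1.1">In the beginning<note>A note</note></verse></p>'
    '<div canonical="true"><p>Explicit</p><div><p>Nested</p></div></div>'
    '</div></osisText>'
)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in doc.iter() for child in parent}

for path in ("div/p", "div/p/verse", "div/p/verse/note",
             "div/div/p", "div/div/div/p"):
    print(path, effective_canonical(doc.find(path), parents))
```

This prints False, True, False, True, False in that order. The paragraph directly under the book div picks up the div's default of false. The nested div without an attribute resets to false even though its parent says true.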
However, the attribute "canonical" is not actually inheritable,
according to:
http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#Inherited_attributes
3.3.5.6 Inherited Attributes
*Schema Information Set Contribution: Inherited Attributes*
[Definition:] An attribute information item A, whether explicitly
specified in the input information set or defaulted as described in
Attribute Default Value (§3.4.5.1), is *potentially inherited* by an
element information item E if and only if *all* of the following are
true:
1 A is among the [attributes] of one of E's ancestors.
2 A and E have the same [validation context].
3 *One* of the following is true:
3.1 A is ·attributed to· an Attribute Use whose {inheritable} = */true/*.
3.2 A is /not/ ·attributed to· any Attribute Use but A has a
·governing attribute declaration· whose {inheritable} = */true/*.
If and only if an element information item P is not ·skipped· (that is,
it is either ·strictly· or ·laxly· assessed), in the
·post-schema-validation infoset· each of P's element information item
[children] E which is not ·attributed to· a */skip/* Wildcard, has a
property:
PSVI Contributions for element information items
[inherited attributes]
A list of attribute information items. An attribute information
item A is included if and only if *all* of the following are true:
1 A is ·potentially inherited· by E.
2 Let O be A's [owner element]. A does not have the same expanded name
as another attribute which is also ·potentially inherited· by E and
whose [owner element] is a descendant of O.
I presume this is a bug in the OSIS Schema.
From a practical perspective in encoding a whole document, there are
two scenarios to consider:
1) Milestoning structural elements. (BCV: Book, Chapter and Verse encoding)
2) Milestoning verses. (BSP: Book, Section and Paragraph encoding,
recommended)
First the text of the work has to be within (using my notation)
<osis><osisCorpus>(<osisText>(<header>...</header>)*(<titlePage>...</titlePage>)?(<div>*CONTENT*</div>)+</osisText>)+</osis>
or
<osis>(<osisText><header>...</header>(<titlePage>...</titlePage>)?(<div>*CONTENT*</div>)+</osisText>)+</osis>
(Note: osis2mod expects only one osisText)
The significant part is the <div>: it cannot be in milestoned form and
still pass validation. The default value of canonical on this element is
"false". Therefore, all descendants not contained in elements whose
default is "true" or that explicitly declare canonical="true" inherit
the value "false".
Because divs can be nested, each div resets the state of canonical,
either to its default of false or to the declared canonical value.
The fact that <osisText> defaults canonical to true is meaningless: all
of its children have a default of false. So, practically speaking, the
only element with canonical="true" is a verse, together with those of
its contents that don't have a default of false.
The other implication of using the non-milestoned form of <div> is that
by OSIS semantics, all other <div>s have to be container elements, not
milestoned. (I can quote the OSIS 2.1.1 manual, if needed.) Personally,
I think this is too broad a semantic for <div> and should take into
consideration the type attribute.
In case 1), where the document uses the container form for books (<div
type="book">), <chapter> and <verse>, and uses the milestoned form of
other containers as needed or as semantically required, the intention of
the OSIS manual is preserved. The defaults work as intended.
However, in case 2), where the verse is milestoned, the text and other
elements of the verse are not children of the verse element but rather
of the container that it is in, typically a paragraph or a div. By the
rules of XML (if inheritance were properly specified), the parent
container would need to explicitly give or inherit canonical="true".
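A small Python sketch makes the structural point concrete. It runs the standard library's ElementTree on a hypothetical milestoned fragment with un-namespaced tags. The verse milestone itself carries no text. Whatever canonical value applies to the words therefore comes from the <p> and, through it, from the <div>, whose schema default is false unless canonical="true" is added.

```python
import xml.etree.ElementTree as ET

# Hypothetical BSP-style fragment: the verse is a pair of milestones
# inside a paragraph rather than a container.
sample = (
    '<div type="book" osisID="Gen">'
    '<p><verse sID="Gen.1.1" osisID="Gen.1.1"/>'
    'In the beginning God created the heaven and the earth.'
    '<verse eID="Gen.1.1"/></p>'
    '</div>'
)
book = ET.fromstring(sample)
para = book.find("p")
start = para.find("verse[@sID]")

# The start milestone is empty: the verse text is not its child.
print(repr(start.text))
# ElementTree stores the text after the milestone as its "tail"; in the
# XML infoset that text is a child of the enclosing <p>.
print(start.tail)
print(para.tag, [child.tag for child in para])
```

Neither the <p> nor the <div> carries a canonical attribute here. Under the schema as written, the verse text would therefore resolve to false.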
With regard to SWORD and JSword, they always work on a fragment of the
whole document and might not have the parent on which to determine
whether canonical is true or false. Practically, they assume true.
If the OSIS schema had the default of canonical on <div> to be true or
if it were optional (making the default on osisText meaningful), there
would be no issue.
This is to say, I think the OSIS Schema has it wrong for a <div>. Until
or unless it is changed, one nearly always has to have canonical="true"
on a div.
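Until the schema changes, one workaround is a post-processing pass over the generated OSIS. Below is a minimal Python sketch. It assumes un-namespaced tags and that only <div type="book"> should be forced to true, and both are assumptions. mark_book_divs_canonical is a hypothetical helper, not part of osis2mod or any existing converter.

```python
import xml.etree.ElementTree as ET


def mark_book_divs_canonical(root, types=("book",)):
    """Set canonical="true" on <div> elements of the given types that have
    no explicit canonical attribute. Returns the number of divs changed.

    Nested divs of other types (e.g. introductions) are left alone, so they
    keep the schema default of false.
    """
    changed = 0
    for div in root.iter("div"):
        if div.get("type") in types and div.get("canonical") is None:
            div.set("canonical", "true")
            changed += 1
    return changed


doc = ET.fromstring(
    '<osisText>'
    '<div type="book" osisID="Gen"><p>text</p>'
    '<div type="introduction"><p>intro</p></div></div>'
    '<div type="book" osisID="Exod" canonical="false"/>'
    '</osisText>'
)
print(mark_book_divs_canonical(doc))
print(ET.tostring(doc, encoding="unicode"))
```

An explicit value already set by the encoder (the Exod div above) is respected rather than overwritten.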
In Him,
DM
On Feb 29, 2012, at 2:46 PM, Troy A. Griffitts wrote:
Sorry to only jump in on problems, but...
I don't believe the preceding explanation of 'canonical' is correct.
OSIS defaults canonical to true on many elements, including <verse> and
<chapter>
I believe we defined canonical as text belonging to the base work.
For us, this is mostly Bibles.
For a study Bible, it would exclude all commentary and notes, and only
include Biblical text.
Basically, canonical for the Open Scripture Information Standard
refers to Biblical text, and you'd be hard-pressed to use it for
anything else practically, though I could see a purist trying to make
an argument for it.
For example, Josephus would only include the text of Josephus.
And while technically true, the practical uses for 'canonical' are
things like:
Showing Psalm titles even when the user has asked not to show 'titles'
Searching typically is only over 'canonical' text
-- but we usually work the opposite way: we take out notes, xrefs,
headings, and index what is left, so the Josephus example isn't
practically a problem for us right now (plus I think our Josephus
module only contains Josephus text). And this is simply for indexed
searching. Our full text searching allows you to search any of
these other fields: notes, xrefs, headings, just about anything in an
entry attribute. We have talked about providing indexed searching for
some of these things, but really? How often do you search the notes?
Just wait the 4 seconds to do the unindexed search. But we have lots
of future ideas for how to modularize the search framework so a
frontend could supply a filter which outputs what to include in a
named Lucene index. Anyway, tangent...
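Here is a rough Python sketch of the "take out notes, xrefs, headings, and index what is left" approach, using ElementTree on un-namespaced tags. It is not how SWORD's indexer is implemented. The SKIP set and the rule of keeping a <title> explicitly marked canonical="true" (as with Psalm titles) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Apparatus left out of the index (assumption): notes, which also carry
# cross-references in OSIS, and titles/headings.
SKIP = {"note", "title"}


def _fragments(elem):
    """Yield the text that should be indexed, in document order."""
    if elem.text:
        yield elem.text
    for child in elem:
        # Keep a skipped element only if it is explicitly marked canonical.
        if child.tag not in SKIP or child.get("canonical") == "true":
            yield from _fragments(child)
        # Text after a child belongs to elem, so keep it either way.
        if child.tail:
            yield child.tail


def indexable_text(elem):
    """Join fragments and collapse whitespace."""
    return " ".join(" ".join(_fragments(elem)).split())


chapter = ET.fromstring(
    '<chapter osisID="Ps.3">'
    '<title type="psalm" canonical="true">A Psalm of David.</title>'
    '<title>Trust in God</title>'
    '<verse osisID="Ps.3.1">LORD, how are they increased that trouble me!'
    '<note type="crossReference">2Sam.15.12</note></verse>'
    '</chapter>'
)
print(indexable_text(chapter))
```

Joining fragments with spaces is a simplification: it would split a word that is broken by inline markup.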
Summary,
<verse> already indicates canonical material by default
Psalm titles, being canonical and usually not within a verse (unless
it's a v11n which includes them in a verse), need to be marked
specifically as canonical.
If the OSIS docs say different, let me know and I'll poke the editor.
Troy
On 02/29/2012 07:11 PM, David Haslam wrote:
Thanks DM,
Would someone like to volunteer to enhance usfm2osis.pl to ensure that
canonical="true" is set as it should be?
David
--
View this message in context:
http://sword-dev.350566.n4.nabble.com/Setting-canonical-true-tp4432196p4432418.html
Sent from the SWORD Dev mailing list archive at Nabble.com.
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page