Re: [sword-devel] the future of OSIS support (importer/filters)

DM Smith Wed, 27 Apr 2005 05:08:28 -0700

Just a couple of comments so most of the thread is stripped. Also, some of this is really more a question for OSIS. Chris, hopefully you can pass it along, if appropriate.

Chris Little wrote:

DM Smith wrote:
I agree that support should be limited to 2.0. Or perhaps 2.1, if it is pretty near completion. At the OSIS website, you cannot find documentation for prior versions. This makes it difficult to manage an earlier version of OSIS. Also, 2.0 is such a significant improvement that it should be enough motivation to cut over.
I think 2.1 is pretty stable and it may be a while before any of this particular suggestion really gets implemented, so my meaning is really that we should adopt whatever is current at the time. In any case, for our purposes 2.0 and 2.1 are virtually identical.

I also compared features, and they are mostly unchanged. But the differences are significant to frontends. On the <hi> element, type has been replaced by rend as the attribute that holds bold, italic, etc.

I am suggesting that we don't create a need to know the OSIS version number for a while. If sword has modules that are encoded according to the most recent OSIS, then we may end up with modules that use every version of OSIS. If sword instead says that modules are 2.0 and then, when OSIS has changed significantly (say 2.7), says that 2.7 is now used for new modules, this will create an easier upgrade path for frontends.

<snip/>

Verse numbers are not necessarily a single digit and do not necessarily flow in numerical order. Encoding <verse> elements (along with their n attributes, when present) permits us to render lettered verses and range verses easily. It affords us the possibility of rendering out-of-order verses (though this will require some additional thinking/work). And until multiple versifications are actually supported, it allows us to fake them.
I am not sure what you are thinking, but I don't think it will work. The verse (start/length) index will point to the verse as it is in its order, not by its number. Or it will be massaged to refer to the verse by its number and not its order. Unless more information is added to the index (i.e. what the verse actually is, which at this time is implicit by its offset into the index), this will lead to inconsistencies. We have discussed these at great length here so I won't repeat them again.
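To make the concern concrete, here is a toy model of a positional (start/length) index. It is only a sketch, not the actual SWORD on-disk format: the names `Entry`, `build_index` and `fetch` are made up for illustration, and the only property it tries to capture is that the verse identity is implicit in the entry's position.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One index record: byte offset and byte length into the text blob."""
    start: int
    length: int


def build_index(verses: List[Tuple[str, str]]) -> Tuple[bytes, List[Entry]]:
    """Store verses in source order; the printed label is NOT recorded."""
    blob = b""
    index = []
    for _label, text in verses:
        data = text.encode("utf-8")
        index.append(Entry(len(blob), len(data)))
        blob += data
    return blob, index


def fetch(blob: bytes, index: List[Entry], verse_number: int) -> str:
    """Look up by position: verse N is assumed to be the Nth entry."""
    entry = index[verse_number - 1]
    return blob[entry.start:entry.start + entry.length].decode("utf-8")


# A source whose verses are printed in the order 1, 2, 4, 3.
source = [("1", "alpha"), ("2", "beta"), ("4", "delta"), ("3", "gamma")]
blob, index = build_index(source)
print(fetch(blob, index, 3))  # prints "delta", the text the source labels verse 4
```

Asking for verse 3 returns the third entry in storage order, which the source calls verse 4. Either the index follows the order or it follows the number; without the label stored alongside each entry it cannot follow both.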

<snip/>

Until then, however, we store non-canonical verses in the previous canonical verse. If we had verse elements (and chapter too, in the case of Ps.151), we could at least render these more attractively. As it is, they just look like a single (big) verse, without verse numbers. Like I said, it's basically faked, since you can't actually reference the individual non-canonical verses (that's part of the v11n work). But rendering a readable Bible is an improvement over the current situation.


For others, non-canonical simply means that which is not described by canon.h.

So, where do you break a verse? Is everything between verses included by the following verse? What about material before the first verse in a chapter/book or work? (i.e. do we actually support introductory material and if so, how is it delineated?)
Yes, material preceding a verse goes in the verse that follows the material. The exception is the first verse of a chapter. Material preceding the first verse of a chapter goes in the chapter intro. Material preceding a chapter element goes in the book intro.
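The three rules just stated can be sketched as a single pass over a flattened event stream. This is only an illustration under assumptions: the event tuples, the function name `assign_material`, and the use of (0, 0) for the book intro and (chapter, 0) for a chapter intro are choices made for the sketch, not the importer's actual code.

```python
from collections import defaultdict


def assign_material(events):
    """Assign non-verse material to buckets keyed by (chapter, verse).

    events is a list of tuples (hypothetical shape):
      ("material", text)      non-verse content such as a title
      ("chapter", number)     a chapter element starts
      ("verse", number, text) a verse with its own text
    """
    buckets = defaultdict(list)
    pending = []          # material waiting for the next chapter or verse
    chapter = 0
    first_verse = True
    for event in events:
        kind = event[0]
        if kind == "material":
            pending.append(event[1])
        elif kind == "chapter":
            # Material preceding a chapter element goes in the book intro.
            if pending:
                buckets[(0, 0)].extend(pending)
                pending.clear()
            chapter = event[1]
            first_verse = True
        elif kind == "verse":
            number, text = event[1], event[2]
            # Before the first verse: chapter intro. Otherwise: the verse itself.
            target = (chapter, 0) if first_verse else (chapter, number)
            if pending:
                buckets[target].extend(pending)
                pending.clear()
            buckets[(chapter, number)].append(text)
            first_verse = False
    # Trailing material is returned unassigned; the thread does not say where it goes.
    return dict(buckets), pending
```

Note that the sketch applies the rule literally, so material standing in front of chapter 2 also lands in the book intro. That is exactly the out-of-flow placement questioned below.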

Should the algorithm look for special "stuff" say <title> that stands before the first verse? I don't think that this necessarily belongs in an intro.

And I don't understand why introductory material for the minor prophets is added to the intro of Isa, but if it stands in front of a <chapter> it goes into the book intro. That seems to take it way out of the orderly flow. Isn't this akin to a title that stands before an element belonging with that element? <snip/>

All this is already supported by the API. Introductions have always been part of Sword modules. How frontends support it is not my business, but it would be best if they rendered it properly. :D


It's on our list of things to do :)

We also have the option of normalizing OSIS to a form of our choosing. Towards that end, we CAN require that all book/chapter/verse tags be milestones.
You have already noted that some OSIS container elements are not milestoneable. For any OSIS work with significant structural markup, these will result in milestones being used for verses, likely for chapters and possibly for book (though I am not aware of any instance of structure crossing a book boundary.)
I don't think anything crosses book boundaries, either, so we /could/ permit container book divs. Likewise, we could probably force chapters to be well-formed XML. There's really only one place (Rev.12-Rev.13) where paragraphs ever cross a chapter division. Arguably, q does at some points (but q will often be milestoned). So we could normalize containers that cross chapters as milestones, if that helps anyone and provided there are no negative consequences anyone can think of.

Using milestones for divs would help verse-at-a-time systems, since div is designed to be one of the largest containers.
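For illustration, normalizing a container element into an sID/eID milestone pair could look roughly like the following. It is a minimal sketch under assumptions: it uses Python's xml.etree.ElementTree, ignores XML namespaces, handles only <verse> elements that are direct children of the given parent, and the function name `milestone_verses` is invented here.

```python
import xml.etree.ElementTree as ET


def milestone_verses(parent):
    """Replace container <verse> children of parent with sID/eID milestones."""
    new_children = []
    for child in list(parent):
        is_container = (
            child.tag == "verse"
            and child.get("osisID") is not None
            and child.get("sID") is None
            and child.get("eID") is None
        )
        if not is_container:
            new_children.append(child)
            continue
        osis_id = child.get("osisID")
        # Start milestone; the verse's leading text becomes its tail.
        start = ET.Element("verse", {"osisID": osis_id, "sID": osis_id})
        start.tail = child.text
        # End milestone; it inherits whatever followed the container.
        end = ET.Element("verse", {"eID": osis_id})
        end.tail = child.tail
        new_children.append(start)
        new_children.extend(list(child))  # hoist the verse's children
        new_children.append(end)
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)


chapter = ET.fromstring(
    '<chapter osisID="Gen.1">'
    '<verse osisID="Gen.1.1">In the <hi>beginning</hi> God</verse> '
    '<verse osisID="Gen.1.2">And</verse>'
    '</chapter>'
)
milestone_verses(chapter)
print(ET.tostring(chapter, encoding="unicode"))
```

The text content is unchanged by the transformation; only the verse boundaries move from containers to empty marker elements, which is what lets other containers cross them.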

From earlier threads on quotes, there are several quote markers that need to be handled. Block vs inline quotes. (The <q> tag is used for both, but it is not clear when to render one or the other. These are structural elements, not simply rendering issues. Does OSIS define a mechanism for this?)
Block quotes need to have type="block" set.

OSIS 2.01 and 2.1 do not document this. 2.01 really only has a placeholder for describing the element. 2.1 goes on at great length. However, it looks as if they are still thinking it through. Their suggestion is to use type of initial|medial|final to indicate whether a quote mark is an initial, continuation or final one.

It seems that type should be block|inline and sub-type should be initial|medial|final, as this would allow for both inline and blockquotes to contain nested and interrupted quotes.

Also, the notion of medial is interesting. It argues for a quote element that is neither a begin sID element nor an end eID element, but something else. Since the sID and eID are paired with the same value, is there a need for an mID with the same value?

Beginning quote mark, continuing quote mark, end quote mark, nested begin/continue and end quote marks, and nested within nested quote marks. (I consider this to be a structural issue. Notice, there is no mention of the actual marks that are used.)
Nesting can be specified by the level attribute. Which mark is used is supposed to be a style-sheet issue, hence my suggestion that we handle it in .confs. However, there is also the n attribute, where you can put the rendered form of the quotation mark, I believe. (I forget, but we might have also talked about adding a rend attribute to serve this purpose instead.)

The level and the n attributes are not documented in either the 2.01 or the 2.1 manual. But I think that using the level attribute to indicate the depth of nesting is sufficient. And having a rend attribute hold the marker as provided by the publisher is an excellent idea. (I like rend better than n as n is used by other xml systems to be a numbering scheme, e.g. (pretend example) <br n="3" /> means three line breaks.)

<snip/>

Can we include information on the <q> element concerning the kind of quote mark that is used? (I don't mean the actual mark)
I presume we would define something like level 1, 2, ... n marks that begin & end a quotation and that mark both sides of a break in quotation (according to what a language requires). English, for example would need levels 1 & 2, beginning, continuation beginning, and end--6 marks total (level = level modulo 2). So if you hit a tag that reads <q eID="..." level="2"/>, you know to render a single 9 quotation mark.
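A rough sketch of such a table for English follows. The names `ENGLISH_MARKS`, `fold_level` and `quote_mark` are hypothetical, and the folding is written as ((level - 1) mod 2) + 1 so that odd levels map to 1 and even levels to 2, which is one reading of "level modulo 2" when levels start at 1.

```python
# Six marks for English: two levels times begin / continue / end.
ENGLISH_MARKS = {
    1: {"begin": "\u201c", "continue": "\u201c", "end": "\u201d"},  # double quotes
    2: {"begin": "\u2018", "continue": "\u2018", "end": "\u2019"},  # single quotes
}


def fold_level(level, depth=2):
    """Map nesting levels 1, 2, 3, 4, ... onto 1, 2, 1, 2, ..."""
    return (level - 1) % depth + 1


def quote_mark(level, position, marks=ENGLISH_MARKS):
    """Return the mark for a nesting level and a position in the quotation.

    position is "begin", "continue" (restart after a break) or "end".
    """
    return marks[fold_level(level, depth=len(marks))][position]


# An end milestone such as <q eID="..." level="2"/> gets the closing single mark.
print(quote_mark(2, "end"))
```

Another language, or a translation that follows a different convention, would only need a different table passed as `marks`.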

We could do this on a per-translation basis or a per-language basis and we could allow switching based on locale or user preference.

If n/rend is used to indicate the original marker, then we don't need to change the conf for this. Locale files could/should be used to hold the quotation system.

<snip/>

2) Both have references to other entries. In the case of Strong's, it will refer from Strong's Greek to Strong's Hebrew as well as internally. When I tackle Nave's, I want to be able to create internal cross-references as well as references to verses.
We should probably make the Greek & Hebrew versions a single module. The current modules are based on databases intended for OLB, so they just have numbers for keys (four digit numbers plus a leading 0 in the source for Hebrew words). A better way to do this is with a leading G or H in the key (osisID). That's how Strong's numbers are referenced in OSIS modules, for example.

The G/H is needed since the numbers overlap. Merging them into one module is a great idea, but it will require some front ends to change (i.e. BibleDesktop) since they vector the reference to a particular module.
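As a small illustration of why the prefix matters in a merged module, here is a sketch of key construction. The function name `strongs_key` is made up, and stripping the leading zero from the Hebrew source numbers is an assumption of the sketch; whether keys should be zero-padded is a separate decision.

```python
PREFIXES = {"greek": "G", "hebrew": "H"}


def strongs_key(language, number):
    """Build a merged-module key such as G430 or H430 from a raw source number.

    number may be an int or a string like "0430" (the Hebrew source carries
    a leading 0); int() normalizes both.
    """
    return PREFIXES[language] + str(int(number))


print(strongs_key("greek", "430"), strongs_key("hebrew", "0430"))  # G430 H430
```

Without the prefix both entries would collapse onto the same key, which is why the current frontends have to pick a module to resolve a number.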

Also, it would be good if the transliteration were changed to the original script.

Anyway, your question is really about cross-referencing. The correct way to do that is with the reference element. Internal cross-referencing we can probably handle pretty easily. <reference osisRef="Moses">Moses</reference> would be used to create a reference to the Moses entry in the same document (technically, whatever element has osisID="Moses"). Frontends don't support this (to my knowledge), but that's how it's supposed to be encoded.
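A frontend could resolve such internal references along these lines. This is a minimal sketch, assuming Python's xml.etree.ElementTree and a namespace-free document; `resolve_internal_references` and the sample entries are invented for illustration.

```python
import xml.etree.ElementTree as ET


def resolve_internal_references(root):
    """Pair each <reference> with the element whose osisID equals its osisRef.

    Returns a list of (osisRef, target) tuples; target is None when no element
    in this document carries that osisID.
    """
    targets = {el.get("osisID"): el for el in root.iter() if el.get("osisID")}
    return [
        (ref.get("osisRef"), targets.get(ref.get("osisRef")))
        for ref in root.iter("reference")
    ]


doc = ET.fromstring(
    '<div>'
    '<div type="entry" osisID="Aaron">Brother of '
    '<reference osisRef="Moses">Moses</reference>; see '
    '<reference osisRef="Exod.4.14">Exod 4:14</reference>.</div>'
    '<div type="entry" osisID="Moses">Lawgiver.</div>'
    '</div>'
)
for osis_ref, target in resolve_internal_references(doc):
    print(osis_ref, "internal" if target is not None else "not in this document")
```

The sketch only reports whether a target exists in the same document. It does not settle how a reference that finds no local target should be interpreted.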

In OSIS, what distinguishes an internal reference from a bible verse reference?

<snip/>
_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
