DM Smith wrote:
I looked again at the OSIS website and could not find a statement that milestoned verses are the best practice. I think I was able to figure out why it would be a necessary practice. It is mentioned that if any OSIS container element is used in the milestone form then that element must always use the milestone form throughout the entire work.
I don't find anything either, but trust me that this was the effect of our decisions. Book/section/paragraph (BSP) is primary. That is the best practice. Book/chapter/verse (BCV) is secondary and overlays BSP. BCV doesn't identify linguistically significant or linguistically motivated segmentation. It is of essentially historical importance and is used because it is a widely accepted system today, in spite of many known flaws. BSP is based on linguistically motivated segmentation. It's also the system that most of the user base from Bible societies & publishing use. So... that's a little of the reasoning behind why BSP was chosen over BCV.
You should really avoid milestoning elements in the BSP hierarchy (in other words, <div> and <p>, though the latter isn't milestoneable). However, elements that sometimes cross these boundaries include things like <chapter> and <verse>. So, in effect, you have to use milestones for <verse> (which crosses <p> boundaries quite frequently). You can probably get away with using a container <chapter> in many Bibles since translators/publishers go out of their way to avoid things like paragraphs that cross chapter boundaries. (However, you might need to use milestoned <chapter> if you use container <q>.)
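To make the <verse>-crosses-<p> case concrete, here is a minimal Python sketch. The sample markup, the placeholder text, and the osisID value are made up for illustration; only the standard-library XML parser is used.

```python
import xml.etree.ElementTree as ET

# Container <verse> that straddles a paragraph break.
# The verse opens in one <p> and closes in the next, so the tags overlap.
container_form = (
    '<div>'
    '<p><verse osisID="Gen.1.1">first half of the verse</p>'
    '<p>second half of the verse</verse></p>'
    '</div>'
)

# Same content with a milestoned <verse>: two empty elements,
# paired by matching sID and eID values.
milestone_form = (
    '<div>'
    '<p><verse osisID="Gen.1.1" sID="Gen.1.1"/>first half of the verse</p>'
    '<p>second half of the verse<verse eID="Gen.1.1"/></p>'
    '</div>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(container_form))
print(is_well_formed(milestone_form))
```

The container form is rejected by the parser because the elements overlap, while the milestoned form parses and leaves the <p> containers intact.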
Help me if I am missing something here:
If a Bible has rich markup, then there will be a need for milestones. Let's take <q> and <verse> overlapping, as in <q>...<verse>...</q>...</verse>
1) Milestones are used for <verse> and not for <q>.
2) Milestones are used for <q> and not for <verse>.
3) Milestones are used for <q> and <verse>.
Actually, you've got me confused below, unless you mixed up 1 and 2. My confusion is with the above for 2 saying <verse> is not milestoned, but 2 below says it would have to be.
If 1 is chosen then it will have the most likely side effect of requiring most, if not all other containers to be milestoned. This means: abbr, closer, div, foreign, l, lg, q, salute, seg, signed, and speech. It will be easier to use milestones for all of them unless one is certain that verses will never be split by one.
I don't think <q> would ever cross the boundaries of abbr, closer, foreign, salute, or signed.
If 2 is chosen then it is likely that only verse and possibly chapter will need to be milestoned. So I can see why this may be the best practice. Also, the OSIS manual notes that pretty much the only practical consequence of a verse element is the rendering of a verse number. And of course Sword will use it to mark the start and the length of the verse in the module.
3 makes it easiest to adhere to the OSIS rule of consistency in milestoning an element throughout a work.
When I encode, I use milestones for <verse> and <q>. I use them for <verse> because some other people decided it would be the best practice and because it simplifies things tremendously to let this non-linguistic unit cross linguistic unit boundaries. And I use them for <q> because the primary use of <q> is for rendering quotation marks and because I consider elements like <l> more important to maintain as containers. But it is really the encoder's choice.
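Whichever option is chosen, the consistency rule mentioned earlier (an element used as a milestone anywhere must be milestoned everywhere in the work) can be checked mechanically. The following is only a sketch: the function name, the default list of element names, and the sample markup are hypothetical, and it assumes the document has no XML namespace, which a real OSIS file normally does have.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def mixed_forms(osis_text, names=("chapter", "verse", "q")):
    """Return element names that appear both as containers and as milestones."""
    forms = defaultdict(set)
    for elem in ET.fromstring(osis_text).iter():
        if elem.tag not in names:
            continue
        # A milestone is an empty element carrying an sID or an eID attribute.
        is_milestone = "sID" in elem.attrib or "eID" in elem.attrib
        forms[elem.tag].add("milestone" if is_milestone else "container")
    return sorted(name for name, used in forms.items() if len(used) > 1)


# Hypothetical sample: <verse> is always milestoned, but <q> is used both ways.
sample = (
    '<div>'
    '<p><verse osisID="X.1.1" sID="X.1.1"/><q>one</q> two<verse eID="X.1.1"/></p>'
    '<p><q sID="q1"/>three<q eID="q1"/></p>'
    '</div>'
)

print(mixed_forms(sample))
```

For this sample the check reports only q, since <verse> is milestoned consistently.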
Of the elements that can contain a verse, at least one, <p>, is not milestoneable. So, if a verse ever crosses one of these then using milestones for verses is a must. What is not clear from the schema is which container elements that can contain verses can hold part of a verse. For example, I don't imagine that <cell> or <item> should. <p> is specifically mentioned in the OSIS manual as allowing verses to be split.
In theory, there is no reason why a verse boundary could not occur within a <cell> or <item> element. In practice, I can't think of a time when it does. Most instances of <cell> and <item> that I have seen in Bibles occurred in a way that contained the element entirely within a <verse>.
With regard to the Sword API, it is possible to get a single verse. If the verse has an element end tag without its begin tag, or a begin tag without its end tag, i.e. it is not well formed, then an XML parse of that verse will fail. OSIS does not require that a verse be well-formed. Does Sword, in making a module from OSIS, ensure that each verse is well formed?
If not, then how should it be handled?
No. There is no guarantee that a verse will contain an end tag matching every start tag it contains or a start tag matching every end tag it contains. The importers give you almost exactly what the document contains.
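As a rough illustration of what such a verse can look like to a consumer, the following sketch scans a fragment for unmatched tags. The function name and the sample fragments are hypothetical, the regex scan is a simplification (it ignores comments and CDATA), and nothing here reflects how Sword itself handles this.

```python
import re

# Matches start tags, end tags, and empty-element tags; captures the
# leading slash, the element name, and the trailing slash.
TAG = re.compile(r'<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>')


def unbalanced_tags(fragment):
    """Return (unclosed_starts, unopened_ends) for an XML fragment."""
    open_stack = []
    unopened = []
    for match in TAG.finditer(fragment):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # milestones and other empty elements need no partner
        if closing:
            if open_stack and open_stack[-1] == name:
                open_stack.pop()
            else:
                unopened.append(name)  # end tag whose start lies outside the verse
        else:
            open_stack.append(name)
    return open_stack, unopened


# A verse whose text straddles a paragraph break, as stored between
# its milestones (hypothetical content).
verse_text = 'tail of one paragraph</p><p>head of the next'
print(unbalanced_tags(verse_text))
```

A caller that needs to parse a single verse as XML could use a check like this to decide whether the fragment must be repaired or wrapped first.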
Troy has some practical ideas for how to deal with this.
--Chris