At 22:09 +1000 2004-08-11, Kahunapule Michael P. Johnson wrote:

The problem I have with OSIS (at least the version of documentation that I have) is that it does not encode enough information to reliably reconstitute quotation mark punctuation for the range of languages and Bible translations that I work with. It doesn't even cover English properly. The reason is that you state in the documentation that quotations should be marked with <q who="Nameofspeaker" sID="someuniquething">....<q who="Nameofspeaker" eID="someuniquething"> and NOT with the quotation marks. This is OK for SOME situations; to wit: standard English texts using the same quotation punctuation rules as the NIV, and Bible texts in languages that happen to use the same characters and rules for quotation marks. This is NOT OK for other situations; to wit: English texts using different quotation mark styles (like the NASB) or no quotation marks at all (like the KJV). It occurs to me that by just ignoring <q> and <speech> altogether, I could put in the normal quotation punctuation for the given language as Unicode characters in the right places and be happy-- except for two things.

It may well be that we all made mistakes in the design of quotation handling in OSIS, but I assure you we considered a much wider range of cases than the English NIV or English. Some of us are of US origin, but even so I don't think we have any monolinguals among us.


There is a real tradeoff here -- are quotation marks conventional ways of marking a discourse phenomenon (let's call it "quotation" to keep things simple), or are they part of "the text"? That is not so straightforward as it seems to me you are suggesting. There were no quotation marks in the original texts of the Bible, so all the quotation marks are products of someone's interpretation.

Nevertheless, we all agree that OSIS markup has to provide enough information to get the formatted result that one wants.

Actually, let me clarify that a little: widow and orphan management is an important part of high-quality formatting: certainly part of "the formatted result that one wants." But surely it shouldn't be part of what OSIS encodes. This may seem obvious or trivial, but I have heard people criticize OSIS for just this: they look at a printed Bible someone produced from OSIS source using some formatting tool that doesn't do widowing well, and say "OSIS can't produce a good Bible" -- we must always keep in mind that there are at least two separate parts involved here: the markup and the engine that processes it.


One is that I want to encode some (but not all) of the Bible texts for "red letter" editions. Actually, I don't really mean to specify that the words of Jesus have to be in red. I just want to mark the direct quotes of Jesus in a way that makes it easy for those who wish to present the Bible text to display the direct quotes of Jesus in red (or some other distinctive way) if they want to. I don't even care if people display Jesus' direct quotes in red or not, but I do care that if they do, the markers are in the right places so that the correct words are marked. I can use <q who="Jesus" sID="book.chapter.verse.0">...<q who="Jesus" eID="book.chapter.verse.0"> for that, but then if I do that for the KJV, will the application reading the OSIS file add quotation marks? If I use OSIS for a language that uses different quotation marks, what will happen? What about open quote reminders at new paragraphs and stanzas? Will they be inserted when they aren't supposed to be?

This is the key point, isn't it? "Will the application reading the OSIS file add quotation marks?" is not a question that can be answered. Which application? Reasonable software for formatting XML should do what your style sheets say it should do. Perhaps not all software is reasonable, but even most CSS implementations give you that much control.


Clearly the KJV and the NIV have different styles for quotations. The style sheets you would use to generate printed versions of them therefore would differ. They might be completely separate, or just differ in a few things, or a very clever stylesheet might even check what version it's formatting (by looking at the header) and do the appropriate thing for any version it knows about, and a default thing otherwise.

By not enshrining punctuation in the text itself, a wider range of options is available to the translators, publishers, and other concerned parties. For example, if I were printing an NIV in France for some reason, I might want to use the French chevron-like quotation marks (sorry, I forget the name for them just now). No problem: tweak the stylesheet. You don't even have to touch the text itself -- thus the risk of accidentally messing it up is reduced. This is especially important for minority languages, where the typesetter probably doesn't know the language, and so cannot easily detect if they messed things up.
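To make the stylesheet idea concrete, here is a rough sketch in Python rather than in any real stylesheet language. The style table, the version keys ("NIV", "KJV", "NIV-fr"), and the render_quotes function are all made up for illustration; only the <q sID/eID> milestone form comes from the discussion above. It handles a flat run of milestones inside one element and nothing more.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-version style table; keys and values are illustrative only.
QUOTE_STYLES = {
    "NIV": {"open": "\u201c", "close": "\u201d"},              # curly double quotes
    "KJV": {"open": "", "close": ""},                          # no quotation marks
    "NIV-fr": {"open": "\u00ab\u00a0", "close": "\u00a0\u00bb"},  # French chevrons
}
DEFAULT_STYLE = {"open": '"', "close": '"'}


def render_quotes(fragment: str, version: str) -> str:
    """Render one element, generating quote marks from <q> milestones."""
    style = QUOTE_STYLES.get(version, DEFAULT_STYLE)
    root = ET.fromstring(fragment)
    out = [root.text or ""]
    for child in root:
        if child.tag == "q":
            if child.get("sID") is not None:
                out.append(style["open"])
            elif child.get("eID") is not None:
                out.append(style["close"])
        # Text following the milestone belongs to the running text.
        out.append(child.tail or "")
    return "".join(out)


SAMPLE = '<p>He said, <q who="Jesus" sID="q1"/>Follow me.<q who="Jesus" eID="q1"/></p>'

for version in ("NIV", "KJV", "NIV-fr", "something-else"):
    print(version, "->", render_quotes(SAMPLE, version))
```

The source fragment is identical in every case; only the table entry changes, which is the point being argued.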

Also, these source files will be processed by many things other than formatters. Consider blind users with voice-generation interfaces: they won't get quotation marks at all -- but if the system knows there is a quote starting, it should be able to signal that to them. One system might just say "quote" in whatever the user's language is; a better system might generate voice inflections or suprasegmentals of some sort to communicate the same thing. Second, consider a search engine: it shouldn't have to search for a different pattern of specific characters to locate quotes in every language it encounters (especially when some patterns are ambiguous).
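As a sketch of the search-engine case: the following Python locates quotations by speaker from the markup alone, never looking at punctuation characters. The function name quotes_by and the sample fragment are invented for illustration, and the code assumes a flat sequence of <q> milestones within a single element, which real OSIS texts will not always give you.

```python
import xml.etree.ElementTree as ET


def quotes_by(fragment: str, speaker: str) -> list:
    """Return the text of each quotation attributed to `speaker`."""
    root = ET.fromstring(fragment)
    open_quotes = {}  # sID -> text pieces collected for quotes still open
    found = []
    for child in root:
        if child.tag == "q":
            sid, eid = child.get("sID"), child.get("eID")
            if sid is not None and child.get("who") == speaker:
                open_quotes[sid] = []
            elif eid in open_quotes:
                found.append("".join(open_quotes.pop(eid)).strip())
        # Text after this child belongs to every quote that is still open.
        for pieces in open_quotes.values():
            pieces.append(child.tail or "")
    return found


SAMPLE = (
    '<p>Then <q who="Jesus" sID="a"/>Follow me,<q who="Jesus" eID="a"/> '
    'and Peter answered, <q who="Peter" sID="b"/>Lord, I will.'
    '<q who="Peter" eID="b"/></p>'
)

print(quotes_by(SAMPLE, "Jesus"))
print(quotes_by(SAMPLE, "Peter"))
```

The same lookup would serve a "red letter" display or a voice interface equally well, since none of them depend on which characters a given language uses for quotes.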

So, it seems to me we definitely need to have markup in there for quotes -- the question then is whether OSIS quote markup provides sufficient information to drive a formatter, and if not, what to do about it.


The other problem with controlling quotation punctuation with OSIS and always using markup (i.e. q or speech elements) is that there are not just start and end locations. There are also open quote reminder locations. This gets confusing. Can I specify that a quotation starts at a given location with one character, continues at a paragraph boundary with a different character, then ends with still another character? Would it be OK to use a duplicated sID in a q milestone element to indicate that this is a part of the same quotation, but more punctuation is needed here?

Absolutely agreed. We discussed this at length (Patrick, can we add a section with some examples for this in the doc, if we haven't yet?). Typically, the placement of quotation reminders is determined by some fairly simple rule, that may differ by language, writing system, culture, and genre (and probably other factors too). Your example of a paragraph boundary is a very common case. In such a case, the stylesheet rule for paragraph simply checks whether a quotation is open, and if so, issues the appropriate punctuation.
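The paragraph rule just described could look roughly like this. It is a Python sketch, not a real stylesheet: the function name, the default mark characters, and the sample are assumptions, and nested quotations are not handled.

```python
import xml.etree.ElementTree as ET


def render_paragraphs(fragment, open_mark="\u201c", close_mark="\u201d",
                      reminder="\u201c"):
    """Render each <p>, adding a reminder mark when a quote is still open."""
    root = ET.fromstring(fragment)
    open_ids = set()  # sIDs of quotations opened but not yet closed
    paragraphs = []
    for p in root.iter("p"):
        # The paragraph rule: if a quotation is open here, issue the reminder.
        out = [reminder if open_ids else "", p.text or ""]
        for child in p:
            if child.tag == "q":
                if child.get("sID") is not None:
                    open_ids.add(child.get("sID"))
                    out.append(open_mark)
                elif child.get("eID") in open_ids:
                    open_ids.remove(child.get("eID"))
                    out.append(close_mark)
            out.append(child.tail or "")
        paragraphs.append("".join(out))
    return paragraphs


SAMPLE = (
    '<div><p>He said, <q sID="q1"/>Go now.</p>'
    '<p>Do not delay.<q eID="q1"/></p>'
    '<p>They went.</p></div>'
)

for line in render_paragraphs(SAMPLE):
    print(line)
```

A group that does not use reminders would pass reminder="" and get the same text without them; nothing in the source changes.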


This is a valuable approach, because there might well be two different groups that share a translation, but live in different areas and have become accustomed to different quotation style rules. For example, a language group from a war-torn country where many have emigrated, and ended up in different countries. If you put the literal quote characters in the text for one group, you have to go and fix it all manually for the other group. If instead you mark the quotes via markup and have a stylesheet generate the correct characters for display, then you just change that stylesheet, getting a uniform change with much less effort.

Does any of us know of a situation where the placement of "reminder" punctuation is discretionary? That is, where we have to record it because there is no rule, or a rule so complex, that the marks cannot reasonably be generated by a stylesheet? (I'm not including making a facsimile edition of a copy text including errors).

In my opinion (and that of my OSIS validation code), it would be incorrect to use a duplicate sID for this case as the OSIS schema stands right now. It could be that there is need to explicitly mark paragraph boundaries inside quotes, rather than letting the style sheet do the right thing. If you believe so, can you explain it to me in more detail? I'm not quite understanding your point here, and I very much want to.

*If* there turns out to be such need, then I see a few simple solutions:

a) Allow additional milestones with the same sID (or possibly eID, but I like your sID notion better)

b) Create a new empty element for the purpose, say <q-continued> or similar

c) Reserve a 'type' attribute value somewhere to distinguish this case.

If there really is need, you can simulate solution b or c right now in OSIS by using a regular milestone and assigning it a special type for this purpose. People (namely, the people writing stylesheets for you or doing typesetting) might complain unless you could show why it is in fact needed -- but if it really is, then it is.
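For what the milestone workaround might look like in practice, here is a small Python sketch. The type value "x-quote-continued" is invented for this example (I am assuming the usual "x-" prefix for non-standard values); it is not something the schema defines, and the function name is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical type value marking an explicit quote-reminder position.
CONTINUED = "x-quote-continued"


def render_explicit_reminders(fragment, reminder="\u201c"):
    """Emit a reminder mark only where a typed milestone asks for one."""
    root = ET.fromstring(fragment)
    out = [root.text or ""]
    for child in root:
        if child.tag == "milestone" and child.get("type") == CONTINUED:
            out.append(reminder)
        out.append(child.tail or "")
    return "".join(out)


SAMPLE = '<p><milestone type="x-quote-continued"/>Do not delay.</p>'
print(render_explicit_reminders(SAMPLE))
```

Here the encoder, not a rule, decides where each reminder goes, which is only worth the extra markup if the placement really cannot be computed.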



In short, I consider the placement of quotation punctuation and the selection of characters to be used for quotation punctuation to be a part of the Bible translation text itself, and if any encoding, like OSIS, cannot guarantee that these characters are maintained in their original locations, then that encoding is defective.

Wow. That's interesting. Let me see if I understand it right: So if I published an NIV in France (or better, a Francophone country with an English-speaking minority population that wants the NIV), and if I used chevrons for quotation marks, you would say it's a different *translation*, not just a different printing or edition or layout? I must admit I have a hard time accepting that.


As for guaranteeing, no encoding can guarantee the result of applying software to it. For all the encoding knows, the formatter you're using simply throws out all punctuation marks, or even all the text. It seems to me that that doesn't make all encodings defective. There must be some more limited claim you're trying to get at here, but I don't see clearly what it is. Help, please?

It seems to me that the *fact* of something being a quotation is clearly part of the translation text, but that the punctuation marks (or whatever) used to communicate that are part of the formatting, just like the choice of font. I still consider them very important, just as I consider the font choice important (printing a Bible in Comic Sans, or in 5 pt type, would probably be a very bad thing to do); but to me it wouldn't be changing "the text".

Can you explain this further for me if it's central to your point? But it seems to me this is not central -- you just want the quotes right, right? And that doesn't require anywhere near so strong a claim.


Do you see the problem?

I don't think so. Please explain further.


Now, let me suggest at least two possible solutions that are easy to incorporate into the OSIS standard. First, let me explicitly state what I'm trying to accomplish:


1. Preserve the current OPTION in OSIS to generate quotation punctuation with markup.

2. Preserve the OPTION in OSIS to mark quotations by speaker for specialized searches or, in the case of Jesus' direct quotes, to color or present them in some different way.

3. Add the OPTION to control quotation punctuation precisely for languages and styles that differ from the "usual" in the type and placement locations of quotation punctuation.

Suggested solution number 1 (recommended):

Document that any <q> or <speech> element marked with an attribute of n=" " (a blank space) should not be taken as an instruction to insert any quotation mark. Rather, in this case, it should be assumed that the correct punctuation is already in the text as a Unicode character (just like other kinds of punctuation). <q> or <speech> elements not so marked would be taken as an instruction to insert quotation punctuation in the manner that the NIV English Bible does, including open quote reminders, and alternating double and single typographic quotes for nested quotes.

I rather like the idea I perceive here -- some signal that the punctuation is already in the text. The stylesheet could use this in a nicely general way. I don't think it belongs on the 'n' attribute, but that's a minor detail.
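A sketch of how a formatter could honor such a signal, in Python for illustration. I have put the flag on a 'type' attribute with the made-up value "x-marks-in-text"; where the signal should really live is exactly the open question, so treat the attribute, the value, and the function name as assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical flag value: "the punctuation is already in the text".
LITERAL = "x-marks-in-text"


def render_mixed(fragment, open_mark="\u201c", close_mark="\u201d"):
    """Generate marks for ordinary <q> milestones, none for flagged ones."""
    root = ET.fromstring(fragment)
    out = [root.text or ""]
    for child in root:
        if child.tag == "q" and child.get("type") != LITERAL:
            if child.get("sID") is not None:
                out.append(open_mark)
            elif child.get("eID") is not None:
                out.append(close_mark)
        # Flagged quotes fall through: their literal marks are in the text.
        out.append(child.tail or "")
    return "".join(out)


SAMPLE = (
    '<p>He said, <q sID="a" type="x-marks-in-text"/>\u2018Go.\u2019'
    '<q eID="a" type="x-marks-in-text"/> Then: '
    '<q sID="b"/>Come.<q eID="b"/></p>'
)
print(render_mixed(SAMPLE))
```

Either way the quotation is still marked up, so searching, red-letter display, and voice output keep working whether or not the characters are literal.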


Is there a case, though, where a stylesheet couldn't be reasonably expected to generate all the right quotation marks? If a language required a different quotation mark depending on the voicing of the following consonant, or (worse) the gender of the next noun, that would be beyond typical stylesheet mechanisms to do. I don't know of any languages where punctuation choice depends on linguistic phenomena that aren't already represented by other markup or layout (like paragraph breaks). If there are, then we have a clear problem to deal with. But given the historical development of writing systems, that seems to me really unlikely. Anybody know an exception?


Suggested solution number 2:

If for some bizarre reason you are opposed to letting quotation punctuation exist as a normal Unicode character in the text, you could (1) allow the exact character to be used to be specified with its hexadecimal code position in the n attribute of the q or speech element, and (2) define two other elements to specify if open quote reminders are appropriate at new paragraphs and stanzas, and (3) specify what the open quote reminder character should be.

Parts 2 and 3 of this would go in a stylesheet, not in the text; you can do that now. If the character(s) were to go in an attribute, they could just go there -- no need to code in hex. But I don't think there's anything preventing such characters in the text in OSIS now -- so long as you do still mark the quotes (which is surely necessary for most non-printing processing). I'd have to read the fine details of the wording to be certain.



Suggested solution number 3:

Make something up-- anything that solves the problem above, and ask me if I think it would work or not.

See above.


By the way, I would be happy to help you proofread and review the next release of OSIS documentation and schema.

Many thanks! Feedback from people who have actual concrete issues to deal with is *very* valuable.



Hope you are having a great day!

I am. It is about my bed time, now...


--

Steve DeRose -- http://www.derose.net
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: [EMAIL PROTECTED]  or  [EMAIL PROTECTED]
_______________________________________________
sword-devel mailing list
[EMAIL PROTECTED]
http://www.crosswire.org/mailman/listinfo/sword-devel
