-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 02:25 20-03-04, Todd Tillinghast wrote:
>Michael,
>
>I am trying to understand why you think by putting quote marks "in the
>text" rather than in an attribute makes the quote mark any more or less
>a part of the "Bible text".

Here is the answer. Putting the quote mark in the text is a reversible, lossless encoding for all languages and styles. Putting the quote mark in an attribute of a <q ...> element is NOT a lossless encoding for all languages and styles.

Lossless encoding is like this. I lend you US$100.00 and you write down a note to yourself to repay me US$100.00 on payday and you don't lose the note. On payday, you read the note, and you hand me back exactly US$100.00.

Lossy encoding is like this. I lend you US$100.00 and you write down a note to yourself to repay me $100.00 on payday, without recording the currency. On payday, you give me back HK$100.00. Since one Hong Kong dollar is worth US$0.1283, you shortchanged me US$87.17.

Losslessly encoding and decoding

  "The 'hasty' brown <<fox>> jumped over the 'soporific' dog's backside."

yields

  "The 'hasty' brown <<fox>> jumped over the 'soporific' dog's backside."

Lossy encoding and decoding

  "The 'hasty' brown <<fox>> jumped over the 'soporific' dog's backside."

might yield

  "The hasty brown fox jumped over the soporific dogs backside."

or it might yield

  "The <<hasty>> brown 'fox' jumped over the <<soporific>> dog>s backside."

or it might yield

  "The 'hasty' brown fox jumped over the 'soporific' dog's backside."

depending on the parameters used in a particular instance. It might accidentally give me back the same string I started with.

Consider, please, the following situation. I desire to encode many different Bible translations, in many different languages. Among these are languages which use different rules and different characters for punctuation marks. Some of them use opening and closing quotation marks, and some don't. Some use different punctuation than you use in English. Some change the way things are punctuated inside of the quotation, and some don't. Some require "reminder" marks of various sorts at differing places within the quotation. Some have different ways to indicate quotations which carry subtle meanings themselves.

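The lossless/lossy distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the PLACEHOLDER marker are hypothetical, and the "lossy" scheme is a deliberately crude stand-in for any encoding that remembers where a quotation mark was but not which glyph it was.

```python
import re

SAMPLE = "The 'hasty' brown <<fox>> jumped over the 'soporific' dog's backside."

# Hypothetical placeholder meaning "some quotation mark was here".
PLACEHOLDER = "\u0001"
# The glyphs the lossy encoder treats as quotation marks (an assumption).
QUOTE_GLYPHS = re.compile(r"<<|>>|'")


def encode_verbatim(text: str) -> str:
    """Lossless: store every glyph, quotation marks included, as ordinary text."""
    return text


def decode_verbatim(stored: str) -> str:
    """Lossless: return exactly what was stored."""
    return stored


def encode_lossy(text: str) -> str:
    """Lossy: remember only that a mark occurred, not which glyph it was."""
    return QUOTE_GLYPHS.sub(PLACEHOLDER, text)


def decode_lossy(stored: str, glyph: str) -> str:
    """Re-insert one renderer-chosen glyph at every remembered position."""
    return stored.replace(PLACEHOLDER, glyph)


if __name__ == "__main__":
    # The verbatim round trip always returns the original string.
    print(decode_verbatim(encode_verbatim(SAMPLE)) == SAMPLE)
    # The lossy round trip depends on a parameter that is not in the stored text.
    print(decode_lossy(encode_lossy(SAMPLE), ""))
    print(decode_lossy(encode_lossy(SAMPLE), "'"))
```

Note that the lossy encoder also swallows the apostrophe in "dog's", because it cannot tell an apostrophe from a single quotation mark. That is the same kind of damage shown in the "dogs backside" result above.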
I want you to guarantee that I can losslessly encode and decode each and every one of these translations with "standard" processes, with the punctuation always put in the right place in the rendered text. I want you to do this without requiring me to supply any additional information that is not in the OSIS document.

If I never use <q ...> or <speech ...>, and always put the punctuation in place with glyphs representing the correct punctuation in the text exactly the same way that I would put any other punctuation or alphabetic characters in the text, then I can be assured of that working. I can be assured, that is, unless some unthoughtful person alters my text by trying to follow your bad recommendation to replace all quotation punctuation with <q ...> or <speech ...> elements in cases where the punctuation conventions differ from the English of the NIV.

The <q ...> and <speech ...> elements are never required to correctly render the text, if all of the punctuation, including quotation marks, is included in the text in the same manner as every other character.

Let me see if I can correctly explain to you why you don't want to do a proper lossless encoding of quotation punctuation, and then I will propose a solution for both points of view.

First of all, if all texts used the same quotation punctuation rules as the NIV, the punctuation could (with a few possible exceptions) be automatically and accurately generated from <q ...> and <speech ...> elements without n attributes. Therefore, you are probably thinking that doing such generation is really effectively lossless most of the time (by luck and not by design). Most of your "customers" would probably think so, too. After all, it is only languages you don't speak or read that need different rules, and a few "odd" English translations such as the NASB, ASV, KJV, etc., that don't fit your mold. You summarily shrug those off by saying that the publisher and renderer must somehow specify these "exceptions" with some kind of rendering style information.
Therefore, this doesn't seem like a big deal to you. (For me, it is a huge issue that will cause me to accept or reject OSIS altogether, depending on your response, but I can understand that you might not think it is important at all.)

Marking quotations with XML markup to indicate when we are in a quotation and who is being quoted allows the process reading the OSIS text to "know" when something is a quote or not, and possibly who is being quoted. This information can be used as part of the criteria of a search, or maybe to influence rendering (e.g., for a red letter edition).

Allowing the markup used to indicate quotations to also generate punctuation according to NIV rules also makes life easier for people working in translations that actually use those rules, because they don't have to remember to put in the open quote reminders at the beginnings of paragraphs.

I acknowledge those advantages, and do not wish to deprive you of them. However, I insist that you not neglect my favorite "odd" cases.

If you allow the XML markup to specify opening or closing of quotations in such a way that the creator of the OSIS document can specify whether or not quotation punctuation is generated from the markup, then you could still mark up properly punctuated text for enhanced search capabilities and for rendering red letter editions without messing up the punctuation, even if the punctuation used is not NIV English standard. In fact, the same rules would work on NIV English standard text, too. Take your pick. Either works, with no loss of capabilities either way. Only when you deviate from NIV English rules do the advantages of the total separation of punctuation generation from quotation markup become clear.

You actually proposed an acceptable solution (using n=""), but you keep trying to tell me that it is bad to do for reasons that are not convincing or even logical.

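To make the proposal concrete, here is a minimal Python sketch of a reading process that generates quotation punctuation from <q ...> milestones only when the document creator has not said otherwise. It assumes the convention discussed in this thread: a missing n attribute leaves the glyph to the renderer, n="" suppresses generation, and any other n value is used as given. The render function and the default glyphs are hypothetical choices for illustration, not anything defined by OSIS.

```python
import xml.etree.ElementTree as ET

# Assumed renderer defaults (English-style curly double quotes).
DEFAULT_OPEN = "\u201c"
DEFAULT_CLOSE = "\u201d"


def render(verse_xml: str) -> str:
    """Render one <verse> element to plain text.

    <q sID=.../> and <q eID=.../> are treated as milestone markers:
      - no n attribute: the renderer supplies its default glyph
      - n="":           the renderer inserts nothing
      - n="X":          the renderer inserts X
    """
    verse = ET.fromstring(verse_xml)
    out = [verse.text or ""]
    for child in verse:
        if child.tag == "q":
            n = child.get("n")
            if n is None:
                is_start = child.get("sID") is not None
                out.append(DEFAULT_OPEN if is_start else DEFAULT_CLOSE)
            else:
                out.append(n)
        # Text that follows the milestone is always copied unchanged.
        out.append(child.tail or "")
    return "".join(out)


if __name__ == "__main__":
    generated = '<verse>He said, <q sID="a"/>Go.<q eID="a"/></verse>'
    in_text = '<verse>He said, <q n="" sID="a"/>\u00abGo.\u00bb<q n="" eID="a"/></verse>'
    print(render(generated))
    print(render(in_text))
```

In the second fragment the guillemets are ordinary text, so they survive any renderer unchanged, while the <q ...> milestones still tell a search tool where the quotation begins and ends.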
>If I were to encode a Bible at the character level as follows:
><verse osisID="Gen.1.1"><c value='I'/><c value='n'/><space/><c
>value='t'/><c value='h'/><c value='e'/>...</verse>
>
>vs
>
><verse osisID="Gen.1.1">In the...</verse>
>
>Are the characters "In the" any more or less a part of the encoding
>either way?

No, but the first encoding is exceedingly inefficient and ugly. It reminds me of HTML email messages generated by spammers. Nevertheless, such ugly and inefficient encoding can be lossless.

>By using XML you MUST entities for some characters (<, >, /, ...).
>These are not plain text but rather a place holder for those
>characters.

Fine. Those encodings are lossless. They are not a problem.

>Most encoders are satisfied to logically represent the start and end
>quote marks with the <q> element it self and let the rendering process
>choose the glyph to be rendered.

I am not "most encoders," but I am content to let them do what they want. Let them trust the rendering process to insert the correct punctuation if they are using NIV English rules.

> The point you bring is that there are
>cases where this is not sufficient, because not all the information the
>translator intended can be represented with this more simplistic
>model.

Correct.

>What I suggested with the use of the "n" attribute was that rather than
>simply encoding a <q> element that records the start and end of a quote
>(and having that character to render be up to the rendering process), we
>could also allow the option for the encoder to specify that a specific
>character should be used rather than leaving it up to the rendering
>process.

That is a small step in the long journey, but a step in the right direction. You still haven't dealt with open quote reminders within a quote.
To do that unambiguously, you would have to insert additional markup at the points of insertion, and then you would be back to something that looks kind of like your lame example of encoding one letter per XML element.

>The thing that is troubling with <q n="" sID="uniqueID"/>"text
>text"<q n="" eID="uniqueID"/> is that you have said that there is a quote
>that has no punctuation to delimit and that within that quote there is a
>character ["] that is simply a character and DOES NOT carry the meaning
>that a quote is starting or ending but rather that there is a word
>["text] at the first of the quote and another word [text"] at the end of
>the quote.

I interpret the same example slightly differently than you, and I see no contradiction or troubling features to it at all.

The <q n="" sID="uniqueID"/> tells the reading process that this is the beginning of a quotation and that it is not permitted to insert any punctuation because of this opening of a quote: not here, and not at the beginning of any paragraph within the quote.

The opening and closing quotation marks surrounding "text text" are not for the computer's benefit. They are for the benefit of the people reading the text in their own language. The correct punctuation may well not be the double quotes used, or the typographic versions of the same. It may be Unicode U+00AB and U+00BB or some other marks. Either way, the marks are there for the benefit of the human reader, not the computer. If the computer process thinks that the punctuation is part of a word next to it, that doesn't matter if all it needs to know about words is that it must not break words in two at line boundaries. Of course, a more intelligent reading process could recognize that it is punctuation from a Unicode database, and separate it from the word, so that it can accurately compile a concordance of the words in the current Bible translation. This is not just something that has to do with quotation marks, but all punctuation.

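As a rough illustration of the "more intelligent reading process" just described, the following Python sketch uses the standard unicodedata module to peel punctuation off the edges of each whitespace-separated token before collecting words for a concordance. The function names are hypothetical, and splitting on whitespace is a simplifying assumption that does not hold for every written language.

```python
import unicodedata
from typing import List


def is_punctuation(ch: str) -> bool:
    """True if the Unicode general category is one of the P* classes."""
    return unicodedata.category(ch).startswith("P")


def strip_edge_punctuation(token: str) -> str:
    """Remove punctuation from both ends of a token, keeping interior marks
    (so the apostrophe in "dog's" is preserved)."""
    start, end = 0, len(token)
    while start < end and is_punctuation(token[start]):
        start += 1
    while end > start and is_punctuation(token[end - 1]):
        end -= 1
    return token[start:end]


def concordance_words(text: str) -> List[str]:
    """Return the words of the text with edge punctuation removed."""
    words = []
    for token in text.split():
        word = strip_edge_punctuation(token)
        if word:
            words.append(word)
    return words


if __name__ == "__main__":
    print(concordance_words("\u00abtext text\u00bb"))
    print(concordance_words("the 'soporific' dog's backside."))
```

Because the test is the Unicode category rather than a fixed list of glyphs, the same code handles U+00AB and U+00BB, ASCII quotes, and typographic quotes without being told which convention a translation uses.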
Of course, the <q n="" eID="uniqueID"/> tells the reading process that the quote has ended, and that it must not put in any punctuation in honor of this event.

In this case, the whole point of the <q ...> markup is to "tell" the computer, not the people, where a quote is. In this particular case, the <q ...> markup is probably pointless unless you add who="whoever" parameters and use this feature for searching the Scriptures by speaker, or in the case of Jesus' direct quotes, rendering a "red letter" edition.

In short, if you fully support lossless encoding of less common cases (many of which are in my bookshelf and in archives in SFM files where I work) without denigrating or deprecating that solution in any way, then I will be able to continue to support and use OSIS. If not, I have alternatives that I will use instead.

Did I mention that I want lossless encoding for any Bible translation for any living, written language on earth? I do. I will not accept anything less. Not one jot or tittle may go missing or be inserted where it does not belong.

Your friend and adversary in this iron-sharpening contest,

Michael
former OSIS supporter

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: http://eBible.org/mpj/gpg.htm

iD8DBQFAW+I1RI/gxxfXR7sRAvvqAJ9trjkbCKeVl7WwdSmiTVGux2xyEQCgxBY6
KnjW9Hq53UJt8vTO0OCH5EM=
=vBud
-----END PGP SIGNATURE-----

_______________________________________________
sword-devel mailing list
[EMAIL PROTECTED]
http://www.crosswire.org/mailman/listinfo/sword-devel
