Check to see if it is already entered, first.
If it is, perhaps add a comment on it with specific examples.

Here is the basic game plan.
I am extracting the module into books, where each book is supposedly a well-formed OSIS fragment. That is, each one is:
<div type="book" osisID="...">
book goes here
</div>
I am then going to run each book through a SAX parser to identify the ones that are not well-formed. For those, I hope to do basic cleanup in Perl to make them well-formed. This will handle obvious global edits, such as changing <note .... />...</note> to <note ....>...</note>. If any book is too big, I'll subdivide it into chapters and work at the chapter level.
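
The well-formedness pass described above could look roughly like this. It is only a sketch, written in Python rather than the Perl mentioned in this thread; the function name check_well_formed and the two sample fragments are made up for illustration and are not taken from the module.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_well_formed(fragment):
    """Return None if the fragment parses, else a short error description."""
    try:
        # A do-nothing handler is enough: we only care whether parsing fails.
        xml.sax.parseString(fragment.encode("utf-8"), ContentHandler())
        return None
    except xml.sax.SAXParseException as err:
        return "line %d, col %d: %s" % (
            err.getLineNumber(), err.getColumnNumber(), err.getMessage())


# Hypothetical book fragments, one clean and one with the <note .../>...</note> problem.
good = '<div type="book" osisID="Mark"><verse osisID="Mark.1.1">text</verse></div>'
bad = '<div type="book" osisID="Mark"><note type="x" />stray</note></div>'

for name, fragment in (("good", good), ("bad", bad)):
    problem = check_well_formed(fragment)
    print(name, "OK" if problem is None else "NOT well-formed: " + problem)
```

This only checks well-formedness; validating against the OSIS schema is a separate step.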

Once all the books are well-formed, I plan to validate them against the OSIS 2.1 schema.
I'll probably make the obvious global edits in Perl.
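
As an illustration of the kind of global edit meant here, the <note .... />...</note> fix could be sketched as a single regular-expression substitution. This is an assumption about how such an edit might be written, shown in Python instead of Perl; the name fix_self_closed_notes is hypothetical, and the pattern deliberately leaves alone any self-closed <note/> that is not followed by a stray </note>.

```python
import re

# A self-closed <note .../> followed by text and a stray </note>,
# with no other <note ...> start tag in between.
SELF_CLOSED_NOTE = re.compile(
    r'<note\b([^>]*?)\s*/>((?:(?!<note\b).)*?)</note>',
    re.DOTALL,
)


def fix_self_closed_notes(text):
    """Rewrite <note .../>...</note> as <note ...>...</note>."""
    return SELF_CLOSED_NOTE.sub(r'<note\1>\2</note>', text)


broken = 'In the beginning<note type="study" />Heb. bereshith</note> God created'
print(fix_self_closed_notes(broken))
```

A regex like this is blunt, so the result would still need to go back through the well-formedness check.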

Once this is done, I'll check everything into SVN, book by book, with long ones perhaps subdivided into chapters, and open it up for help.

Right now, I have it split out by book and have started to write some basic validation tools. I may be done with that by the start of next week.

Martin Gruner wrote:
Lynn,

thanks for pointing this out.
To be sure that it won't be forgotten, please file a bug at crosswire.org/bugs for the "modules" project.

mg

On Thursday, 23 February 2006 18:10, L.Allan-pbio wrote:
I know zilch Greek or Hebrew, but could perhaps help with cleaning up
the redundant/flawed tags in KJV .... there is a verse that is over
10,000 chars long, (Mark 1:9?) and several over 4,000 tags long.
Stay tuned. With Troy's help, I should have the work area set up before
too long with the KJV by book, perhaps chapter.
I took a look at the KJV rawtext from the compressed module, and found 208
verses whose length is over 2500 characters. All of these are in the NT.
There are over 900 NT verses that are over 2000 chars long.
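
A length report like the one above could be produced along these lines. This is a sketch only: it assumes the raw verse text has already been pulled out of the module into a dictionary keyed by reference (how that is done is not shown), the name long_verses is made up, and the sample strings are placeholders padded to the lengths quoted in this thread, not real verse text.

```python
def long_verses(verses, threshold=2500):
    """Return (reference, length) pairs for verses longer than threshold, longest first."""
    hits = [(ref, len(raw)) for ref, raw in verses.items() if len(raw) > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Placeholder data: only the lengths matter here.
sample = {
    "Mark 1:9": "x" * 15329,
    "Matthew 2:13": "x" * 3012,
    "John 11:35": "<w ...>Jesus</w> <w ...>wept</w>.",
}

for ref, length in long_verses(sample):
    print("BCV= %s  Len: %d" % (ref, length))
```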

Not sure if this helps, but here is a link:
http://lcdbible.sf.net/misc/VeryLongKjvVerses_2500.zip

Mark 1:9 is over 15,000 characters, and something is clearly incorrect. The
pattern "w src morph" is repeated about 1000 times within the same verse:
# 36: BCV= Mark  1:9   Len:15329
<w src="1" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">And</w>
<w src="2" lemma="x-Strongs:G1096" morph="x-Robinson:V-2ADI-3S">it came to
pass</w>
<w src="3" lemma="x-Strongs:G1722" morph="x-Robinson:PREP">in</w>
<w src="4" lemma="x-Strongs:G1565" morph="x-Robinson:D-DPF">those</w>
<w src="6" lemma="x-Strongs:G2250" morph="x-Robinson:N-DPF">days</w>, that
<w src="8" lemma="x-Strongs:G2424" morph="x-Robinson:N-NSM">Jesus</w>
<w src="7" lemma="x-Strongs:G2064" morph="x-Robinson:V-2AAI-3S">came</w>
<w src="9" lemma="x-Strongs:G575" morph="x-Robinson:PREP">from</w>
<w src="10" lemma="x-Strongs:G3478" morph="x-Robinson:N-PRI">Nazareth</w>
<w src="12" lemma="x-Strongs:G1056" morph="x-Robinson:N-GSF">of
Galilee</w>, <w src="13" lemma="x-Strongs:G2532"
morph="x-Robinson:CONJ">and</w> <w src="14" lemma="x-Strongs:G907"
morph="x-Robinson:V-API-3S">was baptized</w>
<w src="15" lemma="x-Strongs:G5259" morph="x-Robinson:PREP">of</w>
<w src="16" lemma="x-Strongs:G2491" morph="x-Robinson:N-GSM">John</w>
<w src="17" lemma="x-Strongs:G1519" morph="x-Robinson:PREP">in</w> <w src
morph w src morph w src morph w
src morph w src w src morph w src morph w src morph w src morph w
w src morph w src morph w src morph w src morph w src w src morph w src
morph w src morph w src morph w src
w src morph w src morph w src morph w src morph w src w src morph w src
morph w src morph w src morph w src
***********  repeats ************
***********  about   ************
***********  300      ************
***********  lines     ************
w src morph w src morph w src morph w src morph w src w src morph w src
morph w src morph w src morph w src="20" w src morph w src morph w src
morph w src morph="x-Robinson:N-ASM" lemma="x-Strongs:G2446">
<w src="19" lemma="x-Strongs:G2446"
morph="x-Robinson:N-ASM">Jordan</w></w>. <w src="5" lemma="x-Strongs:G3588"
morph="x-Robinson:T-DPF"></w>
<w src="11" lemma="x-Strongs:G3588" morph="x-Robinson:T-GSF"></w>
<w src="18" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASM"></w><resp
type="strongsMarkup" name="rkr" date="2002-11-30-21:45"/>
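
One rough way to flag verses damaged like Mark 1:9 is to count attribute names that appear without a value, since in intact markup every src, lemma and morph is followed by "=". The sketch below is a heuristic under that assumption, not a parser; count_bare_attrs is a hypothetical name, and the damaged sample is a shortened imitation of the text above.

```python
import re

# An attribute name that is NOT followed by '=' (optionally after whitespace).
BARE_ATTR = re.compile(r'\b(?:src|morph|lemma)\b(?!\s*=)')


def count_bare_attrs(verse_xml):
    """Count src/morph/lemma tokens that have no =value attached."""
    return len(BARE_ATTR.findall(verse_xml))


clean = '<w src="1" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">And</w>'
damaged = ('<w src morph w src morph w src="20" w src '
           'morph="x-Robinson:N-ASM" lemma="x-Strongs:G2446">')

for label, text in (("clean", clean), ("damaged", damaged)):
    print(label, count_bare_attrs(text))
```

Any verse with a non-zero count would be worth looking at by hand.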

I noticed very significant repetition of "x-Strongs:G3588" in a
lot of verses, but I don't understand enough about OSIS markup to know if
that is an error. Here is an example:
#  2: BCV= Matthew  2:13  Len: 3012
<w src="38" lemma="x-Strongs:G3588" morph="x-Robinson:T-GSM"></w>
<w src="36" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASN"></w>
<w src="18" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASF"></w>
<w src="15" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASN"></w>
<w src="10" lemma="x-Strongs:G3588" morph="x-Robinson:T-DSM"></w>
<w src="2" lemma="x-Strongs:G1161" morph="x-Robinson:CONJ">And when</w>
<w src="3" lemma="x-Strongs:G846" morph="x-Robinson:P-GPM">they</w>
<w src="1" lemma="x-Strongs:G402" morph="x-Robinson:V-AAP-GPM">were
departed,</w>
<w src="4" lemma="x-Strongs:G2400" morph="x-Robinson:V-2AAM-2S">behold,</w>
<w src="5" lemma="x-Strongs:G32" morph="x-Robinson:N-NSM">the angel</w>
<w src="6" lemma="x-Strongs:G2962" morph="x-Robinson:N-GSM">of the Lord</w>
<w src="7" lemma="x-Strongs:G5316"
morph="x-Robinson:V-PEI-3S">appeareth</w> <w src="11"
lemma="x-Strongs:G2501" morph="x-Robinson:N-PRI">to Joseph</w> <w src="8"
lemma="x-Strongs:G2596" morph="x-Robinson:PREP">in</w>
<w src="9" lemma="x-Strongs:G3677" morph="x-Robinson:N-OI">a dream,</w>
<w src="12" lemma="x-Strongs:G3004"
morph="x-Robinson:V-PAP-NSM">saying,</w> <w src="13"
lemma="x-Strongs:G1453" morph="x-Robinson:V-APP-NSM">Arise,</w> <w src="14"
lemma="x-Strongs:G3880" morph="x-Robinson:V-2AAM-2S">and take</w>
<w src="16" lemma="x-Strongs:G3813" morph="x-Robinson:N-ASN">the young
child</w>
<w src="17" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">and</w>
<w src="20" lemma="x-Strongs:G846" morph="x-Robinson:P-GSM">his</w>
<w src="19" lemma="x-Strongs:G3384" morph="x-Robinson:N-ASF">mother,</w>
<w src="21" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">and</w>
<w src="22" lemma="x-Strongs:G5343" morph="x-Robinson:V-PAM-2S">flee</w>
<w src="23" lemma="x-Strongs:G1519" morph="x-Robinson:PREP">into</w>
<w src="24" lemma="x-Strongs:G125" morph="x-Robinson:N-ASF">Egypt,</w>
<w src="25" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">and</w>
<w src="26" lemma="x-Strongs:G2468" morph="x-Robinson:V-PXM-2S">be thou</w>
<w src="27" lemma="x-Strongs:G1563" morph="x-Robinson:ADV">there</w>
<w src="28" lemma="x-Strongs:G2193" morph="x-Robinson:CONJ">until</w>
<w src="29" lemma="x-Strongs:G302" morph="x-Robinson:PRT"></w>
<w src="30" lemma="x-Strongs:G2036" morph="x-Robinson:V-2AAS-1S">I
bring</w> <w src="31" lemma="x-Strongs:G4671"
morph="x-Robinson:P-2DS">thee</w> <w src="30" lemma="x-Strongs:G2036"
morph="x-Robinson:V-2AAS-1S"
splitID="41">word:</w>
<w src="33" lemma="x-Strongs:G1063" morph="x-Robinson:CONJ">for</w>
<w src="34" lemma="x-Strongs:G2264" morph="x-Robinson:N-NSM">Herod</w>
<w src="32" lemma="x-Strongs:G3195" morph="x-Robinson:V-PAI-3S">will</w>
<w src="35" lemma="x-Strongs:G2212" morph="x-Robinson:V-PAN">seek</w>
<w src="37" lemma="x-Strongs:G3813" morph="x-Robinson:N-ASN">the young
child</w>
<w src="39" lemma="x-Strongs:G622" morph="x-Robinson:V-AAN">to destroy</w>
<w src="40" lemma="x-Strongs:G846" morph="x-Robinson:P-ASN">him.</w><resp
type="strongsMarkup" name="pdy" date="2003-12-14-08:43"/>
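
To see how widespread the empty <w></w> elements are, they could be tallied per lemma. The sketch below uses regular expressions on the raw verse text rather than an XML parser, since the text may not be well-formed; the name empty_w_lemmas is made up, and the sample is a shortened piece of the Matthew 2:13 markup above. It only counts; it does not decide whether the empty elements are an error.

```python
import re
from collections import Counter

# A <w ...> start tag immediately closed, with nothing but whitespace inside.
EMPTY_W = re.compile(r'<w\b([^>]*)>\s*</w>')
LEMMA = re.compile(r'lemma="([^"]*)"')


def empty_w_lemmas(verse_xml):
    """Count empty <w> elements in a verse, keyed by their lemma attribute."""
    counts = Counter()
    for attrs in EMPTY_W.findall(verse_xml):
        match = LEMMA.search(attrs)
        counts[match.group(1) if match else "(no lemma)"] += 1
    return counts


sample = (
    '<w src="38" lemma="x-Strongs:G3588" morph="x-Robinson:T-GSM"></w>\n'
    '<w src="36" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASN"></w>\n'
    '<w src="2" lemma="x-Strongs:G1161" morph="x-Robinson:CONJ">And when</w>\n'
    '<w src="29" lemma="x-Strongs:G302" morph="x-Robinson:PRT"></w>\n'
)

for lemma, count in empty_w_lemmas(sample).most_common():
    print(lemma, count)
```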


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page