Ted Walther wrote:
On Sat, Mar 25, 2006 at 06:45:58PM -0700, Troy A. Griffitts wrote:
<w lemma="strong:1">word1 word2</w> <w lemma="strong:2">word3 word4 word5</w> <w lemma="strong:3">word6</w> <w lemma="strong:4">word7 word8 word8</w>

Most printed Bibles with Strong's numbers merely insert numbers into
the text, implying that the previous word or some number of words are
related to that number.  Our NT human tagging allowed us to be exact,
even with non-contiguous words.  We don't have this level of markup in
the OT.

Dude, I'm so excited about all this work you're putting into this data!
I'm sure so many projects (inside and outside of CrossWire) will be
blessed by this!

Indeed.  I was just getting my project kicked off based on the KJV2003
when I noticed some problems with the Strong's number markup in the OT.
I don't really want to delay my project, but if KJV2006 is less than a
few months away, I can wait.

The next beta release should be the last one, with no changes between it and the final release. Look for an announcement of the final beta "any day now." You can get the current work at www.crosswire.org/~dmsmith/kjv2006.

When it is released really depends upon the ability of the Windows version of SWORD to handle it. There are 4 software changes that need to be made before it is released: 3 are in the SWORD API and one is in osis2mod. These changes are being worked on; my guess is that they will be completed after the spring semester ends, which is soon.

If these are not changed in the code, then I can use xslt to transform the master document into one that works around these problems.


Really, I don't think the connective words like "And the" should be
included as part of the Strong's number.  They should be outside.

The approach has been fairly simple. The KJV uses italics in the printed copies to indicate what was added to the Greek or Hebrew. These are marked up like this: <transChange>And the</transChange>. The remainder of the words are understood to be translated from the Greek and Hebrew. To that extent they should be surrounded with Strong's numbers.

There are some empty Strong's numbers, as not everything in the original needed to be translated.

As Troy noted, the OT was programmatically tagged: each Strong's number that fell at a point in the text was made to surround everything after the previous number that was not italic.
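To make that heuristic concrete, here is a rough Python sketch. It is not the program that was actually used on the OT; the token format, the function name tag_verse, and the handling of italic runs and empty numbers are all my assumptions for illustration.

```python
from xml.sax.saxutils import escape


def tag_verse(tokens):
    """Hypothetical sketch of the OT heuristic, not the real tool.

    tokens is an assumed input format: a list of
      ("word", text)   plain text translated from the original
      ("added", text)  italic text added by the translators
      ("num", n)       a Strong's number printed at this point
    """
    out = []
    pending = []  # (kind, text) items seen since the previous number

    def flush(lemma):
        # A number with no plain words before it becomes an empty <w>.
        if lemma is not None and not any(k == "word" for k, _ in pending):
            out.append(f'<w lemma="strong:{lemma}"></w>')
        i = 0
        while i < len(pending):
            kind = pending[i][0]
            j = i
            # Group adjacent items of the same kind into one run.
            while j < len(pending) and pending[j][0] == kind:
                j += 1
            text = escape(" ".join(t for _, t in pending[i:j]))
            if kind == "added":
                out.append(f"<transChange>{text}</transChange>")
            elif lemma is not None:
                out.append(f'<w lemma="strong:{lemma}">{text}</w>')
            else:
                out.append(text)  # trailing words with no number after them
            i = j
        pending.clear()

    for kind, value in tokens:
        if kind == "num":
            flush(value)
        else:
            pending.append((kind, value))
    flush(None)
    return " ".join(out)


print(tag_verse([("added", "And the"), ("word", "word1 word2"), ("num", 1),
                 ("word", "word3"), ("num", 2)]))
```

Note how the number simply swallows every non-italic word since the previous number. That is why the OT tagging cannot be exact the way the hand-tagged NT is.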

The NT was not programmatically tagged but done by people using a software tool. The result in the NT is that, verse by verse, every Strong's number in the TR is present in the KJV NT. The empty ones are not necessarily at a good location.

All this to say, fixing the tagging is a manual, analytical exercise. It should be done, but is outside the scope of this effort.


I've noticed some verses have the first word not surrounded by
appropriate tags giving the strongs number, yet other words are labelled
as "NIH" which is very convenient.  Could we have all words put inside
tags like that for easier parsing?

At this time, the established goals of this effort have been reached. If there are specific, identified mistakes, we can fix those up until the release. However, there are other things that can be done that have not been done. Any other changes will be for a following effort. If there is a clear algorithm that can be applied, one that does not swap one set of problems for another, I think it would make sense to make that change.
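
For anyone wanting to identify specific cases, a small script can list the words in a verse that sit outside any tag. The following is only a sketch using Python's standard xml.etree.ElementTree; the function name untagged_text and the wrapper element are hypothetical, and it assumes a verse fragment with no XML namespaces that only needs checking at the top level.

```python
import xml.etree.ElementTree as ET


def untagged_text(verse_xml):
    """Return top-level text in a verse that is not inside any element.

    Assumption: verse_xml is a well-formed fragment such as
    'In <w lemma="strong:1">the beginning</w>' with no namespaces.
    """
    # Wrap the fragment so it parses as a single document.
    root = ET.fromstring(f"<verse>{verse_xml}</verse>")
    loose = []
    # Text before the first child element.
    if root.text and root.text.strip():
        loose.append(root.text.strip())
    # Text following each child element (its "tail").
    for child in root:
        if child.tail and child.tail.strip():
            loose.append(child.tail.strip())
    return loose


print(untagged_text('word0 <w lemma="strong:1">word1 word2</w> '
                    '<transChange>And the</transChange> word3'))
```

Text inside <transChange> is not reported, since it is already marked as added. The real master document would need namespace handling on top of this.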

I will be checking the final beta into SVN, and then it will be open to fixes such as these. When enough have accumulated, we can re-release. (At least that is my thought.)


_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
