Re: [sword-devel] verse parsing

DM Smith Sat, 25 Mar 2006 20:01:23 -0800

Troy,
My 2 cents.

I see this as a mapping between an external name (i.e. what the user knows the verse as) and an internal name (what the engine knows the verse as). As you pointed out, there are a whole host of issues with taking user input and deciphering it into the internal name, especially when you allow pretty sophisticated ranges and have taught the user that we can guess what they mean with great accuracy.


I think that there are two basic functions that the engine needs to provide:
   translation from the user input into a verse key
   translation from a verse key to an external representation that is appropriate for the work.

Today with the KJV, we number each book from Gen to Rev, in the order that it appears in the KJV. Then we know how many chapters are in each book and how many verses are in each chapter.

We also know the ordinal value for each verse or can compute it readily.
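To make "compute it readily" concrete, here is a minimal sketch that derives a verse's ordinal within a book from a table of verses per chapter. The names ChapterSizes and verseOrdinal are made up for illustration and are not part of the SWORD API.

```cpp
#include <cstddef>
#include <vector>

// Verses per chapter for one book; index 0 holds chapter 1.
// (Hypothetical type for illustration, not a SWORD structure.)
using ChapterSizes = std::vector<int>;

// Returns the 1-based ordinal of (chapter, verse) within the book,
// or 0 when the chapter or verse is out of range.
int verseOrdinal(const ChapterSizes& sizes, int chapter, int verse) {
    if (chapter < 1 || static_cast<std::size_t>(chapter) > sizes.size()) return 0;
    if (verse < 1 || verse > sizes[chapter - 1]) return 0;
    int ordinal = 0;
    // Add up every verse in the chapters that precede this one.
    for (int c = 1; c < chapter; ++c) ordinal += sizes[c - 1];
    return ordinal + verse;
}
```

With the first three chapters of Genesis (31, 25 and 24 verses), Gen 2:1 comes out as ordinal 32. A real engine would more likely precompute running totals than loop on every lookup.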

But today, we assume that the internal and external names for chapters and verse are the same. In your example, they are not the same for verses. Everything is fine until verse 31, but after that we have 31a, 32, 33, 33a, 33b, and 33c. Positionally these are 32, 33, 34, 35, 36, and 37.

What is needed is a mapping function External <=> Internal, 31a <=> 32, ....
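A rough sketch of such a mapping, assuming each chapter simply stores its external verse labels in reading order. VerseLabelMap and makeExampleChapter are hypothetical names, and the chapter contents are just the numbers from the example above (1 through 31, then 31a, 32, 33, 33a, 33b, 33c).

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-chapter table of external labels in reading order.
class VerseLabelMap {
public:
    explicit VerseLabelMap(std::vector<std::string> labels)
        : labels_(std::move(labels)) {
        // Index each label by its 1-based position in the chapter.
        for (std::size_t i = 0; i < labels_.size(); ++i)
            positions_[labels_[i]] = static_cast<int>(i) + 1;
    }

    // External -> internal: 1-based position, or 0 for an unknown label.
    int toInternal(const std::string& label) const {
        auto it = positions_.find(label);
        return it == positions_.end() ? 0 : it->second;
    }

    // Internal -> external: label at the position, or "" when out of range.
    std::string toExternal(int position) const {
        if (position < 1 || static_cast<std::size_t>(position) > labels_.size())
            return std::string();
        return labels_[position - 1];
    }

private:
    std::vector<std::string> labels_;
    std::unordered_map<std::string, int> positions_;
};

// Builds labels "1".."31" followed by the suffixed tail from the example.
VerseLabelMap makeExampleChapter() {
    std::vector<std::string> labels;
    for (int v = 1; v <= 31; ++v) labels.push_back(std::to_string(v));
    for (const char* tail : {"31a", "32", "33", "33a", "33b", "33c"})
        labels.push_back(tail);
    return VerseLabelMap(labels);
}
```

Because the internal value is still a plain integer, code that steps with verse + 1 keeps working, and only the display and parsing layers need to consult the table.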

I think you have pointed out that all the UIs will need to change no matter what. If that is the case, then perhaps you can increase the number of methods in VerseKey.

Essentially you would add
String getVerseName()
String getChapterName()

so
int x = vk.Verse()
would give 37 for 33c. That is the offset from the beginning of the chapter.
char* s = vk.getVerseName()
would give 33c.

As to parsing ranges, I think you may need another algorithm. The current one assumes a lot about KJV versification and its traditions, such as using roman numerals as in II Sam 2:1 and using numbers for chapters and verses.

Having fixed bugs in JSword's parsing of user input, I know how difficult it is.

The following is allowed:
B[[.C].V][-([[B.]C.]V] | B.[C[.V]]])

where B, C and V stand for book, chapter and verse respectively. And - represents the set of all allowable range indicators and . represents the set of allowable part separators (and the separators between B & C may be different than between C & V).

(I think this is at least close)

And as a shortcut "ff" is allowed, which means to the end of the parent unit.
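As a small illustration of the "ff" shortcut, here is a sketch that handles only the verse-level slice of the grammar inside a single chapter: "N", "N-M" and "Nff". VerseRange and parseVerseRange are hypothetical names, '-' stands in for the whole set of range indicators, and integer verses are assumed, so this is not how SWORD or JSword actually parse.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Inclusive verse range; first == 0 means the text could not be parsed.
struct VerseRange {
    int first;
    int last;
};

// Reads a run of digits starting at pos and advances pos past it.
// Returns 0 when no digit is present (verse numbers start at 1).
static int readNumber(const std::string& s, std::size_t& pos) {
    int value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    return value;
}

// Parses "N", "N-M" or "Nff" for a chapter that has lastVerse verses.
VerseRange parseVerseRange(const std::string& text, int lastVerse) {
    const VerseRange invalid{0, 0};
    std::size_t pos = 0;
    int first = readNumber(text, pos);
    if (first < 1 || first > lastVerse) return invalid;
    if (pos == text.size()) return {first, first};          // "12"
    if (text.compare(pos, std::string::npos, "ff") == 0)     // "12ff"
        return {first, lastVerse};                           // to end of chapter
    if (text[pos] != '-') return invalid;
    ++pos;
    int last = readNumber(text, pos);                        // "12-15"
    if (pos != text.size() || last < first || last > lastVerse) return invalid;
    return {first, last};
}
```

For a 31-verse chapter, "12ff" becomes the range 12 to 31. The same idea applies one level up, where "ff" after a chapter would run to the end of the book.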

This becomes difficult with multipart book names and book names that begin with numbers, where those numbers can be roman numerals or digits, as these force the code to look ahead or look behind to determine whether a token is the prefix or suffix of a book name.
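A sketch of the look-ahead, assuming a tiny made-up list of book stems that can take a number. BookPrefix, readBookPrefix and the stem list are hypothetical; a real parser would consult the full book and abbreviation tables and would also handle lowercase numerals.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct BookPrefix {
    int ordinal;         // 0 when the text has no numeric book prefix
    std::size_t length;  // characters used by the prefix and following spaces
};

// Case-insensitive check that text, starting at pos, begins with stem.
// Callers guarantee pos <= text.size().
static bool startsWithAt(const std::string& text, std::size_t pos,
                         const std::string& stem) {
    if (text.size() - pos < stem.size()) return false;
    for (std::size_t i = 0; i < stem.size(); ++i) {
        int a = std::tolower(static_cast<unsigned char>(text[pos + i]));
        int b = std::tolower(static_cast<unsigned char>(stem[i]));
        if (a != b) return false;
    }
    return true;
}

// Recognizes "2", "II" and similar only when a numbered book follows.
BookPrefix readBookPrefix(const std::string& text) {
    // Hypothetical stems of books that come in numbered sets.
    static const std::vector<std::string> stems = {"Sam", "Kgs", "Kings", "Jn", "John"};
    const BookPrefix none{0, 0};
    std::size_t pos = 0;
    int ordinal = 0;
    if (!text.empty() && text[0] >= '1' && text[0] <= '3') {
        ordinal = text[0] - '0';                  // digit form: "2 Sam"
        pos = 1;
    } else {
        while (pos < text.size() && pos < 3 && text[pos] == 'I') ++pos;
        ordinal = static_cast<int>(pos);          // roman form: "II Sam"
    }
    if (ordinal == 0) return none;
    while (pos < text.size() && text[pos] == ' ') ++pos;
    // Look ahead: the candidate counts only if a known stem comes next,
    // so the "I" of "Isa" is not mistaken for a numeral.
    for (const std::string& stem : stems)
        if (startsWithAt(text, pos, stem)) return {ordinal, pos};
    return none;
}
```

Here "II Sam 2:1" and "2 Sam 2:1" both yield ordinal 2, "Ijn2-3:12" yields ordinal 1, and "Isa 1:1" yields no prefix at all.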

If the code is going to be generally useful for a BCV kind of scheme, where C and V may not be integers, then it will require a new algorithm. So, when it is not KJV, it uses the newer one.

I would suggest that we add a V11N= key to the conf with the default of KJV. This could be used to get the appropriate algorithm.
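Something like the following sketch shows the lookup I have in mind. It assumes the conf is plain Key=Value lines and does not use SWORD's real conf reader; versificationOf, usesKjvParser and the value "LXX" are all hypothetical.

```cpp
#include <sstream>
#include <string>

// Returns the value of the proposed V11N key, or "KJV" when the key is
// absent or empty. Assumes simple Key=Value lines.
std::string versificationOf(const std::string& confText) {
    std::istringstream lines(confText);
    std::string line;
    const std::string key = "V11N=";
    while (std::getline(lines, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.compare(0, key.size(), key) == 0 && line.size() > key.size())
            return line.substr(key.size());
    }
    return "KJV";  // default when nothing is specified
}

// Hypothetical dispatch: true when the existing KJV-oriented parser applies,
// false when the newer, more general algorithm should be used.
bool usesKjvParser(const std::string& confText) {
    return versificationOf(confText) == "KJV";
}
```

Existing modules carry no V11N line, so they would keep today's behaviour without any change to their conf files.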


Hope this helps,
   DM

P.S. My solution from the other day skirted this issue.


Troy A. Griffitts wrote:

Hey guys (especially frontend writers),
I've been working on providing a VerseKey key interface for traversing modules like the LXXM:
http://crosswire.org/study/bookdisplay.jsp?mod=LXXM&gbsEntry=%2FJoshB%2F24%2F1
I'm having some difficulty fitting this into the exposed VerseKey interface.
Obviously, my goal was to save everyone as much modification as possible, but there just doesn't seem to be a good fit for modules like these.
Here's a little background of what I was trying and where I ran into troubles, and why I've come to this conclusion:
First, I attempted to redo this module using OSIS book names for everything, and discovered that there just wasn't a nice book list we could display to the user. For example, JoshB (from the link above) seems to be the standard book of Joshua we'd all expect, but then JoshA (browse to it using the left index) contains 3 chapters: 15, 18, 19. Not sure exactly what these are, but I'm guessing they are replacements or additions to Joshua or some other book. Actually, I just have no idea.
The next thing I began to realize is that this module uses a,b,c type suffixes on verses (click on the first link in this email again and scroll to the bottom of the page). This does not fit nicely into our integer concept for verses. I considered adding a 5th level: Testament/Book/Chapter/Verse/Sub. But this really breaks the whole paradigm anyway, as Sub will mostly be blank except when there might be a letter tacked to the end. It really doesn't solve any problems, e.g. key.Verse(key.Verse()+1) still will break. key++ would work, I guess, but you'd have to always check if Sub was set to anything. And who knows what Sub really means. Is it a replacement? Is it really a subdivision of the verse? It just doesn't seem like it solves any problems nicely. It seems like the LXX really is sequentially 31, 31a, 32, 33, 33a, 33b, whereas I know that other Bibles and commentaries mean the first part of 33 when they say 33a. So adding Sub doesn't seem like it gives us much except keeping Verse an integer.
    So, I have a few ideas, and would like to hear from you.
Basically, I think the way we present and display the LXXM with swordweb (the link above) is actually pretty ok. There are a few deficiencies:
The 'reference' is displayed like:

/JoshB/24/1
We could add a flag which says to display using a BK CH:VS format. I was thinking about adding a pattern, like letting the modules.conf file specify something like:
KeyDisplay=%1 %2:%3
but I think this is more work for everyone than it benefits. Besides, other languages probably prefer other formats (BK CH.VS). So I think we'd like to just say something like KeyFormat=BCV
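For reference, the pattern idea could be sketched roughly as below, assuming %1, %2 and %3 stand for the path segments of a key such as /JoshB/24/1. splitKeyPath and formatKey are hypothetical helpers, not existing SWORD functions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits "/JoshB/24/1" into {"JoshB", "24", "1"}, skipping empty segments.
std::vector<std::string> splitKeyPath(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char ch : path) {
        if (ch == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Replaces %1..%9 with the matching path segment (nothing when the segment
// is missing); every other character is copied through unchanged.
std::string formatKey(const std::string& pattern,
                      const std::vector<std::string>& parts) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        bool placeholder = pattern[i] == '%' && i + 1 < pattern.size() &&
                           pattern[i + 1] >= '1' && pattern[i + 1] <= '9';
        if (placeholder) {
            std::size_t index = static_cast<std::size_t>(pattern[i + 1] - '1');
            if (index < parts.size()) out += parts[index];
            ++i;  // skip the digit as well
        } else {
            out += pattern[i];
        }
    }
    return out;
}
```

With the pattern "%1 %2:%3" the key /JoshB/24/1 renders as "JoshB 24:1", and "%1 %2.%3" gives the BK CH.VS form.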
The other problem is parsing...
Currently VerseKey provides all the nice parsing functionality that figures out:
Ijn2-3:12
It can do this because it has a set of books that it knows about, along with all kinds of abbreviations, translated into a number of languages. Our current parser also drops suffixed letters.
Finally, if we solve these problems, and place an entry in LXXM: Category=Biblical Texts, it will probably break most frontends which expect all Biblical Texts to use a VerseKey. I don't know how to solve this problem.
I also considered a major change to VerseKey which would make all levels strings and not integers. I realize many frontends use integer spin controls to increase/decrease chapter and verse. There may also be linear logic regarding these things.
I guess the real question is, would it be easier for everyone to add parsing and display support to treekey and leave versekey alone? This is the direction I'm leaning right now. Any thoughts to sway me would be appreciated.
    -Troy.

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

