Re: [sword-devel] verse parsing

Chris Little Wed, 29 Mar 2006 01:45:27 -0800

First, on the topic of OSIS book abbreviations:

Almost everything you should ever need for Bibles is at http://www.crosswire.org/~chrislit/osis/BibleBookNames.html

There are also the following, less up-to-date XML files, which add more non-canonical materials. These were the source materials for the above, but I haven't maintained them since creating the above list of Bible books.

Bible: http://www.crosswire.org/~chrislit/osis/bible.xml
OT Pseudepigrapha: http://www.crosswire.org/~chrislit/osis/otp.xml
NT Apocrypha: http://www.crosswire.org/~chrislit/osis/nta.xml
Nag Hammadi codices: http://www.crosswire.org/~chrislit/osis/naghammadi.xml
(named) Dead Sea Scrolls: http://www.crosswire.org/~chrislit/osis/qumran.xml
Mormon texts: http://www.crosswire.org/~chrislit/osis/lds.xml

Classical sources (but actually just Josephus, currently): http://www.crosswire.org/~chrislit/osis/classical.xml

Now, looking at the list of files at the LXXM source site (http://ccat.sas.upenn.edu/gopher/text/religion/biblical/lxxmorph/), there are four categories of problems with mapping files onto OSIS IDs:

1) Books with <number>.<abbrev>.<number>.mlxx style filenames, e.g. 01.Gen.1.mlxx & 02.Gen.2.mlxx. These are just single books divided into two files and should be concatenated.

2) Apocryphal books. These should all be listed in the file cited at the top. E.g. Judith = Jdt, Tobit = Tob, Odes = Odes, Psalms of Solomon = PssSol.

3) Ezras. The Ezras are just absurdly icky. For the LXX, I recommend NOT just mapping 1Esdras to Ezra and 2Esdras to Nehemiah. They don't actually line up correctly like this. Whole volumes could be, and probably have been, written about the Ezras, and I would strongly recommend just tagging them 1Esd and 2Esd, respectively.


[Specifically:
Hebrew Ezra = Vulgate 1Esd = KJV Ezra
Hebrew Neh = Vulgate 2Esd = KJV Neh

LXX 1Esd = Vulgate 3Esd = KJV 1Esd = 2Chr 35-36 paraphrased + Ezra + Neh 7:38-8:12 + other material

LXX 2Esd = Hebrew Ezra+Neh = Vulgate 1Esd+2Esd = KJV Ezra+Neh

And 4Esd (= 4Ezra + 5Ezra + 6Ezra) makes things even more complicated, but luckily isn't of import since it isn't in the LXX.]

4) Variant books, namely (Josh|Judges)(B|A), Tobit(BA|S), and (Daniel|Bel|Sus)(OG|Th): 6 books with 2 variants each. I would strongly recommend treating each of these 12 books as individual books. Give them unique osisIDs, present them to the user as unique books, etc. This is how Logos does it. This is how BibleWorks does it. And I believe STEP even incorporated a separate book ID to account for the 6 additional books in Rahlfs. Rahlfs is a sufficiently important source text that you really ought to do whatever you need to do to accommodate it in its native form. You shouldn't wedge it into another versification system (e.g. one with only one book each of Joshua, Judges, Tobit, Daniel, Bel & the Dragon, and Susanna).

I don't have my Rahlfs with me, but I really don't think presenting it in a tabular view with both traditions on a single screen is the right way to go. If we're working within the KJV versification, that's a suitable compromise. But if we're permitted to make changes to the underlying versification system in Sword and present Rahlfs in its OWN versification system, the books should be separated.

Towards that end, I would recommend adding 6 books to the BibleBookNames.html file cited at the top, to accommodate the 6 variant books in Rahlfs: JoshA, JudgA, TobS, DanTh, BelTh, & SusTh. Under this system, JoshB = osisID Josh, JudgesB = osisID Judg, TobBA = osisID Tob, and the OG Daniel texts = osisIDs Dan, Bel, and Sus. Does that seem agreeable?

The only other way to deal with them is to call them part of a separate work and use the standard book IDs for both, but put the variants in the second work. I don't like that idea since they're part of the same print volume, a volume which is generally considered a single work.
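To make items 1 to 4 and the proposed IDs concrete, here is a rough C++ sketch of a filename-to-osisID mapping. The left-hand abbreviations in the table (Judith, PsSol, JoshA, TobitS, DanielTh, etc.) are my guesses based on the book names in this message, not the actual LXXM file list, and lxxmToOsis/osisIdForFile/groupByBook are hypothetical helpers, not Sword API.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical LXXM filename abbreviation -> osisID table.
// Left-hand names are guesses; check them against the real file list.
const std::map<std::string, std::string>& lxxmToOsis() {
    static const std::map<std::string, std::string> table = {
        {"Gen", "Gen"},
        // Apocryphal books (item 2).
        {"Judith", "Jdt"}, {"Odes", "Odes"}, {"PsSol", "PssSol"},
        // Ezras (item 3): tag as 1Esd/2Esd instead of mapping onto Ezra/Neh.
        {"1Esdras", "1Esd"}, {"2Esdras", "2Esd"},
        // Variant books (item 4): one tradition keeps the standard ID,
        // the other gets one of the 6 proposed new IDs.
        {"JoshB", "Josh"}, {"JoshA", "JoshA"},
        {"JudgesB", "Judg"}, {"JudgesA", "JudgA"},
        {"TobitBA", "Tob"}, {"TobitS", "TobS"},
        {"DanielOG", "Dan"}, {"DanielTh", "DanTh"},
        {"BelOG", "Bel"}, {"BelTh", "BelTh"},
        {"SusOG", "Sus"}, {"SusTh", "SusTh"},
    };
    return table;
}

// Returns the osisID for a name like "01.Gen.1.mlxx" or "09.JudgesA.mlxx",
// or "" if the abbreviation is not in the table.
std::string osisIdForFile(const std::string& filename) {
    std::vector<std::string> parts;
    std::stringstream ss(filename);
    std::string part;
    while (std::getline(ss, part, '.')) parts.push_back(part);
    if (parts.size() < 3) return "";
    const auto& table = lxxmToOsis();
    auto it = table.find(parts[1]);
    return it == table.end() ? "" : it->second;
}

// Groups filenames (assumed to be in numeric order already) by osisID, so a
// book split across files (item 1) comes out as one list to concatenate.
std::map<std::string, std::vector<std::string>> groupByBook(
        const std::vector<std::string>& files) {
    std::map<std::string, std::vector<std::string>> books;
    for (const auto& f : files) {
        std::string id = osisIdForFile(f);
        if (!id.empty()) books[id].push_back(f);
    }
    return books;
}
```

With this, 01.Gen.1.mlxx and 02.Gen.2.mlxx both land under Gen in file order, while JoshA and JoshB stay separate books.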


A few more comments below...

Troy A. Griffitts wrote:

Obviously, my goal was to save everyone as much modification as possible, but there just doesn't seem to be a good fit for modules like these.

I think DM, Martin, and I agree on this point: make it work correctly, regardless of how badly it breaks existing frontends. We can make modules requiring a new driver invisible to existing frontends, and future frontends can support new features when they are ready to do so.

The next thing I began to realize is that this module uses a, b, c-type suffixes on verses (click on the first link in this email again and scroll to the bottom of the page). This does not fit nicely into our integer concept for verses. I considered adding a 5th level: Testament/Book/Chapter/Verse/Sub. But this really breaks the whole paradigm anyway, as Sub will mostly be blank except when there might be a letter tacked onto the end. It really doesn't solve any problems, e.g. key.Verse(key.Verse()+1) will still break. key++ would work, I guess, but you'd always have to check whether Sub was set to anything. And who knows what Sub really means? Is it a replacement? Is it really a subdivision of the verse? It just doesn't seem to solve any problems nicely. It seems like the LXX really is sequentially 31, 31a, 32, 33, 33a, 33b, whereas I know that other Bibles and commentaries mean the first part of 33 when they say 33a. So adding Sub doesn't seem to give us much except keeping Verse an integer.
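A minimal C++ sketch of what "integer plus optional suffix" ordering looks like, to show where the integer model stops working. VerseId and parseVerse are hypothetical names, not part of VerseKey.

```cpp
#include <cctype>
#include <string>
#include <tuple>

// Hypothetical verse identifier: integer verse plus optional letter suffix.
struct VerseId {
    int verse = 0;
    std::string suffix;  // "" for a plain verse, otherwise "a", "b", ...
};

// Parses "33a" into {33, "a"}; returns false if there is no leading number.
bool parseVerse(const std::string& text, VerseId& out) {
    std::size_t i = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) ++i;
    if (i == 0) return false;
    out.verse = std::stoi(text.substr(0, i));
    out.suffix = text.substr(i);
    return true;
}

// Orders by number first, then suffix, so 33 < 33a < 33b < 34.
bool operator<(const VerseId& a, const VerseId& b) {
    return std::tie(a.verse, a.suffix) < std::tie(b.verse, b.suffix);
}
```

Sorting works, but bumping `verse` by one from 33 lands on 34 and skips 33a and 33b entirely, which is exactly the key.Verse(key.Verse()+1) problem.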

We also need to deal with non-integer chapters in Greek Esther, as in the NRSV. In addition, those chapters aren't in sequential numerical or alphanumeric order, so we'll have to deal with out-of-order chapters and, probably, verses. GenBooks handle that fine. Translation to VerseKeys is going to be a challenge.
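One way to cope with out-of-order, non-integer chapters is to let the module supply its chapter labels in reading order and compare by position rather than by value. A C++ sketch under that assumption; ChapterOrder is a hypothetical class, and the labels in the usage line are illustrative, not the real Greek Esther order.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: the module supplies its chapter labels in reading order.
// Labels are strings, so "A" or "10a" are as valid as "3".
class ChapterOrder {
public:
    explicit ChapterOrder(std::vector<std::string> labels) : labels_(std::move(labels)) {}

    // Position of a label in reading order, or -1 if the module lacks it.
    int position(const std::string& label) const {
        auto it = std::find(labels_.begin(), labels_.end(), label);
        return it == labels_.end() ? -1 : static_cast<int>(it - labels_.begin());
    }

    // True if chapter a is read before chapter b, whatever their labels say.
    // Unknown labels get position -1 and therefore sort first.
    bool before(const std::string& a, const std::string& b) const {
        return position(a) < position(b);
    }

    // Next chapter in reading order, or "" at the end or for an unknown label.
    std::string next(const std::string& label) const {
        int p = position(label);
        if (p < 0 || p + 1 >= static_cast<int>(labels_.size())) return "";
        return labels_[p + 1];
    }

private:
    std::vector<std::string> labels_;
};
```

For example, with labels {"A", "1", "2", "3", "B", "4"}, next("3") is "B" even though no numeric or alphabetic rule would produce that.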

The 'reference' is displayed like:

/JoshB/24/1
We could add a flag which says to display using a BK CH:VS format. I was thinking about adding a pattern, like letting the modules.conf file specify something like:
KeyDisplay=%1 %2:%3
but I think this is more work for everyone than it's worth. Besides, other languages probably prefer other formats (BK CH.VS). So I think we'd rather just say something like KeyFormat=BCV

That looks like a great idea. Other LANGUAGES shouldn't be allowed to modify the formatting of a text. On the other hand, giving other TEXTS the ability to have customized presentation would be a great benefit, and this accommodates that very well. For example, the print NRSV Oxford Study Bible that I have uses BK CH.VS.
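Per-text presentation like BK CH:VS versus BK CH.VS comes down to substituting fields into a template such as the KeyDisplay=%1 %2:%3 idea. A C++ sketch, assuming %1..%9 select fields and %% is a literal percent sign; formatKey is a hypothetical helper, and KeyDisplay is only a proposal, not an existing conf key.

```cpp
#include <string>
#include <vector>

// Expands a display pattern such as "%1 %2:%3" using book, chapter, and verse
// strings. %1..%9 select fields; "%%" produces a literal percent sign.
std::string formatKey(const std::string& pattern, const std::vector<std::string>& fields) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()) {
            char d = pattern[i + 1];
            if (d == '%') { out += '%'; ++i; continue; }
            if (d >= '1' && d <= '9') {
                std::size_t idx = static_cast<std::size_t>(d - '1');
                if (idx < fields.size()) out += fields[idx];  // missing fields expand to ""
                ++i;
                continue;
            }
        }
        out += c;
    }
    return out;
}
```

So formatKey("%1 %2:%3", {"JoshB", "24", "1"}) gives "JoshB 24:1", and a text that wants the NRSV Oxford style would just use "%1 %2.%3".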

The other problem is parsing...
Currently VerseKey provides all the nice parsing functionality that figures out:
Ijn2-3:12
It can do this because it has a set of books that it knows about, along with all kinds of abbreviations, translated into a number of languages. Our current parser also drops suffixed letters.

I think part of the solution is to make the parser more generalized and to force the module to give it some parameters for parsing. Each module needs to tell the parser something like 1) the format and 2) valid books. The format might be something like a Perl regular expression: "($book) ([0-9]+):([0-9]+)([a-c])", where the parser then picks out the book, chapter, verse, and sub-verse. I have no recommendations for implementation and don't even know whether it is feasible.
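It does look feasible with std::regex. A C++ sketch, assuming the module supplies its book list plus a format string in which "$book" is replaced by an alternation of those books; the suffix group is made optional here ("[a-c]?") so plain verses parse too. buildParser, parseRef, and ParsedRef are hypothetical names, and book IDs are assumed to contain no regex metacharacters.

```cpp
#include <regex>
#include <string>
#include <vector>

struct ParsedRef {
    std::string book, chapter, verse, sub;
};

// Substitutes an alternation of the module's books for "$book" in the format.
std::regex buildParser(const std::vector<std::string>& books, const std::string& format) {
    std::string alternation;
    for (const auto& b : books) {
        if (!alternation.empty()) alternation += '|';
        alternation += b;
    }
    std::string pattern = format;
    const std::string token = "$book";
    std::size_t pos = pattern.find(token);
    if (pos != std::string::npos) pattern.replace(pos, token.size(), alternation);
    return std::regex(pattern);
}

// Whole-string match; capture groups 1-4 are book, chapter, verse, sub-verse.
bool parseRef(const std::regex& parser, const std::string& text, ParsedRef& out) {
    std::smatch m;
    if (!std::regex_match(text, m, parser)) return false;
    out.book = m[1].str();
    out.chapter = m[2].str();
    out.verse = m[3].str();
    out.sub = m.size() > 4 ? m[4].str() : "";
    return true;
}
```

Because regex_match must consume the whole string, "JoshA 24:1" resolves to JoshA rather than stopping at Josh, and a book the module doesn't list simply fails to match.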

The list of valid books is simpler. Every module should simply provide an ordered list of its contents (in osisID form, naturally). The parser then constructs a list of possible book abbreviations to use in parsing, excluding those books not present. For example, the LXXM is going to include Judith, but not Jude. So the parser would include all the abbreviations for Judith, but not those for Jude, and a reference to "Ju 1:1" should parse as Jdt.1.1.
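A C++ sketch of that filtering step, assuming a master table of names per osisID (the tiny kBookNames table, with "Ju" deliberately shared by Judith and Jude, is made up for illustration) and assuming an abbreviation claimed by two of a module's books should be rejected as ambiguous. buildBookLookup and toOsisRef are hypothetical helpers.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical master table: osisID -> names/abbreviations the parser accepts.
const std::map<std::string, std::vector<std::string>> kBookNames = {
    {"Judg", {"Judges", "Judg", "Jdg"}},
    {"Jdt",  {"Judith", "Jdt", "Ju"}},
    {"Jude", {"Jude", "Ju"}},
};

std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Builds lowercase abbreviation -> osisID for only the books a module lists.
// An abbreviation claimed by two of the module's books is dropped as ambiguous.
std::map<std::string, std::string> buildBookLookup(const std::vector<std::string>& moduleBooks) {
    std::map<std::string, std::string> lookup;
    std::set<std::string> ambiguous;
    for (const auto& osisId : moduleBooks) {
        auto names = kBookNames.find(osisId);
        if (names == kBookNames.end()) continue;
        for (const auto& name : names->second) {
            auto result = lookup.emplace(toLower(name), osisId);
            if (!result.second && result.first->second != osisId) ambiguous.insert(result.first->first);
        }
    }
    for (const auto& key : ambiguous) lookup.erase(key);
    return lookup;
}

// Turns "Ju 1:1" into "Jdt.1.1" with a module-specific lookup; "" on failure.
std::string toOsisRef(const std::map<std::string, std::string>& lookup, const std::string& ref) {
    std::size_t space = ref.find(' ');
    if (space == std::string::npos) return "";
    std::size_t colon = ref.find(':', space);
    if (colon == std::string::npos) return "";
    auto it = lookup.find(toLower(ref.substr(0, space)));
    if (it == lookup.end()) return "";
    return it->second + "." + ref.substr(space + 1, colon - space - 1) + "." + ref.substr(colon + 1);
}
```

For a module listing Judg and Jdt (no Jude), "Ju 1:1" becomes Jdt.1.1; for a module that also lists Jude, "Ju" is dropped and the user has to be more specific.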

Finally, if we solve these problems, and place an entry in LXXM: Category=Biblical Texts, it will probably break most frontends which expect all Biblical Texts to use a VerseKey. I don't know how to solve this problem.


I would just give it a different Category.

I also considered a major change to VerseKey which would make all levels strings and not integers. I realize many frontends use integer spin controls to increase/decrease chapter and verse. There may also be linear logic regarding these things.

Unfortunately I can't think of a better solution to handling the array of versification systems that exist. I think that's why we went with strings in OSIS.
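String levels need not kill spin controls if the frontend steps through positions in module-supplied lists instead of doing arithmetic on labels. A C++ sketch under that assumption; Chapter, StringKey, increment, decrement, and label are hypothetical names, not Sword classes.

```cpp
#include <string>
#include <vector>

// Hypothetical per-book layout supplied by a module: chapters in reading
// order, each with its verse labels in reading order (all strings).
struct Chapter {
    std::string label;
    std::vector<std::string> verses;
};

// Position within the book; the visible labels stay strings.
struct StringKey {
    std::size_t chapter = 0;
    std::size_t verse = 0;
};

// Next verse, wrapping into the next non-empty chapter; false at end of book.
bool increment(const std::vector<Chapter>& book, StringKey& key) {
    if (key.verse + 1 < book[key.chapter].verses.size()) { ++key.verse; return true; }
    for (std::size_t c = key.chapter + 1; c < book.size(); ++c) {
        if (!book[c].verses.empty()) { key.chapter = c; key.verse = 0; return true; }
    }
    return false;
}

// Previous verse, wrapping back to the last verse of the previous chapter.
bool decrement(const std::vector<Chapter>& book, StringKey& key) {
    if (key.verse > 0) { --key.verse; return true; }
    for (std::size_t c = key.chapter; c-- > 0;) {
        if (!book[c].verses.empty()) {
            key.chapter = c;
            key.verse = book[c].verses.size() - 1;
            return true;
        }
    }
    return false;
}

std::string label(const std::vector<Chapter>& book, const StringKey& key) {
    return book[key.chapter].label + ":" + book[key.chapter].verses[key.verse];
}
```

With a chapter "1" holding verses {"31", "31a", "32"}, stepping forward from 1:31 visits 1:31a and 1:32 before moving on, which is the LXX's own sequence rather than integer order.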


--Chris

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
