Yes. The request for such a tool struck me as odd, since I thought I had been using one for a long time, but I could not recall the details until Greg reminded me.
The problem, though, is not as simple as finding the lowest number of oddities; it also matters where and what they are. So some looking and comparing by eye is likely always needed. AFAIK most NT discrepancies come from verses that were omitted in newer Greek texts, whereas OT discrepancies often result in wholesale shifts. Unless you want to involve AI translation comparison, this is likely always beyond a script. But the script nicely narrows the choices down to a manageable number, as long as you know what to look for in the final decision.
Peter
From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of DM Smith <dmsm...@crosswire.org>
Sent: Thursday, June 19, 2025 5:17 pm
To: SWORD Developers' Collaboration Forum <sword-devel@crosswire.org>
Subject: Re: [sword-devel] Script to find a best fit v11n
What motivated me to “write” such a script was a couple of recent emails where people were having a hard time determining which v11n to use and were using osis2mod by trial and error to figure it out.
No one had suggested Greg’s excellent tool. So I assumed there was an unmet need.
On Jun 19, 2025, at 1:22 AM, Peter von Kaehne <ref...@gmx.net> wrote:
That script is what I have used for years with good effect. Thank you Greg.
From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of Greg Hellings <greg.helli...@gmail.com>
Sent: Thursday, June 19, 2025 6:41 am
To: SWORD Developers' Collaboration Forum <sword-devel@crosswire.org>
Subject: Re: [sword-devel] Script to find a best fit v11n
My script eschews percentages because they seemed relatively pointless to me for measuring a mismatch like this. Instead it gives a count of both Old and New Testament osisIDs that it finds missing and another that it finds unexpectedly for a given versification. If the total of either count is fewer than 100, the IDs for that particular count are printed to the console. It will do this for every registered versification in the version of the library it was compiled against, allowing the user to select whichever one seems best to them based on the results.
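The counting approach described above can be sketched in a few lines of Python. This is only an illustration, not Greg's actual script: the function names and the toy versification data are hypothetical, and the sketch does not split the counts by testament. It assumes the osisIDs found in the document and the osisIDs each versification expects are already available as sets.

```python
def compare(found_ids, expected_ids):
    """Return (missing, unexpected) osisIDs as sorted lists."""
    missing = sorted(expected_ids - found_ids)      # expected by the v11n, absent from the text
    unexpected = sorted(found_ids - expected_ids)   # present in the text, unknown to the v11n
    return missing, unexpected


def report(found_ids, versifications, limit=100):
    """Build one summary per versification; list IDs only when fewer than `limit`."""
    lines = []
    for name in sorted(versifications):
        missing, unexpected = compare(found_ids, versifications[name])
        lines.append(f"{name}: {len(missing)} missing, {len(unexpected)} unexpected")
        if missing and len(missing) < limit:
            lines.append("  missing: " + " ".join(missing))
        if unexpected and len(unexpected) < limit:
            lines.append("  unexpected: " + " ".join(unexpected))
    return lines


# Toy data (hypothetical, not real versification tables).
found = {"Gen.1.1", "Gen.1.2", "Gen.1.3"}
toy_v11ns = {
    "ToyA": {"Gen.1.1", "Gen.1.2"},
    "ToyB": {"Gen.1.1", "Gen.1.2", "Gen.1.3", "Gen.1.4"},
}

if __name__ == "__main__":
    print("\n".join(report(found, toy_v11ns)))
```

Keeping the two counts separate, rather than folding them into one number, leaves the final choice to the user, who can see whether the mismatch is a handful of omitted verses or a wholesale shift.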
It’s not just the number of “missing” verses that should figure in the percentage score, but also the number of verses that get concatenated to the last one in a chapter.
The differences in v11n for the Psalms will be especially significant for this, in that some v11n renumber many of them. Likewise for the last few chapters in the book of Job.
Aside: It would be cool to enhance the utility emptyvss by providing a command line option that would ignore books that are not included in the scope parameter in the conf file.
Regards,
David
David,
Because it only considers the XML, scope is automatically built into it. It only compares what is present in the XML with what is part of the v11ns.
It might be good to add the enumeration of missing verses.
— DM
Does it take account of the Scope key in the .conf file for a less-than-complete Bible?
David
Hi,
Several have commented on how hard it is to test an OSIS XML file against v11ns, especially since osis2mod can go off into an infinite loop. (I’ve posted a patch that fixes that.) But it is still a process of trial and error to find an appropriate v11n.
So, I’ve been iterating with ChatGPT to create a Python script to find a best fit v11n. Since I don’t know Python, I can’t vouch for the script beyond the fact that it worked for a simple test case that had an extra chapter in Genesis and some extra verses at the end of a chapter in that book.
I offer it, as a starting place. See the attached file.
It has a --debug flag.
The first argument is expected to be the OSIS xml file.
The second argument is optional and gives the location to the include directory of svn/sword/trunk/include with all the canon*.h files. If you don’t supply the argument, it uses the web to load the canon*.h files from
https://www.crosswire.org/svn/sword/trunk/include.
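For readers who want to picture the command line described above, here is a minimal sketch using Python's standard argparse module. It is not taken from the attached script: the argument names `osis_file` and `include_dir` and the helper `canon_source` are assumptions made for illustration. Only the --debug flag, the argument order, and the default URL come from the description.

```python
import argparse

# Default location of the canon*.h files, as given in the message above.
DEFAULT_INCLUDE_URL = "https://www.crosswire.org/svn/sword/trunk/include"


def build_parser():
    """Build a parser matching the described interface (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Find a best fit v11n for an OSIS file")
    parser.add_argument("osis_file", help="path to the OSIS XML file")
    parser.add_argument("include_dir", nargs="?", default=None,
                        help="local sword/trunk/include directory with canon*.h files")
    parser.add_argument("--debug", action="store_true", help="print extra diagnostics")
    return parser


def canon_source(args):
    """Use the local include directory if given, otherwise fall back to the web."""
    return args.include_dir if args.include_dir else DEFAULT_INCLUDE_URL


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Reading canon headers from:", canon_source(args))
```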
It will score the fitness of each of the v11ns. It gives the score as a %, but I don’t know what that means. I told it that it should prioritize book matches, then chapter matches and finally verse matches. I don’t know how well it did that scoring. I didn’t test for that.
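One way such a prioritized percentage could be computed is sketched below. This is an assumption, not the formula the attached script uses: the weights (0.5 for books, 0.3 for chapters, 0.2 for verses), the function names, and the data layout (book name mapped to a list of verse counts per chapter) are all hypothetical.

```python
def _frac(hits, total):
    """Safe ratio that returns 0.0 when there is nothing to compare."""
    return hits / total if total else 0.0


def fit_score(doc, v11n, weights=(0.5, 0.3, 0.2)):
    """Score how well `doc` fits `v11n` as a percentage.

    Both arguments map a book name to a list of verse counts, one per chapter.
    Books weigh most, then chapters, then verses (hypothetical weights).
    """
    shared = [book for book in doc if book in v11n]

    # Fraction of the document's books that the v11n knows about.
    book_frac = _frac(len(shared), len(doc))

    # Fraction of the document's chapters that exist in the v11n.
    chap_total = sum(len(chapters) for chapters in doc.values())
    chap_hits = sum(min(len(doc[b]), len(v11n[b])) for b in shared)

    # Fraction of the document's verses that fit within the v11n's chapter lengths.
    verse_total = sum(sum(chapters) for chapters in doc.values())
    verse_hits = sum(min(d, v) for b in shared for d, v in zip(doc[b], v11n[b]))

    wb, wc, wv = weights
    return 100 * (wb * book_frac
                  + wc * _frac(chap_hits, chap_total)
                  + wv * _frac(verse_hits, verse_total))


# Toy example: the document has one chapter more than the first candidate allows.
doc = {"Gen": [31, 25, 24]}
short_v11n = {"Gen": [31, 25]}
exact_v11n = {"Gen": [31, 25, 24]}

if __name__ == "__main__":
    print(f"short: {fit_score(doc, short_v11n):.1f}%")
    print(f"exact: {fit_score(doc, exact_v11n):.1f}%")
```

A single percentage like this hides where the mismatches are, which is the objection raised elsewhere in this thread, so it is probably best read alongside a list of the offending verses.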
The output is alphabetized. If more than one v11n has the same high score, all of them are listed.
In His Service,
DM
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page