Re: [sword-devel] Script to find a best fit v11n

DM Smith Thu, 19 Jun 2025 07:00:35 -0700

Greg,

I like that it's very simple to read. Having a summary is good. And the other 
email which lists the exact ids extra/missing per testament is very helpful.


I think that enumerating the names of the extra/missing books and extra/missing 
chapters would be good. No sense in enumerating the ids within these.

I ran mine against an input that was a test case for osis2mod’s infinite loop 
and it had 2 extra books and 13 extra chapters. This wouldn’t be obvious in 
your results.
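
A minimal sketch of that kind of roll-up, assuming verse-level osisIDs of the form Book.Chapter.Verse and that both sets of IDs have already been collected. The helper names are hypothetical and are not taken from either script. It reports only whole books and whole chapters that are on one side and entirely absent from the other:

```python
def book_of(osis_id):
    """Return the book part of an osisID such as 'Gen.1.1'."""
    return osis_id.split(".")[0]


def chapter_of(osis_id):
    """Return (book, chapter) for an osisID such as 'Gen.1.1'."""
    book, chapter = osis_id.split(".")[:2]
    return (book, int(chapter))


def whole_units_only_in(a, b):
    """Return (books, chapters) that occur in a but are entirely absent from b.

    Results are sorted alphabetically, not in canonical order.
    """
    books_b = {book_of(i) for i in b}
    chapters_b = {chapter_of(i) for i in b}
    books = sorted({book_of(i) for i in a} - books_b)
    # Chapters of a wholly absent book are already implied by the book itself.
    chapters = sorted(
        c for c in {chapter_of(i) for i in a} - chapters_b if c[0] not in books
    )
    return books, chapters


# Made-up IDs for illustration only.
v11n_ids = {"Gen.1.1", "Gen.1.2", "Gen.2.1"}
file_ids = {"Gen.1.1", "Gen.1.2", "Gen.1.3", "Gen.2.1", "Gen.3.1",
            "Tob.1.1", "Tob.1.2"}

extra_books, extra_chapters = whole_units_only_in(file_ids, v11n_ids)
missing_books, missing_chapters = whole_units_only_in(v11n_ids, file_ids)

print("Extra books:", extra_books)
print("Extra chapters:", extra_chapters)
print("Missing books:", missing_books)
print("Missing chapters:", missing_chapters)
```

An extra verse inside a chapter that the v11n does have (Gen.1.3 in the made-up data) is deliberately left out of this summary; it would still show up in the per-ID listing.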

Is it an advantage or disadvantage to be compiled against SWORD lib vs slurping 
header files?

— DM

> On Jun 19, 2025, at 12:00 AM, Greg Hellings <greg.helli...@gmail.com> wrote:
> 
> Here is an example of the first lines of running my script against the 
> kjv.osis.xml file from the git repo:
> 
> 
> Checking Calvin:
> ----------------
>         There are 93 OT IDs and 5 NT IDs in v11n which aren’t in your file.
>         There are 0 OT IDs and 30 NT IDs in your file which don’t appear in 
> v11n.
> 
> Checking Catholic:
> ------------------
>         There are 4530 OT IDs and 3 NT IDs in v11n which aren’t in your file.
>         There are 0 OT IDs and 133 NT IDs in your file which don’t appear in 
> v11n.
> 
> Checking Catholic2:
> -------------------
>         There are 4638 OT IDs and 3 NT IDs in v11n which aren’t in your file.
>         There are 0 OT IDs and 133 NT IDs in your file which don’t appear in 
> v11n.
> 
> Checking DarbyFr:
> -----------------
>         There are 31 OT IDs and 4 NT IDs in v11n which aren’t in your file.
>         There are 0 OT IDs and 30 NT IDs in your file which don’t appear in 
> v11n.
> 
> This continues on to include such output as
> 
> Checking KJV:
> -------------
>         Your file has all the references in this v11n
>         Your file has no extra references
> 
> Checking KJVA:
> --------------
>         There are 5717 OT IDs and 0 NT IDs in v11n which aren’t in your file.
>         Your file has no extra references
> 
> giving a clear example of a winner for this particular file.
> 
> Meanwhile, running it against the kjva.osis.xml file includes this in the 
> results:
> 
> ...
> 
> Checking KJV:
> -------------
>         Your file has all the references in this v11n
>         There are 2 OT IDs and 5715 NT IDs in your file which don’t appear in 
> v11n.
> 
> Checking KJVA:
> --------------
>         Your file has all the references in this v11n
>         Your file has no extra references
> ...
> 
> Fiddling with the file has shown me there are a couple of places where I 
> need to tweak it for Python 3 compatibility that I missed the last time I 
> updated it. But fixing those couple of little syntax issues resulted in it 
> running just fine in a Fedora 41 VM with nothing more to do than invoke `dnf 
> install python3-sword` to set up the system to use it.
> 
> --Greg
> 
> On Wed, Jun 18, 2025 at 10:40 PM Greg Hellings <greg.helli...@gmail.com> wrote:
>> My script eschews percentages because they seemed relatively pointless to me 
>> for measuring a mismatch like this. Instead it gives a count of both Old and 
>> New Testament osisIDs that it finds missing and another that it finds 
>> unexpectedly for a given versification. If the total of either count is 
>> fewer than 100, the IDs for that particular count are printed to the 
>> console. It will do this for every registered versification in the version 
>> of the library it was compiled against, allowing the user to select 
>> whichever one seems best to them based on the results.
>> 
>> On Wed, Jun 18, 2025, 10:25 PM David Haslam <dfh...@protonmail.com> wrote:
>>> It’s not just the number of “missing” verses that should figure in the 
>>> percentage score, but also the number of verses that get concatenated to 
>>> the last one in a chapter.
>>> 
>>> The differences in v11n for the Psalms will be especially significant for 
>>> this, in that some v11n renumber many of them. Likewise for the last few 
>>> chapters in the book of Job.
>>> 
>>> Aside: It would be cool to enhance the utility emptyvss by providing a 
>>> command line option that would ignore books that are not included in the 
>>> scope parameter in the conf file.
>>> 
>>> Regards,
>>> 
>>> David
>>> 
>>> On Thu, Jun 19, 2025 at 03:18, DM Smith <dmsm...@crosswire.org> wrote:
>>>> 
>>>> David,
>>>> 
>>>> Because it only considers the xml, scope is automatically built into it. 
>>>> It is only comparing what is present in the xml with what is part of the 
>>>> v11ns. 
>>>> 
>>>> It might be good to add the enumeration of missing verses.
>>>> 
>>>> — DM
>>>> 
>>>>> On Jun 18, 2025, at 4:02 PM, David Haslam <dfh...@protonmail.com> wrote:
>>>>> 
>>>>> Does it take account of the Scope key in the .conf file for a less than 
>>>>> complete Bible?
>>>>> 
>>>>> David
>>>>> 
>>>>> On Wed, Jun 18, 2025 at 20:51, DM Smith <dmsm...@crosswire.org> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Several have commented on how hard it is to test an OSIS xml file 
>>>>>> against v11ns, especially since osis2mod can go off into an infinite 
>>>>>> loop. (I’ve posted a patch that fixes that.) But it is still a process 
>>>>>> of trial and error to find an appropriate v11n.
>>>>>> 
>>>>>> So, I’ve been iterating with ChatGPT to create a Python script to find a 
>>>>>> best fit v11n. Since I don’t know Python, I can’t vouch for the script 
>>>>>> beyond that it worked for a simple test case that had an extra chapter 
>>>>>> for Genesis and some extra verses at the end of a chapter in that book.
>>>>>> 
>>>>>> I offer it as a starting place. See the attached file.
>>>>>> 
>>>>>> It has a --debug flag.
>>>>>> The first argument is expected to be the OSIS xml file.
>>>>>> The second argument is optional and gives the location to the include 
>>>>>> directory of svn/sword/trunk/include with all the canon*.h files. If you 
>>>>>> don’t supply the argument, it uses the web to load the canon*.h files 
>>>>>> from https://www.crosswire.org/svn/sword/trunk/include. 
>>>>>> 
>>>>>> It will score the fitness of each of the v11ns. It gives the score as a 
>>>>>> %, but I don’t know what that means. I told it that it should prioritize 
>>>>>> book matches, then chapter matches and finally verse matches. I don’t 
>>>>>> know how well it did that scoring. I didn’t test for that.
>>>>>> 
>>>>>> The output is alphabetized. If more than one v11n has the same high 
>>>>>> score, they are all listed.
>>>>>> 
>>>>>> In His Service,
>>>>>>  DM
>>>>>> 
>>>>> _______________________________________________ 
>>>>> sword-devel mailing list: sword-devel@crosswire.org 
>>>>> <mailto:sword-devel@crosswire.org> 
>>>>> http://crosswire.org/mailman/listinfo/sword-devel 
>>>>> Instructions to unsubscribe/change your settings at above page
>>>> 
