Here are some indexing parameters. 
https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/indexing/limitations.html#index-limitations

Not all of them will be relevant, but the default word size of 245 will be unnecessary for most languages.

Peter


From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of Peter von Kaehne <ref...@gmx.net>
Sent: Friday, June 6, 2025 7:43 am
To: SWORD Developers' Collaboration Forum <sword-devel@crosswire.org>
Subject: Re: [sword-devel] RIP CLucene on Mac Silicon
 
Xapian is of course used extensively by Gnome, and while the initial indexing of a full home directory or a full mailbox (each many multiples of a Sword module, or even of a relatively sizeable library) can take its time, I never had any concerns about index size in daily use with Gnome indexing.

So could it be that the problem is not Xapian per se, but the parameters we give it and the way we use it for indexing?

https://getting-started-with-xapian.readthedocs.io/en/latest/advanced/scalability.html

This suggests that there are many possible ways of tweaking it. FWIW, our module indices are individual indices rather than library-wide. For most modules we do not need any update facility for a search index; we just redo it from scratch when we get an updated module. So our trees could/should be optimised at least for that: compact size and fast reading, no writing necessary.

Would this, and any other material further down, help? (I have not looked too hard, as I do not yet know the search-related code.)
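
As a rough illustration of the "redo from scratch" workflow described above, here is a minimal sketch in Python using only the standard library. It is not SWORD code: `rebuild_index` and the `build` callback are hypothetical names. `build` is assumed to write a complete index into the directory it is handed, for instance by running the indexer and then a compaction step such as Xapian's `xapian-compact` tool. The new index is built in a temporary directory next to the old one and only swapped in once it is complete, so readers never see a half-written index.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def rebuild_index(index_dir: Path, build: Callable[[Path], None]) -> None:
    """Rebuild a per-module index from scratch and swap it into place.

    `build` is a hypothetical callback (an assumption of this sketch) that
    writes a complete, read-only index into the directory it is given.
    """
    index_dir = Path(index_dir)
    parent = index_dir.parent
    parent.mkdir(parents=True, exist_ok=True)

    # Build beside the final location so the renames stay on one filesystem.
    tmp = Path(tempfile.mkdtemp(prefix=index_dir.name + ".new.", dir=parent))
    try:
        build(tmp)
    except BaseException:
        # A failed build leaves the existing index untouched.
        shutil.rmtree(tmp, ignore_errors=True)
        raise

    # Move the old index aside, move the new one in, then delete the old one.
    old = parent / (index_dir.name + ".old")
    if old.exists():
        shutil.rmtree(old)
    if index_dir.exists():
        os.rename(index_dir, old)
    os.rename(tmp, index_dir)
    shutil.rmtree(old, ignore_errors=True)
```

Because nothing ever updates an index in place under this scheme, the build step is free to favour compact size and read speed over write performance.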


From: sword-devel <sword-devel-boun...@crosswire.org> on behalf of Greg Hellings <greg.helli...@gmail.com>
Sent: Friday, June 6, 2025 5:55 am
To: SWORD Developers' Collaboration Forum <sword-devel@crosswire.org>
Subject: Re: [sword-devel] RIP CLucene on Mac Silicon
 


On Thu, Jun 5, 2025 at 1:56 PM Karl Kleinpaste <k...@kleinpaste.org> wrote:
On 6/5/25 1:07 PM, Greg Hellings wrote:
Sword has support for Xapian, I believe, which is a much more recent and up to date library

Way back in November 2014, when Xapian's presence in Sword was new, I experimented with it. The problem I found is that its generated indices are absolutely humongous. At the time, I wrote to the list here to say that they were a 7x size increase, and that what was once a couple Gbytes had ballooned to 23.2Gbytes when I went through a round of mkfastmod for all my installed modules.

Running with just the KJV module just now, I have:

CLucene indexes the KJV in 12.5 seconds with a 12MB lucene directory
Xapian indexes KJV in 31 seconds with a 185MB xapian directory

It looks like it hasn't really gotten any better since your tests, Karl.
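
For scale, the ratios implied by the two KJV measurements above can be worked out directly. This is only arithmetic on the quoted figures (the variable names are illustrative), shown as a small Python snippet:

```python
# Figures quoted above for indexing the KJV module.
clucene = {"seconds": 12.5, "megabytes": 12}
xapian = {"seconds": 31, "megabytes": 185}

size_ratio = xapian["megabytes"] / clucene["megabytes"]
time_ratio = xapian["seconds"] / clucene["seconds"]

print(f"index size: {size_ratio:.1f}x larger")  # index size: 15.4x larger
print(f"index time: {time_ratio:.1f}x slower")  # index time: 2.5x slower
```

So for this one module the Xapian index is roughly 15 times larger and takes about 2.5 times as long to build.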
 

I would reference this from sword-devel archives, but www.crosswire.org is failing to respond right now.

Apropos of none of the above, in order for mkfastmod to be able to make a Xapian index, I had to apply the attached patch to the released Sword 1.9.0, as mkfastmod was not updated when Xapian was first added as a target. Without the patch, mkfastmod doesn't know that it can run and gives the error that search frameworks are not supported.

I am also unable to pull up any of the crosswire.org site, so I don't know if the patch is applied to trunk, but I would venture to guess not. Xapian builds of Sword don't seem to be very popular so long as CLucene still exists on Linux.

--Greg
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
 
 
