Yeah, so, a few comments on this thread (my apologies for being preoccupied over the past month or so):

SWORD has the concept of "filtering" a module's text at different points in processing, for different purposes. One of these filter-points is for searching and we call these filters "Strip Filters".

Strip Filters are typically named something like OSISPlain or GBFPlain, etc. These typically take all the markup out of an entry and prepare the text to be searched, but anything can be done to the text to prepare it for searching. We typically remove accents and vowel points from Greek and Hebrew, respectively. If diacritics need to be removed from Arabic, then we can certainly add a filter for this as well. I believe Peter may have already done this and has referred to a patch submitted in November, last year. Peter, please remind me if I have neglected to commit something for you.

Any Strip Filter can be added to a module by a module author with a line in the .conf file, such as:

LocalStripFilter=UTF8ArabicPoints

A list of filters can be found by browsing the source folder here:

http://crosswire.org/svn/sword/trunk/src/modules/filters/

They're pretty concise and don't involve much knowledge from the rest of the engine, making them easy to write if we need a new one.
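
To give a feel for how little logic such a filter needs, here is a minimal standalone sketch of the core step for Arabic. It is not the code from the patch and not an actual SWORD filter class: the function name stripArabicPoints is hypothetical, the input is assumed to be valid UTF-8, and the set of code points removed (U+064B through U+065F, plus U+0670) is my assumption about what "diacritics" should cover. A real filter would wrap something like this in the engine's filter interface, which is not shown here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the SWORD API.
// Returns a copy of `in` (assumed to be valid UTF-8) with Arabic
// combining marks U+064B..U+065F and U+0670 removed.
std::string stripArabicPoints(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead == 0xD9 && i + 1 < in.size()) {
            const unsigned char trail = static_cast<unsigned char>(in[i + 1]);
            // U+064B..U+065F encode as D9 8B..D9 9F; U+0670 encodes as D9 B0.
            const bool isPoint = (trail >= 0x8B && trail <= 0x9F) || trail == 0xB0;
            if (isPoint) {
                ++i;  // skip the trail byte of the mark as well
                continue;
            }
        }
        out.push_back(in[i]);  // everything else is copied through unchanged
    }
    return out;
}
```

Working on raw bytes keeps the sketch free of dependencies; it relies on the fact that 0xD9 can only appear as a lead byte in valid UTF-8, so a base letter is never mistaken for a mark.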

This processing can replace or be complementary to any processing done by clucene.

Since we need to strip markup and other things clucene will likely never support (see PapyriPlain: annotations like [,],?{,}, underdot), we need this pre-processing mechanism to prepare the text before searching. We also maintain searching functionality apart from "fast indexed searching" (currently supplied by clucene, but could be supplied by any other fast search framework we might decide to integrate).
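
As a rough illustration of why the pre-processing step matters for the non-indexed search path, here is a small sketch, not engine code: stripTags and entryMatches are hypothetical names, the tag handling is deliberately naive (anything between '<' and '>' is dropped), and the search is a plain substring match.

```cpp
#include <string>

// Hypothetical helper: removes anything between '<' and '>'.
// A stray '>' outside a tag is kept as ordinary text.
std::string stripTags(const std::string &in) {
    std::string out;
    bool inTag = false;
    for (char c : in) {
        if (c == '<') {
            inTag = true;
        } else if (c == '>' && inTag) {
            inTag = false;
        } else if (!inTag) {
            out.push_back(c);
        }
    }
    return out;
}

// Hypothetical helper: strip the entry first, then do a plain substring search.
bool entryMatches(const std::string &rawEntry, const std::string &query) {
    return stripTags(rawEntry).find(query) != std::string::npos;
}
```

With an entry such as `In the <w lemma="x">begin</w>ning`, a search for "beginning" fails against the raw text but succeeds once the markup has been stripped, and that holds whichever search backend is used afterwards.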

Hope this informs this thread a little,

Troy




On 11/27/2012 01:05 AM, Peter von Kaehne wrote:
Guys, are you sure this is a problem with Clucene and not just with the strip 
filter?

Has anyone tried out the patch? It was sent in November last year IIRC

Peter
-------- Original Message --------
Date: Mon, 26 Nov 2012 23:19:20 -0600
From: Greg Hellings <greg.helli...@gmail.com>
To: "SWORD Developers' Collaboration Forum" <sword-devel@crosswire.org>
Subject: Re: [sword-devel] Search bug & New Arabic Bible, Not Shaped SVD Version
On Mon, Nov 26, 2012 at 11:15 PM, Nic Carter <niccar...@mac.com> wrote:
My understanding is that we are currently locked into a really old
version of the C library

False.

& it is no longer being maintained.
True

Instead we need to port SWORD to use the current version of the library,
Already done.

which is actively being maintained...
It isn't. That's the complaint. :)

--Greg

I gather some work has been done on this but I'm not sure where it's
currently up to. It's on my todo list, along with about a million other things
that have piled up over the last year... :)
Sent from my phone, hence this email may be short...

On 27/11/2012, at 15:17, pola ashraf <5...@hotmail.com> wrote:

we depend on a library that gets updates very frequently in Java but no
updates for its C port
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page