Re: [sword-devel] indexed search discrepancy

DM Smith Sun, 30 Aug 2009 13:59:56 -0700

On Aug 30, 2009, at 4:07 PM, Matthew Talbert <ransom1...@gmail.com> wrote:

I had submitted a patch that did this and it was rejected because it did not
preserve backward compatibility without providing a versioning system for
each generated index.
If by backward compatibility, you mean that old indexes will still
work as they always have, then backwards compatibility is being
preserved (this is how I would interpret it).

This is what I meant. The analyzer is used to tokenize both the text going into the index and the search request. If both are not tokenized the same there will be mismatches.

Some examples: old index w/o stopwords and engine that preserves them. In the following example IN is a stopword.

Search a phrase w/ a stop word. "in Christ" will look for all verses containing both "in" and "Christ" with the first immediately preceding the second.

Search for the same but not as a phrase. The default action is to find all verses that contain either word. This will find all verses with Christ and none with In. This is the same as searching for IN OR CHRIST.

If the default is overridden to mean AND or the search is IN AND CHRIST then no verses will be found.
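
The mismatch described above can be sketched with a toy inverted index. This is only an illustration, not SWORD or Lucene code: the function names, the single-entry stopword list, and the three sample "verses" are all made up for the example. The index is built with stopwords dropped (the old behaviour), while the query is tokenized with stopwords kept (the new engine).

```python
# Toy model of an index/query analyzer mismatch. Everything here is
# hypothetical illustration; it is not the SWORD or Lucene API.
STOPWORDS = {"in"}


def tokenize(text, drop_stopwords):
    """Lowercase and split on whitespace, optionally dropping stopwords."""
    words = text.lower().split()
    return [w for w in words if not (drop_stopwords and w in STOPWORDS)]


def build_index(verses, drop_stopwords):
    """Map each token to the set of verse keys that contain it."""
    index = {}
    for ref, text in verses.items():
        for word in tokenize(text, drop_stopwords):
            index.setdefault(word, set()).add(ref)
    return index


def search_or(index, terms):
    """Verses containing at least one of the terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


def search_and(index, terms):
    """Verses containing every term."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


# Made-up sample texts, keyed by made-up references.
verses = {
    "v1": "grace in Christ",
    "v2": "Christ is risen",
    "v3": "in the beginning",
}

old_index = build_index(verses, drop_stopwords=True)   # "in" never indexed
query = tokenize("in Christ", drop_stopwords=False)    # "in" kept in query

or_hits = search_or(old_index, query)    # only the "christ" verses
and_hits = search_and(old_index, query)  # empty: "in" is not in the index
```

With these inputs the OR search returns only the verses containing "Christ", and the AND search returns nothing, because the term "in" was never written to the old index.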

But new indexes will
obviously be different than the old ones. If this is what you mean,
then we really can't change anything in the indexing until some
versioning scheme is implemented, correct? The recent Hebrew changes
broke both of these principles: old indexes are unusable (will return
0 results for modules that have Hebrew vowels), and new indexes are
different than the old ones.

IMHO, bugs need to be fixed but in a way that does not compromise good indexes. Changing the limit is one of those changes. It does not harm indexes that never hit the limit. The tough part is distinguishing between the two and helping the user fix the problem.

The changes to the size of the fields
allowed will do the same thing, although old indexes will still be
usable (if you call returning 30% of the actual hits usable). I agree
with the need for versioning (I mentioned it first in this thread :)
), but to not fix bugs because of that seems silly.

Agreed. Just need to be careful to preserve BC insofar as possible. (BTW, you were first in this thread to mention versioning but there were earlier threads to discuss it. :)

As to using a simple incrementing number to represent the version of the
index, this may not be adequate. It is sufficient if the user has no control
over the index and indexes that do not match the version number of the
engine are ignored/discarded/automatically upgraded... by the front-end or
engine.
I believe we should follow the principle of "do the simplest thing
that will possibly work". All we need at the moment is a simple
version number. Everything without version numbers will be presumed to
be older. In my opinion, if the version number is older than the
(index) version of the library, then the library should just return
false when asked if the module has fast search framework (I forget the
function name). Then the front-end can do whatever it needs in that
situation. This also has the advantage of not needing a new API.
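
A minimal sketch of the simple-version-number check described above, under stated assumptions: the constant, the function name, and the choice to accept indexes that are equal to or newer than the library's version are all hypothetical. The thread does not name the real SWORD function or say how a newer index should be treated.

```python
# Hypothetical sketch of a single-integer index version check.
# None of these names come from the SWORD API.
LIBRARY_INDEX_VERSION = 2  # assumed current index format of the library


def index_is_usable(stored_version):
    """Return False for unversioned or older indexes, True otherwise.

    stored_version is None when the index carries no version number,
    which is presumed to mean it predates versioning.
    """
    if stored_version is None:
        return False
    return stored_version >= LIBRARY_INDEX_VERSION
```

A front-end that gets False back could then fall back to a slower search or offer to rebuild the index, which is the "do whatever it needs" step mentioned above.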

I suggest to plan for the future and implement for the present. A simple number is not sufficient for the future. A versioned list of features would be. An ini file w/ a list of features would work well, e.g.

[index]
lucene=1.4.3
StandardAnalyzer=2
Notes=1
Headings=1
...
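
One way such a feature list might be checked, sketched with Python's standard configparser module. The `[index]` section and key names are taken from the example above; the `ENGINE_FEATURES` table, the function names, and the rule that any differing or missing feature counts as a mismatch are assumptions made for illustration only.

```python
import configparser

# Assumed feature versions of the running engine (hypothetical values
# mirroring the example ini above).
ENGINE_FEATURES = {
    "lucene": "1.4.3",
    "StandardAnalyzer": "2",
    "Notes": "1",
    "Headings": "1",
}


def read_index_features(ini_text):
    """Parse the [index] section into a plain dict of feature -> version."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. "StandardAnalyzer"
    parser.read_string(ini_text)
    if not parser.has_section("index"):
        return {}
    return dict(parser["index"])


def mismatched_features(index_features, engine_features):
    """Sorted names of features whose versions differ or are missing."""
    names = set(index_features) | set(engine_features)
    return sorted(
        name
        for name in names
        if index_features.get(name) != engine_features.get(name)
    )


sample = """
[index]
lucene=1.4.3
StandardAnalyzer=1
Notes=1
Headings=1
"""

stale = mismatched_features(read_index_features(sample), ENGINE_FEATURES)
```

Here `stale` names only the analyzer, so a front-end could tell the user which part of the index is out of date rather than only that some version number differs.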

Give the user any control over the index or provide the front-end any
indication of what is in the index and it is not sufficient. Further, once
we get to analyzers per language each feature needs a version number as
well.

Very messy.
Yes, but we're not there today. Considering that currently none of the
non-English analyzers are ported to C++, to not do something now, or
to design a complicated system based on functionality that may never
arrive, seems backwards.
The solution we have for BibleDesktop/JSword is to just let the user know
that if search does not perform as expected to delete the index and rebuild
it. Not at all a good solution, but we've not had any complaints.
The best solution is not always the most technically correct solution.
As above, many times it's the simplest solution that is best.

Matthew

That's why JSword hasn't tackled it yet (we have the beginnings of an implementation) and why I submitted a patch to SWORD that didn't have versioning.


But it was rejected. Maybe this time is different.

In Christ,

DM

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
