On Aug 30, 2009, at 4:07 PM, Matthew Talbert <ransom1...@gmail.com> wrote:

I had submitted a patch that did this and it was rejected because it did not preserve backward compatibility without providing a versioning system for
each generated index.

If by backward compatibility, you mean that old indexes will still
work as they always have, then backwards compatibility is being
preserved (this is how I would interpret it).

This is what I meant. The analyzer is used to tokenize both the text going into the index and the search request. If both are not tokenized the same there will be mismatches.

Some examples: old index w/o stopwords and engine that preserves them. In the following example IN is a stopword.

Search a phrase w/ a stop word. "in Christ" will look for all docs containing both "in" and "Christ" with the first immediately preceding the second.

Search for the same but not as a phrase. The default action is to find all verses that contain either word. This will find all verses with Christ and none with In. This is the same as searching for IN OR CHRIST.

If the default is overridden to mean AND or the search is IN AND CHRIST then no verses will be found.
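The OR and AND cases above can be sketched with a toy in-memory index. This is only an illustration: the functions, the sample texts, and the single-entry stopword list are made up for the example and are not SWORD, JSword, or Lucene code. Phrase search is not modeled.

```python
# Toy model only: hypothetical helpers, not the SWORD or Lucene API.
STOPWORDS = {"in"}  # assumption: "in" is the only stopword


def tokenize(text, drop_stopwords):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    words = text.lower().split()
    return [w for w in words if not (drop_stopwords and w in STOPWORDS)]


def build_index(verses, drop_stopwords):
    """Map each token to the set of verse keys that contain it."""
    index = {}
    for ref, text in verses.items():
        for word in tokenize(text, drop_stopwords):
            index.setdefault(word, set()).add(ref)
    return index


def search(index, query, drop_stopwords, require_all):
    """OR search by default; AND search when require_all is True."""
    terms = tokenize(query, drop_stopwords)
    hits = [index.get(term, set()) for term in terms]
    if not hits:
        return set()
    return set.intersection(*hits) if require_all else set.union(*hits)


# Made-up sample texts, not real verses.
verses = {
    "v1": "in Christ alone",
    "v2": "Christ is risen",
    "v3": "in the beginning",
}

# Old index: built with stopwords removed.
old_index = build_index(verses, drop_stopwords=True)

# Newer engine: keeps stopwords when tokenizing the search request.
or_hits = search(old_index, "in Christ", drop_stopwords=False, require_all=False)
and_hits = search(old_index, "in Christ", drop_stopwords=False, require_all=True)

# Same analyzer on both sides, for comparison.
matched_hits = search(old_index, "in Christ", drop_stopwords=True, require_all=True)
```

With the mismatched analyzer, the OR search finds only the texts containing "Christ" and the AND search finds nothing, because "in" was never written to the old index. When the query is tokenized the same way as the index, the AND search finds the "Christ" texts again.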


But new indexes will
obviously be different than the old ones. If this is what you mean,
then we really can't change anything in the indexing until some
versioning scheme is implemented, correct? The recent Hebrew changes
broke both of these principles: old indexes are unusable (will return
0 results for modules that have Hebrew vowels), and new indexes are
different than the old ones.

IMHO, bugs need to be fixed but in a way that does not compromise good indexes. Changing the limit is one of those changes. It does not harm indexes that never hit the limit. The tough part is distinguishing between the two and helping the user fix the problem.

The changes to the size of the fields
allowed will do the same thing, although old indexes will still be
usable (if you call returning 30% of the actual hits usable). I agree
with the need for versioning (I mentioned it first in this thread :)
), but to not fix bugs because of that seems silly.

Agreed. Just need to be careful to preserve BC insofar as possible. (BTW, you were first in this thread to mention versioning but there were earlier threads to discuss it. :)


As to using a simple incrementing number to represent the version of the index, this may not be adequate. It is sufficient if the user has no control over the index and indexes that do not match the version number of the engine are ignored/discarded/automatically upgraded... by the front-end or engine.

I believe we should follow the principle of "do the simplest thing
that will possibly work". All we need at the moment is a simple
version number. Everything without version numbers will be presumed to
be older. In my opinion, if the version number is older than the
(index) version of the library, then the library should just return
false when asked if the module has fast search framework (I forget the
function name). Then the front-end can do whatever it needs in that
situation. This also has the advantage of not needing a new API.
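A minimal sketch of that single-number check, assuming the index stores its version in some metadata the library can read. All names here are hypothetical; the actual SWORD function is not named in this thread, and the version value is made up.

```python
# Hypothetical names throughout; none of this is the SWORD API.
ENGINE_INDEX_VERSION = 2  # assumed: the index version this library writes


def read_index_version(index_metadata):
    """An index with no recorded version is presumed to be older."""
    return index_metadata.get("version", 0)


def has_fast_search(index_metadata):
    """Report no fast search when the index is older than the library.

    How to treat an index newer than the library is not addressed in the
    thread; this sketch accepts it.
    """
    return not read_index_version(index_metadata) < ENGINE_INDEX_VERSION


unversioned = has_fast_search({})            # presumed older
outdated = has_fast_search({"version": 1})   # older than the library
current = has_fast_search({"version": 2})    # matches the library
```

The front-end then only sees "no fast search available" for an outdated index and can decide on its own whether to rebuild it, which is why no new API would be needed.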

I suggest planning for the future and implementing for the present. A simple number is not sufficient for the future. A versioned list of features would be. An ini file w/ a list of features would work well, e.g.
[index]
lucene=1.4.3
StandardAnalyzer=2
Notes=1
Headings=1
...
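As a rough sketch of how such a feature list might be checked, the snippet below reads the example entries and reports which features differ from what the running engine would produce. The engine-side values and the function names are assumptions made for illustration, and the elided entries are left out.

```python
import configparser

# The example entries from the message, without the elided ones.
INDEX_INI = """\
[index]
lucene=1.4.3
StandardAnalyzer=2
Notes=1
Headings=1
"""

# Assumed for illustration: what the running engine would write today.
ENGINE_FEATURES = {
    "lucene": "1.4.3",
    "StandardAnalyzer": "3",
    "Notes": "1",
    "Headings": "1",
}


def read_features(ini_text):
    """Return the [index] section as a plain dict of feature -> version."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(ini_text)
    return dict(parser["index"])


def mismatched_features(index_features, engine_features):
    """List features whose versions differ or that exist on one side only."""
    names = sorted(set(index_features) | set(engine_features))
    return [
        name
        for name in names
        if index_features.get(name) != engine_features.get(name)
    ]


stale = mismatched_features(read_features(INDEX_INI), ENGINE_FEATURES)
```

Here only StandardAnalyzer is reported, so a front-end could tell the user exactly which feature makes the index out of date instead of rejecting it wholesale.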




Give the user any control over the index or provide the front-end any
indication of what is in the index and it is not sufficient. Further, once we get to analyzers per language, each feature needs a version number as well.

Very messy.

Yes, but we're not there today. Considering that currently none of the
non-English analyzers are ported to C++, to not do something now, or
to design a complicated system based on functionality that may never
arrive, seems backwards.

The solution we have for BibleDesktop/JSword is to just let the user know that if search does not perform as expected to delete the index and rebuild
it. Not at all a good solution, but we've not had any complaints.

The best solution is not always the most technically correct solution.
As above, many times it's the simplest solution that is best.

Matthew

That's why JSword hasn't tackled it yet (we have the beginnings of an implementation) and why I submitted a patch to SWORD that didn't have versioning.

But it was rejected. Maybe this time is different.

In Christ,
DM
_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
