On Sat, Aug 29, 2009 at 10:26 PM, DM Smith <dmsm...@crosswire.org> wrote:
> FYI: Issue 2, removing stopwords, will break backward compatibility with
> existing indexes. The existing indexes will not contain the stopwords. New
> indexes will. This can be very confusing to users.

Two things. First, if we don't include them in the index but only prevent
searching for them, it wouldn't break compatibility. Second, no one was able
to search for them before without a segfault, so no one will notice anything
except that it doesn't crash anymore. (The stop word issue will be far less
noticeable to users than the proposed size changes. In some cases, 30% or
less of certain text segments were getting indexed, so this will make a huge
difference in the number of hits.)

> If backward compatibility is ok to be broken, I suggest changing from
> StandardAnalyzer to SimpleAnalyzer. It does not have stopwords to begin with
> and will index the text without the silly transformations that the
> StandardAnalyzer does.

Just out of curiosity, what are the silly transformations?

> The segfault is surprising to me. I suggest checking with the clucene folks
> to see why it is happening. I really doubt it is a bug in clucene but
> SWORD's use of it.

I think perhaps we're supposed to strip out the stop words before querying
clucene. It's easier just to set the stop words to NULL in the first place.
Note that, afaik, the stop words in clucene are English only (Lucene has
analyzers for other languages with different stop words). This issue affects
crosswire.org/study as well.

> Adding additional fields probably should be accompanied by adding versioning
> the index. What the Java Lucene folks are doing for version 3.0 is to store
> with the index a manifest of sorts that describes what was used to build the
> index.

I agree with versioning the index. I would increment the version every time
something changes that affects indexing, like the change for Hebrew or the
proposed field size change. Both of these changes break backward
compatibility.

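To make the versioning idea concrete, here is a minimal sketch in Python. It
is not SWORD or clucene code: the manifest layout and every name in it
(CURRENT_MANIFEST, index_version, max_field_length, needs_rebuild) are
hypothetical, and a real version would live in the C++ indexing code. It only
shows the check itself: store a description of how the index was built, then
compare it with the current settings before trusting the index.

```python
import json

# Hypothetical description of how the index was built. None of these
# names come from SWORD or clucene; they are for illustration only.
CURRENT_MANIFEST = {
    "index_version": 2,           # bump whenever indexing behaviour changes
    "analyzer": "StandardAnalyzer",
    "stop_words": [],             # empty list: nothing dropped at index time
    "max_field_length": None,     # None: long entries are not truncated
}


def dump_manifest(manifest):
    """Serialize a manifest so it can be stored next to the index."""
    return json.dumps(manifest, sort_keys=True)


def needs_rebuild(stored_text, current=CURRENT_MANIFEST):
    """Return True if the stored index was built with different settings."""
    if stored_text is None:
        # No manifest at all: the index predates versioning.
        return True
    try:
        stored = json.loads(stored_text)
    except ValueError:
        # Unreadable manifest: safer to rebuild than to guess.
        return True
    return stored != current
```

A front end would read the stored manifest when opening an index and reindex
the module whenever needs_rebuild() returns True, so a change such as the
Hebrew one or the field size one only requires bumping index_version.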
Matthew