Re: Applying SpellChecker to a phrase

2007-12-03 Thread Doron Cohen
See below -

smokey <[EMAIL PROTECTED]> wrote on 03/12/2007 05:14:23:
> Suppose I have an index containing the terms impostor, imposter, fraud, and fruad, then presumably regardless of whether I spell impostor and fraud correctly, Lucene SpellChecker will offer the improperly spelled versi

Re: can we do partial optimization?

2007-12-03 Thread Doron Cohen
It doesn't make sense to optimize() after every document add. Lucene in fact implements logic in the spirit of what you describe below when it decides to merge segments on the fly. There are various ways to tell Lucene how often to flush recently added/updated documents and what to merge. But

Re: SpellChecker performance and usage

2007-12-03 Thread Doron Cohen
I didn't have performance issues when using the spell checker. Can you describe what you tried and how long it took, so people can relate to that? AFAIK the spell checker in o.a.l.search.spell does not "expand a query by adding all the permutations of potentially misspelled word". It is based on b

Re: FieldCache Implementations

2007-12-03 Thread Thom Nelson
I have implemented a custom version of FieldCache to handle multi-valued fields, but this requires an interface change so it isn't applicable to what you're suggesting. However, it would be great to have a standard solution for handling multiple values. Grant Ingersoll wrote: Does any out the

Re: Applying SpellChecker to a phrase

2007-12-03 Thread smokey
I have not tried this yet. I am trying to understand the best practices from others who have experience with SpellChecker before actually implementing it. If I understand it correctly, the spell check class suggests alternate but similar words for a single input term. So I believe I will have to

SpellChecker performance and usage

2007-12-03 Thread smokey
My question is for anyone who has experience with Lucene's SpellChecker, especially around its performance characteristics/ramifications. 1. Given the fact that SpellChecker expands a query by adding all the permutations of a potentially misspelled word, how does it perform in general? 2. How are o

Re: BooleanQuery TooManyClauses in wildcard search

2007-12-03 Thread Erick Erickson
First time I tried this I made it WAY more complex than it is.

WARNING: this is from an older code base so you may have to tweak it. Might be 1.9 code.

public class WildcardTermFilter extends Filter {
    private static final long serialVersionUID = 1L;
    protected BitSet

Re: Applying SpellChecker to a phrase

2007-12-03 Thread Erick Erickson
Have you actually tried this and done a query.toString() to see how this is actually expanded? Not that I'm all that familiar with SpellChecker, but before presuming how things work you would get answers faster if you ran a test. And, why do you care about performance? I know that's a silly qu

FieldCache Implementations

2007-12-03 Thread Grant Ingersoll
Does anyone out there using Lucene implement their own version of FieldCache.java? We are proposing to make it an abstract class, which violates our general rule about back-compatibility (see https://issues.apache.org/jira/browse/LUCENE-1045) -Grant -- Grant Ingersoll h

Re: can we do partial optimization?

2007-12-03 Thread Michael McCandless
The current trunk of Lucene (unreleased 2.3-dev) has a new method on IndexWriter: optimize(int maxNumSegments). This method should do what you want: you tell it how many segments to optimize down to, and it will try to pick the least-cost merges to get the index to that point. It's very new (onl

can we do partial optimization?

2007-12-03 Thread Nizamul
Hello, I am very new to Lucene. I am facing one problem. I have one very large index which is constantly getting updated (adds and deletes) at regular intervals, after which I optimize the whole index (otherwise searches will be slow), but optimization takes time. So I was thinking to merge only