Re: factor in stopwords when searching

2008-03-22 Thread Chris Lu
Hi, Erick, I understand your rant. :) Well, the solution I settled on is this, as suggested by Jake and Grant. When indexing content, I will treat stop words as normal words. When processing the user query, there will be a normal query with stop words skipped, and another part tha

Re: factor in stopwords when searching

2008-03-22 Thread Erick Erickson
Well, whether it's a good user experience is exactly the question. I've spent far too much time satisfying customer (or product manager) requests that add zero value to the product *in the user's eyes*. And I quote: "This is asked by some customer, who may not know what's "stop words" at all." wh

Re: factor in stopwords when searching

2008-03-22 Thread Chris Lu
This is asked by some customer, who may not know what's "stop words" at all. Jake's approach should be quite similar to what some search engine companies are doing. It'll cost some storage, but can achieve a good user experience. The benefit is fairly obvious in the real world. When users enter some

Re: factor in stopwords when searching

2008-03-22 Thread Erick Erickson
What's your reason for trying? The whole point of stop words is that they should be considered "no ops". That is, they add nothing to the semantics of whatever is being processed. I don't understand the use case for why you want to go outside that assumption. Another way of asking this is "what t

Re: factor in stopwords when searching

2008-03-21 Thread Jake Mannix
I think the way I've seen it done most often is to either index some bi-grams which contain stop words (so "the database" and "search the" are in the index as individual tokens), or else to index that piece of content twice - once with stop words removed (and stemming, if you use it), and then agai
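
The first option Jake describes (indexing bi-grams that contain a stop word) can be sketched roughly as follows. This is a toy illustration in plain Python, not Lucene code: the stop word list, `tokenize`, `stopword_bigrams`, and `index_terms` are all hypothetical names made up for the example, and whitespace tokenization stands in for a real analyzer.

```python
# Assumed stop word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and"}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def stopword_bigrams(tokens, stop_words=STOP_WORDS):
    """Return adjacent token pairs in which at least one token is a stop word."""
    grams = []
    for left, right in zip(tokens, tokens[1:]):
        if left in stop_words or right in stop_words:
            grams.append(f"{left} {right}")
    return grams


def index_terms(text, stop_words=STOP_WORDS):
    """Terms to index: content words, plus bi-grams that contain a stop word."""
    tokens = tokenize(text)
    content = [t for t in tokens if t not in stop_words]
    return content + stopword_bigrams(tokens, stop_words)


terms = index_terms("search the database")
print(terms)
```

With this scheme "the" never gets a posting list of its own, but "search the" and "the database" are searchable as single tokens, which is where the extra storage goes.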

Re: factor in stopwords when searching

2008-03-21 Thread Grant Ingersoll
Don't throw away the stopwords? :-) Lucene can't score something it doesn't know exists. I suppose you could try to get fancy w/ payloads and add payloads if stopwords exist, but I am just thinking out loud there. On Mar 21, 2008, at 9:20 PM, Chris Lu wrote: Let's say "the" is consider
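
Grant's point that an index cannot score a term it never stored can be seen in a toy inverted index. This is a minimal sketch in plain Python under assumed names (`build_index`, `STOP_WORDS`, and the two sample documents are hypothetical); it does not use or imitate the Lucene API.

```python
from collections import defaultdict

# Assumed stop word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of"}


def build_index(docs, drop_stop_words):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if drop_stop_words and token in STOP_WORDS:
                continue  # the token is discarded and leaves no trace in the index
            index[token].add(doc_id)
    return dict(index)


docs = {1: "The Who", 2: "who"}
filtered = build_index(docs, drop_stop_words=True)
full = build_index(docs, drop_stop_words=False)

print("the" in filtered)  # stop word was thrown away at index time
print(full["the"])        # stop word kept, so it can be matched and scored
```

With stop words dropped, both documents look identical to a query for "the who"; with them kept, document 1 can be told apart from document 2.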