Re: Phrase Query Problem

2007-12-18 Thread Erick Erickson
This will, indeed, NOT remove stop words. If that is all you need, you're done. But you will now have useless words in your index, like "the", "is", etc. Making your own analyzer, either by subclassing a suitable existing analyzer or by composing one, will fix you right up if having the extra words in your index

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
Hmmm... I have come up with a temporary solution for the time being. This is how I am initializing the StandardAnalyzer to fix my problem: String[] STOP_WORDS = {}; this.analyzer = new StandardAnalyzer(STOP_WORDS); This now indexes all my stop words, and happily it didn't increase my indexing time

RE: Phrase Query Problem

2007-12-18 Thread Zhang, Lisheng
ish Vadala [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 18, 2007 10:26 AM To: java-user@lucene.apache.org Subject: RE: Phrase Query Problem OK, thanks... I will implement using the WhitespaceAnalyzer... Let me check the indexing speed... I mean the time taken to index my data set... If that takes t

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
-----Original Message----- From: mark harwood [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 18, 2007 9:42 AM To: java-user@lucene.apache.org Subject: Re: Phrase Query Problem You could write a custom analyzer that drops stopwords but adds an extra 1 to the

RE: Phrase Query Problem

2007-12-18 Thread Zhang, Lisheng
- From: mark harwood [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 18, 2007 9:42 AM To: java-user@lucene.apache.org Subject: Re: Phrase Query Problem You could write a custom analyzer that drops stopwords but adds an extra 1 to the "positionIncrement" property for the next v
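The position-increment idea described above can be illustrated with a small plain-Java model. This is only a sketch and does not use Lucene: the class PositionGapSketch, its Token type, the stop list, and the keepGaps flag are all made up for illustration. It assumes that a phrase matches only when its terms sit at the same relative positions in the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of position increments. NOT Lucene code; all names are hypothetical.
final class PositionGapSketch {
    // Illustrative stop list only; a real analyzer's list will differ.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "in", "the", "is"));

    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Drops stop words. With keepGaps, each dropped word adds an extra 1 to the
    // position increment of the next kept token, leaving a hole in the positions.
    static List<Token> analyze(String text, boolean keepGaps) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        int increment = 1;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) {
                    increment++;
                }
                continue;
            }
            position += increment;
            tokens.add(new Token(word, position));
            increment = 1;
        }
        return tokens;
    }

    static boolean hasTermAt(List<Token> tokens, String term, int position) {
        for (Token t : tokens) {
            if (t.term.equals(term) && t.position == position) {
                return true;
            }
        }
        return false;
    }

    // Exact phrase match: every phrase term must appear in the document at the
    // same relative position it has in the phrase.
    static boolean matchesPhrase(String docText, String phraseText, boolean keepGaps) {
        List<Token> doc = analyze(docText, keepGaps);
        List<Token> phrase = analyze(phraseText, keepGaps);
        if (phrase.isEmpty()) {
            return false;
        }
        Token first = phrase.get(0);
        for (Token start : doc) {
            if (!start.term.equals(first.term)) {
                continue;
            }
            int offset = start.position - first.position;
            boolean allFound = true;
            for (Token p : phrase) {
                if (!hasTermAt(doc, p.term, p.position + offset)) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Without gaps the stop word vanishes and the phrase matches.
        System.out.println(matchesPhrase("Health and Safety", "Health Safety", false));
        // With gaps "safety" sits two positions after "health", so it does not match.
        System.out.println(matchesPhrase("Health and Safety", "Health Safety", true));
    }
}
```

In this model the gap records only that some word was dropped, not which one, so a query for "Health and Safety" would still match a document containing "Health or Safety".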

Re: Phrase Query Problem

2007-12-18 Thread mark harwood
ecause the remaining words are not recorded as being directly next to each other) Cheers Mark - Original Message From: Sirish Vadala <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Tuesday, 18 December, 2007 5:10:19 PM Subject: RE: Phrase Query Problem Yes... If my

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
Yes... If my query phrase is "Health Safety", docs with "Health and Safety", "Health or Safety" are being returned... So... Is there any other way to handle this situation... Especially in the above mentioned case, the user is expecting around 5 records and the query is fetching more than 550 rec

RE: Phrase Query Problem

2007-12-17 Thread Zhang, Lisheng
Hi Sirish, A few hours ago I sent a reply to your message. If my understanding is correct, you indexed a doc with the text "Health and Safety" and you used the phrase "Health Safety" to create a phrase query. If that is the case, this is normal since you used StandardAnalyzer to tokenize the input tex

RE: Phrase Query Problem

2007-12-17 Thread Zhang, Lisheng
Hi, Do you mean that your query phrase is "Health Safety", but docs with "Health and Safety" are returned? If that is the case, the reason is that StandardAnalyzer filters out "and" (also "or", "in" and others) as stop words during indexing, and the QueryParser filters those words out also. Best reg
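To make the explanation above concrete, here is a minimal plain-Java sketch of stop-word filtering. It is a toy model, not Lucene code: the class StopWordSketch, its analyze method, and the stop list are assumptions made for illustration. It shows only that the indexed text and the query phrase reduce to the same term sequence once the stop words are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of stop-word filtering. NOT Lucene code; all names are hypothetical.
final class StopWordSketch {
    // Illustrative stop list only; a real analyzer's list will differ.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "in", "the", "is"));

    // Lowercases, splits on whitespace, and drops stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                terms.add(word);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Both calls yield the same two terms, so a phrase query built from
        // the second text cannot tell the two documents apart.
        System.out.println(analyze("Health and Safety"));
        System.out.println(analyze("Health Safety"));
    }
}
```

The same reduction applies to "Health or Safety", which is why all three variants end up matching the one phrase query.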