This will, indeed, NOT remove stop words. If that is all you need, you're
done.
But you will now have useless words in your index, like "the", "is", etc.
Making your own analyzer, by subclassing a suitable existing analyzer or
composing one, will fix you right up if having the extra words in your index
Hmmm... I have come up with a temporary solution for the time being. This is
how I am initializing the StandardAnalyzer to fix my problem:
String[] STOP_WORDS = {};
this.analyzer = new StandardAnalyzer(STOP_WORDS);
This now indexes all my stop words and, happily, it didn't increase my
indexing time
From: Sirish Vadala [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 18, 2007 10:26 AM
To: java-user@lucene.apache.org
Subject: RE: Phrase Query Problem
OK, thanks... I will implement this using the WhitespaceAnalyzer... Let me
check the indexing speed... I mean the time taken to index my data set...
If that takes t
-
From: mark harwood [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 18, 2007 9:42 AM
To: java-user@lucene.apache.org
Subject: Re: Phrase Query Problem
You could write a custom analyzer that drops stopwords but adds an extra 1
to the "positionIncrement" property for the next v
ecause the remaining words are not
recorded as being directly next to each other)
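A minimal sketch of the position-gap idea, written in plain Java rather than
against Lucene's API. The class PositionGapSketch, its Token type, its stop
list and its methods are all hypothetical names made up for illustration. It
assumes whitespace tokenization and an exact (zero-slop) phrase match, and it
only models the behaviour: a dropped stop word either leaves a hole in the
position numbering or it does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; none of these types are Lucene classes.
final class PositionGapSketch {
    // Hypothetical stop list for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "the", "is"));

    // A surviving term plus the position recorded for it.
    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Drops stop words. With keepGaps=true every dropped word still advances
    // the position counter (the "extra 1"); with keepGaps=false the surviving
    // terms are numbered consecutively, as if the stop word never existed.
    static List<Token> analyze(String text, boolean keepGaps) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) {
                    position++;
                }
                continue;
            }
            tokens.add(new Token(word, position++));
        }
        return tokens;
    }

    static boolean hasTermAt(List<Token> doc, String term, int position) {
        for (Token t : doc) {
            if (t.position == position && t.term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    // Exact phrase match: every phrase term must occur in the document at the
    // same offset from the first term as it has inside the phrase.
    static boolean phraseMatches(List<Token> doc, List<Token> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        Token first = phrase.get(0);
        for (Token start : doc) {
            if (!start.term.equals(first.term)) {
                continue;
            }
            boolean all = true;
            for (Token p : phrase) {
                int wanted = start.position + (p.position - first.position);
                if (!hasTermAt(doc, p.term, wanted)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "Health and Safety";
        String query = "Health Safety";
        // Without gaps the two terms look adjacent, so the phrase matches.
        System.out.println(phraseMatches(analyze(doc, false), analyze(query, false))); // true
        // With gaps "safety" sits two positions after "health", so it does not.
        System.out.println(phraseMatches(analyze(doc, true), analyze(query, true)));   // false
    }
}
```

In this model the query "Health and Safety" still matches the document when
gaps are kept, because the query side records the same hole.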
Cheers
Mark
- Original Message
From: Sirish Vadala <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, 18 December, 2007 5:10:19 PM
Subject: RE: Phrase Query Problem
Yes... If my query phrase is "Health Safety", docs with "Health and Safety",
"Health or Safety" are being returned...
So... Is there any other way to handle this situation... Especially in the
above mentioned case, the user is expecting around 5 records and the query
is fetching more than 550 rec
Hi Sirish,
A few hours ago I sent a reply to your message. If my
understanding is correct, you indexed a doc with the text
Health and Safety
and you used the phrase
Health Safety
to create a phrase query. If that is the case, this is
normal since you used StandardAnalyzer to tokenize the
input tex
Hi,
Do you mean that your query phrase is "Health Safety",
but docs with "Health and Safety" are returned?
If that is the case, the reason is that StandardAnalyzer
filters out "and" (also "or", "in" and others) as stop
words during indexing, and the QueryParser filters those
words out also.
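To make that concrete, here is a rough plain-Java sketch that does not call
Lucene at all. The class StopFilterSketch, its terms method and its three-word
stop list are hypothetical stand-ins (the real default stop list is longer),
and tokenization is assumed to be a simple whitespace split. It shows that the
indexed text and the query reduce to the same term sequence once stop words
are dropped, and that an empty stop list keeps them distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this does not call Lucene.
final class StopFilterSketch {
    // Hypothetical stop list, much shorter than a real analyzer's default.
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "in"));

    // Lowercases, splits on whitespace, and drops any word in stopWords.
    static List<String> terms(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty() && !stopWords.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> none = Collections.<String>emptySet();
        // Same filtering at index time and query time: identical term lists.
        System.out.println(terms("Health and Safety", SAMPLE_STOP_WORDS)); // [health, safety]
        System.out.println(terms("Health Safety", SAMPLE_STOP_WORDS));     // [health, safety]
        // With an empty stop list, "and" is kept and the lists differ.
        System.out.println(terms("Health and Safety", none));              // [health, and, safety]
    }
}
```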
Best reg