This will, indeed, NOT remove stop words. If that is all you need, you're done.
But you will now have useless words in your index like the, is, etc. Making your own analyzer by subclassing a suitable existing analyzer, or composing one will fix you right up if having the extra words in your index turns out not to be OK. And it shouldn't change your indexing speed noticeably. Best Erick On Dec 18, 2007 2:44 PM, Sirish Vadala <[EMAIL PROTECTED]> wrote: > > Hmmm... I had come up with a temporary solution for the time being. This > is > how I am initializing the StandardAnalyzer to fix my problem. > > String[] STOP_WORDS = {}; > this.analyzer = new StandardAnalyzer(STOP_WORDS); > > This now indexes all my stop words, and gladly it didn't increase my > indexing time remarkably, but only a small difference. Not sure if this is > the right solution. Will also do some research on custom analyzers. > > > Hi, > > 1) Whenever we change to a different analyzer, we need to reindex > whole dataset, whether or not using WhiteSpaceAnalyzer. > 2) Using WhiteSpaceAnalyzer may increase disk space and slow-down > indexing because more tokens are indexed, how much can be slowed > I donot know. > 3) WhiteSpaceAnalyzer also keeps case, for example, if input text > has "Health", query "health" may not return the doc, make sure > if this is you need, also this analyzer will keep all symbols, > like coma, period .... For example, if text has "Number ONE issue > is health safety!", query "health safety" will not return the doc, > because "safety!" is indexed as a token, not "safety". > > I felt most important thing is to make sure the exact query requirement, > then picking up analyzer. > > Best regards, Lisheng > > -- > View this message in context: > http://www.nabble.com/Phrase-Query-Problem-tp14373945p14404143.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >