This actually sounds bugish to me, but you removed the text from your original email, so I don't know what context this was in.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- > From: gaz77 <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Friday, August 29, 2008 12:50:46 AM > Subject: Re: Confused with NGRAM results > > > Thanks for the pointer. > > I've gone into this in some depth, using the AnalyzerUtils class from the > lucene in action book. > > It seems that the NGramTokenFilter is only processing part of the string > that goes in. It stops tokenising the words part way through. That's why the > documents weren't found in results. > > I've had a look at the source code, and I think it's because the next() > function returns null when it hits a token smaller than the min ngram size. > For example, if I set the minimum to 3, then a 2-character token will cause > it to return null. > > I'm not sure if this is by design or a bug. either way, at least I know > what's causing it now. > > Cheers > > > > -- > View this message in context: > http://www.nabble.com/Confused-with-NGRAM-results-tp19202310p19210665.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]