Thanks for the pointer. I've gone into this in some depth, using the AnalyzerUtils class from the Lucene in Action book.
It seems that the NGramTokenFilter only processes part of the string that goes in: it stops tokenising the words partway through, which is why those documents weren't showing up in the results. I've had a look at the source code, and I think it's because the next() function returns null as soon as it hits a token smaller than the minimum n-gram size. For example, if I set the minimum to 3, then a 2-character token will cause it to return null, and nothing after that token gets emitted. I'm not sure if this is by design or a bug; either way, at least I know what's causing it now.

Cheers
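P.S. In case anyone wants to reproduce it, this is roughly what I'm running (the class name and sample text are just made up for illustration; it's against the old no-arg TokenStream.next() API, with NGramTokenFilter from contrib/analyzers):

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;

public class NGramStopDemo {
  public static void main(String[] args) throws Exception {
    // "of" is only 2 characters, below minGram = 3
    TokenStream ts = new NGramTokenFilter(
        new WhitespaceTokenizer(new StringReader("lucene of indexing")),
        3, 3); // minGram = 3, maxGram = 3
    Token t;
    while ((t = ts.next()) != null) {
      System.out.println(t.termText());
    }
    // On the version I'm testing, this prints the 3-grams of "lucene"
    // (luc, uce, cen, ene) and then stops: next() returns null at "of",
    // so "indexing" is never tokenised at all.
  }
}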