Hi gaz77, Here's a good place to start:
<http://wiki.apache.org/jakarta-lucene/AnalysisParalysis> Steve On 08/28/2008 at 10:52 AM, gaz77 wrote: > > Hi, > > I'd appreciate if someone could explain the results I'm getting. > > I've written a simple custom analyzer that applies the > NGramTokenFilter to the token stream during indexing. It's never > applied during searching. The purpose of this is to match sub-words. > > Without the ngram filter, if I searched on the word 'postcode' it > returns 2 documents. If I searched on 'code' it returns 6 documents > (with no overlap on the postcode results). > > If I apply the ngram filter with min 1 and max 10, searching 'postcode' > returns the same 2 docs, while searching 'code' returns 9 docs. This > sort-of feels right. > > The problem comes when I set the min ngram size to 3, and the > max to 5. Searching 'postcode' returns no results (as expected), but > searching 'code' only returns 2 docs (the 2 normally returned by a > 'postcode' search). > > This last result for 'code' just doesn't seem correct - it should be > returning at least the 6 docs from the original search. > > I'd really appreciate some advice on what is going on with > the ngram filter. > > Thanks > -- View this message in context: > http://www.nabble.com/Confused-with-NGRAM-results-tp19202310p19202310.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]