I opted to use the following query to solve this problem, since it meets my requirements for the time being:

+(cheese sandwich) "cheese sandwich"~slop

This includes documents with one or more of the terms, but prefers those with an edit distance <= the slop.

-----Original Message-----
From: Joel Halbert <j...@su3analytics.com>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: scoring adjacent terms without proximity search
Date: Sat, 31 Oct 2009 08:38:29 +0000

Thank you all for your suggestions. I shall have a little think about the best way forward, and report back if I do anything interesting that works well.

In answer to Grant's question (why not use PhraseQuery?): we do not want to have an artificial upper limit on the slop, i.e. we do want to include documents that might only have a subset of words from the phrase (e.g. just cheese, or just sandwich, but not both).

-----Original Message-----
From: Robert Muir <rcm...@gmail.com>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: scoring adjacent terms without proximity search
Date: Fri, 30 Oct 2009 16:04:03 -0400

> I suppose you could precompute the proximity associations by indexing
> n-grams (in this case, Lucene calls them shingles), such that there
> is a single token in your index containing cheese_sandwich (effectively)
>

doh, I see Grant already led you in this direction. (sorry for the duplicate mail) on average it's worked for me for some things like this.

although, I'll try to contribute something actually useful, and mention that if you use things like shingles, it's good to consider modifying DefaultSimilarity, look at the setDiscountOverlaps param. otherwise, i've measured cases where injecting additional tokens will cause more harm than good, because it has an adverse effect on lengthnorm.
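
For anyone wanting to build the query at the top of this message from user input, here is a minimal sketch in plain Java. It only assembles the query string; the class and method names (SloppyPhraseQuery, build) are made up for illustration and are not part of Lucene. It assumes the string is later handed to the classic QueryParser with the default OR operator, and that the terms are plain words that need no escaping.

```java
import java.util.Arrays;
import java.util.List;

public class SloppyPhraseQuery {

    // Hypothetical helper (not a Lucene API): builds a query string of the form
    //   +(t1 t2 ...) "t1 t2 ..."~slop
    // The required group matches documents containing any of the terms
    // (assuming OR is the default operator); the optional sloppy phrase
    // raises the score of documents where the terms occur close together.
    // Assumption: terms contain no query-syntax characters, so no escaping is done.
    static String build(List<String> terms, int slop) {
        String joined = String.join(" ", terms);
        return "+(" + joined + ") \"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        // prints: +(cheese sandwich) "cheese sandwich"~3
        System.out.println(build(Arrays.asList("cheese", "sandwich"), 3));
    }
}
```

The slop value of 3 is arbitrary; a larger value lets the phrase clause reward looser matches, while the required group still admits documents with only one of the words.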

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org