> I suppose you could precompute the proximity associations by indexing
> n-grams (in this case, Lucene calls them shingles), such that there
> is a single token in your index containing cheese_sandwich (effectively)
>

Doh, I see Grant already led you in this direction (sorry for the duplicate mail). On average, it's worked for me for things like this.
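A rough plain-Java illustration of the shingle idea: each word is kept, and each adjacent pair is also emitted as one joined token, so "cheese sandwich" becomes the single term cheese_sandwich. This is only a sketch. In Lucene the real work is done by ShingleFilter in the analysis chain, whose default separator is a space. The `shingles` helper and the underscore separator here are hypothetical and just mirror the example above.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    // Hypothetical helper (not Lucene API): emits each word plus every adjacent
    // word pair joined with "_", roughly like indexing unigrams and bigram shingles together.
    static List<String> shingles(String text) {
        String[] words = text.toLowerCase().split("\\s+");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            tokens.add(words[i]);                              // original unigram
            if (i + 1 < words.length) {
                tokens.add(words[i] + "_" + words[i + 1]);     // bigram shingle
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = shingles("grilled cheese sandwich");
        System.out.println(tokens);
        // The phrase is now a single token, so matching it needs one term lookup
        // instead of a positional (phrase) check.
        System.out.println(tokens.contains("cheese_sandwich"));
    }
}
```

For "grilled cheese sandwich" this yields five tokens (three words plus two shingles). That extra token count matters for scoring.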
That said, I'll try to contribute something actually useful: if you use things like shingles, it's worth considering a change to DefaultSimilarity (look at the setDiscountOverlaps parameter). Otherwise, I've measured cases where injecting additional tokens does more harm than good, because it has an adverse effect on lengthNorm.

--
Robert Muir
rcm...@gmail.com
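The following back-of-the-envelope calculation shows why the extra tokens can hurt. It assumes the classic DefaultSimilarity length norm of 1/sqrt(numTerms). It also assumes the shingles are indexed at the same positions as the unigrams (position increment 0), which is what lets setDiscountOverlaps leave them out of the count. The `norm` helper and its boolean flag are hypothetical stand-ins, not Lucene API, and the lossy byte encoding of norms is ignored.

```java
public class LengthNormSketch {
    // The 1 / sqrt(numTerms) length norm (assumes numTerms > 0);
    // field boosts and norm byte-encoding are ignored in this sketch.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical stand-in for the discountOverlaps switch: when true, tokens stacked
    // at position increment 0 (e.g. shingles emitted alongside unigrams) are not counted.
    static double norm(int totalTokens, int overlapTokens, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? totalTokens - overlapTokens : totalTokens;
        return lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // "grilled cheese sandwich": 3 unigrams + 2 bigram shingles = 5 tokens, 2 of them overlaps
        System.out.printf("no shingles:          %.3f%n", norm(3, 0, false));
        System.out.printf("shingles, counted:    %.3f%n", norm(5, 2, false));
        System.out.printf("shingles, discounted: %.3f%n", norm(5, 2, true));
    }
}
```

Under these assumptions, counting the shingles lowers this three-word field's norm from about 0.577 to about 0.447, so shingled fields are scored as if they were longer than they really are. Discounting the overlaps restores the original value.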