> One simple way of doing this is maybe to write a wrapper for TermQuery
> that only returns docs with a Term Frequency > X as far as I
> understand the question those terms don't have to be within a certain
> window right?
Correct. Terms can be anywhere in the document. I figured term frequencies might be involved, but wasn't sure how to actually do this.

> Hmmm... i would think the phrase query approach should work, but it's
> totally possible that there's something odd in the way phrase queries
> work that could cause a problem -- the best way to sanity test something
> like this is to try a really small self contained example that you can post
> for other people to try.

I've been able to reduce it pretty far, but I don't have a totally self-contained example yet. I haven't tried it out yet on a stock build of Solr (I'm using 3.2 with various patches).

Right now I'm inserting a few documents with a text field that contains "dog dog dog", then repeatedly running q="dog dog dog dog"~1 with the queryResultCache disabled. The query is not giving me the same results each time (!!!). Sometimes all the documents are returned, sometimes a subset is returned, and sometimes no documents are returned.

So far I've traced it down to the "repeats" array in SloppyPhraseScorer.initPhrasePositions() - depending on the order of the elements in this array, the document may or may not match. I think the HashSet.toArray() call is to blame here, but I don't yet fully understand the expected behavior of the initPhrasePositions function...

-Michael
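
For illustration only, here is a minimal plain-Java sketch of the ordering concern with HashSet.toArray(). It has no Lucene or Solr dependency and is not the SloppyPhraseScorer code: the class RepeatsOrderDemo, the holder Pos, and the helper methods are all hypothetical names made up for this example. It assumes the elements in the set do not override hashCode(), in which case a HashSet places them by identity hash and the array order is unspecified, while a LinkedHashSet returns them in insertion order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class RepeatsOrderDemo {
    // Hypothetical stand-in for a per-term position holder. It deliberately
    // does not override hashCode(), so a HashSet buckets it by identity hash.
    static final class Pos {
        final int offset;
        Pos(int offset) { this.offset = offset; }
    }

    // Adds the items to the given set, then returns their offsets in the
    // order produced by the set's toArray().
    static int[] offsetsInSetOrder(Set<Pos> set, Pos[] items) {
        set.addAll(Arrays.asList(items));
        Pos[] arr = set.toArray(new Pos[0]);
        int[] out = new int[arr.length];
        for (int i = 0; i < arr.length; i++) {
            out[i] = arr[i].offset;
        }
        return out;
    }

    // Four repeated terms at query offsets 0..3, as in "dog dog dog dog".
    static Pos[] fourPositions() {
        return new Pos[] { new Pos(0), new Pos(1), new Pos(2), new Pos(3) };
    }

    public static void main(String[] args) {
        // Unspecified order: may differ between runs of the JVM.
        System.out.println("HashSet:       "
                + Arrays.toString(offsetsInSetOrder(new HashSet<>(), fourPositions())));
        // Insertion order is guaranteed by LinkedHashSet.
        System.out.println("LinkedHashSet: "
                + Arrays.toString(offsetsInSetOrder(new LinkedHashSet<>(), fourPositions())));
    }
}
```

Running main several times in fresh JVMs may print the HashSet offsets in different orders, while the LinkedHashSet line stays the same. Whether this is what actually happens to the "repeats" array is still only a suspicion.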
