> One simple way of doing this is maybe to write a wrapper for TermQuery
> that only returns docs with a Term Frequency > X. As far as I
> understand the question, those terms don't have to be within a certain
> window, right?

Correct. Terms can be anywhere in the document. I figured term frequencies 
might be involved, but wasn't sure how to actually do this.
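
To make the term-frequency idea concrete, here is a minimal sketch of the filtering rule itself in plain Java. It does not use any Lucene or Solr API: the class and method names are hypothetical, and it re-tokenizes raw strings on whitespace, whereas a real TermQuery wrapper would presumably read the per-document frequency from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not a Lucene class.
public class TermFrequencyFilter {

    // Counts how many whitespace-separated tokens in text equal term exactly.
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String token : text.trim().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Returns the indices of the documents in which term occurs more than
    // minExclusive times, anywhere in the document (no position window).
    static List<Integer> docsWithFrequencyAbove(List<String> docs, String term, int minExclusive) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < docs.size(); i++) {
            if (termFrequency(docs.get(i), term) > minExclusive) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("dog dog dog", "dog cat", "cat", "dog cat dog dog dog");
        // Keep only documents where "dog" occurs more than twice.
        System.out.println(docsWithFrequencyAbove(docs, "dog", 2));
    }
}
```

Because the rule only looks at a count per document, it is independent of where the terms sit, which is what distinguishes it from the sloppy phrase query approach.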

> Hmmm... i would think the phrase query approach should work, but it's
> totally possible that there's something odd in the way phrase queries
> work that could cause a problem -- the best way to sanity test something
> like this is to try a really small self contained example that you can post
> for other people to try.

I've been able to reduce it pretty far, but I don't have a totally 
self-contained example yet, and I haven't tried it on a stock build of Solr 
(I'm using 3.2 with various patches). Right now I'm inserting a few documents 
with a text field that contains "dog dog dog", then repeatedly running 
q="dog dog dog dog"~1 with the queryResultCache disabled. The query is not 
giving me the same results each time (!!!). Sometimes all the documents are 
returned, sometimes a subset is returned, and sometimes no documents are 
returned.

So far I've traced it down to the "repeats" array in 
SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
elements in this array, the document may or may not match. I think the 
HashSet.toArray() call is to blame here, but I don't yet fully understand the 
expected behavior of the initPhrasePositions function...
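
As a rough illustration of why HashSet.toArray() could produce a varying order, here is a small standalone Java sketch. It is not the Lucene code: Pos is a hypothetical stand-in for whatever objects end up in the "repeats" set, and it assumes those objects do not override hashCode(), which I have not verified. Under that assumption HashSet buckets elements by identity hash, so the array order is unspecified and can differ from run to run, while LinkedHashSet always yields insertion order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo class; not part of Lucene or Solr.
public class RepeatsOrderDemo {

    // Stand-in element type. It deliberately does not override hashCode()
    // or equals(), so a HashSet places it according to its identity hash.
    static final class Pos {
        final int offset;
        Pos(int offset) { this.offset = offset; }
    }

    // Adds four elements with offsets 0..3, in that order, and returns the set.
    static Set<Pos> fill(Set<Pos> set) {
        for (int i = 0; i < 4; i++) {
            set.add(new Pos(i));
        }
        return set;
    }

    // Copies the set into an array and returns the offsets in array order.
    static int[] offsetsInArrayOrder(Set<Pos> set) {
        Pos[] arr = set.toArray(new Pos[0]);
        int[] out = new int[arr.length];
        for (int i = 0; i < arr.length; i++) {
            out[i] = arr[i].offset;
        }
        return out;
    }

    public static void main(String[] args) {
        // Order is unspecified here and may change between JVM runs.
        System.out.println("HashSet:       "
                + Arrays.toString(offsetsInArrayOrder(fill(new HashSet<Pos>()))));
        // Order here follows insertion order.
        System.out.println("LinkedHashSet: "
                + Arrays.toString(offsetsInArrayOrder(fill(new LinkedHashSet<Pos>()))));
    }
}
```

If the matching logic turns out to depend on the order of that array, an ordered collection or an explicit sort before use would at least make the behavior repeatable, though that alone would not tell us which order is the correct one.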

-Michael
