On Wednesday 31 October 2007 14:51:12 Tobias Hill wrote: > My documents all hava a field with variables number of terms > (but rather few): > Doc1.field = "foo bar gro" > Doc2.field = "foo bar gro mot slu" > Now I would like to search using the terms "foo bar gro" > > Problem 1: > I like to express that at least any two of the three terms > must match. Do I have to construct this clause myself - i.e. > "(foo & bar) | (foo & gro) | (bar & gro)", or is there some > clever way to do this?
BooleanQuery.setMinimumNumberShouldMatch(int) does this, have a look at the javadocs for the details. > > Problem 2: > I like to express that if the doc.field has too many terms > that wasn't matched it should not be included at all in the > result. In the example above Doc2 might have too many > terms that was not matched to be included in the result. > Is this kind of query possible, and how? > > The general case: > I want to find those docs that has X% of the search terms > matched and that the acctual match covers at least Y% of > the available terms on the document. This Y% is not directly possible, but I would expect the default document score to correlate reasonably well with coverage. In case you want an exact Y% cutoff, you'll run into the fact that the field norm (the inverse square root of the field length) is encoded in only 8 bits, which is rather course. Regards, Paul Elschot --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]