RE: DisjunctionMaxQuery and scoring

2012-04-19 Thread Uwe Schindler
Hi, ah, sorry, I misunderstood: you wanted to score the duplicate match lower! To achieve this, you have to change the coord function in the Similarity/BooleanWeight used for this query. Either way: if you want a group of terms that gets only one score if at least one of the terms matches (SQL IN),
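As a rough illustration of "one score for a group of alternatives": DisjunctionMaxQuery scores a document by its best matching sub-query plus tieBreakerMultiplier times the scores of the other matching sub-queries, so with a tie-breaker of 0 a second matching name adds nothing. The plain-Java sketch below models only that combination rule. It does not call Lucene, and the class name and sub-scores are invented for illustration.

```java
import java.util.Arrays;

/** Models DisjunctionMaxQuery-style combination of per-clause scores (arithmetic only, no Lucene). */
class DisMaxSketch {

    /** max(sub) + tieBreaker * (sum(sub) - max(sub)); only the matching clauses' scores are passed in. */
    static double disMax(double[] matchingScores, double tieBreaker) {
        if (matchingScores.length == 0) {
            return 0.0; // no alternative matched: the group contributes nothing
        }
        double max = Arrays.stream(matchingScores).max().getAsDouble();
        double sum = Arrays.stream(matchingScores).sum();
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical sub-scores for name:dick and name:rich in a document containing both names.
        double[] both = {0.4, 0.5};
        System.out.println(disMax(both, 0.0)); // only the best alternative counts (SQL IN-like)
        System.out.println(disMax(both, 0.1)); // the other match adds a small bonus
    }
}
```

With a tie-breaker of 0 the group behaves like a single "IN" clause; raising it toward 1.0 moves the result back toward a plain sum of the matching clauses.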

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 8:32 PM, David Murgatroyd wrote: > In contrast, I think the desire > is that one and only one of the terms in the document match those in the > BooleanQuery so that "Rich" would score higher than "Dick Rich", given > document length normalization. It's almost like a desire

RE: DisjunctionMaxQuery and scoring

2012-04-19 Thread Uwe Schindler
Hi, > I think > BooleanQuery bq = new BooleanQuery(false); doesn't quite accomplish the > desired "name IN (dick, rich)" scoring behavior. This is because (name:dick | > name:rich) with coord=false would score the 'document' "Dick Rich" higher > than "Rich" because the former has two term matches

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
FWIW, there seems to be an explain bug in 2.9.1 that is fixed in 3.6.0, so I'm no longer confused about the actual behavior. On Thu, Apr 19, 2012 at 8:32 PM, David Murgatroyd wrote: > [apologies for the earlier errant send] > > I think >  BooleanQuery bq = new BooleanQuery(false); > doesn't quit

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread David Murgatroyd
[apologies for the earlier errant send] I think BooleanQuery bq = new BooleanQuery(false); doesn't quite accomplish the desired "name IN (dick, rich)" scoring behavior. This is because (name:dick | name:rich) with coord=false would score the 'document' "Dick Rich" higher than "Rich" because the f
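A toy calculation of the point above, assuming DefaultSimilarity's length norm of 1/sqrt(number of terms in the field) and identical weights for "dick" and "rich"; tf, idf, queryNorm and the lossy one-byte norm encoding are ignored, and the class and method names are hypothetical. Summing the clause scores, as a BooleanQuery with coord disabled does, lets two matches outweigh the shorter field. Keeping only the best match, as a DisjunctionMaxQuery with a tie-breaker of 0 does, lets the length norm favour "Rich".

```java
/**
 * Toy model: summing matches (coord disabled) vs. taking the best match (DisMax, tieBreaker 0).
 * Assumes every matching term has the same weight and a length norm of 1/sqrt(numTerms);
 * tf, idf, queryNorm and norm byte-encoding are deliberately ignored.
 */
class LengthNormSketch {

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** BooleanQuery with coord disabled: matching clause scores are simply added. */
    static double sumScore(int matchingTerms, int fieldLength, double termWeight) {
        return lengthNorm(fieldLength) * matchingTerms * termWeight;
    }

    /** DisjunctionMaxQuery with tieBreaker 0: only the best matching clause counts. */
    static double maxScore(int matchingTerms, int fieldLength, double termWeight) {
        return matchingTerms > 0 ? lengthNorm(fieldLength) * termWeight : 0.0;
    }

    public static void main(String[] args) {
        double w = 1.0; // hypothetical per-term weight
        // "Dick Rich": 2 terms in the field, both match (name:dick | name:rich); "Rich": 1 term, 1 match.
        System.out.println("sum: Dick Rich=" + sumScore(2, 2, w) + ", Rich=" + sumScore(1, 1, w));
        System.out.println("max: Dick Rich=" + maxScore(2, 2, w) + ", Rich=" + maxScore(1, 1, w));
    }
}
```

With a weight of 1.0 the sum model gives roughly 1.414 for "Dick Rich" versus 1.0 for "Rich", while the max model gives roughly 0.707 versus 1.0.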

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 6:36 PM, Benson Margulies wrote: > I see why I'm so confused, but I think I need to construct a simpler test > case. > > My top-level BooleanQuery, which has disableCoord=false, has 22 > clauses. All but three are ordinary SHOULD TermQueries. The remainder > are a spanNear

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread David Murgatroyd
On Apr 19, 2012, at 6:36 PM, Benson Margulies wrote: > I see why I'm so confused, but I think I need to construct a simpler test > case. > > My top-level BooleanQuery, which has disableCoord=false, has 22 > clauses. All but three are ordinary SHOULD TermQueries. The remainder > are a spanNe

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
I see why I'm so confused, but I think I need to construct a simpler test case. My top-level BooleanQuery, which has disableCoord=false, has 22 clauses. All but three are ordinary SHOULD TermQueries. The remainder are a spanNear, a nested BooleanQuery, and an empty PhraseQuery (that's a bug).

Re: Two questions on RussianAnalyzer

2012-04-19 Thread Vladimir Gubarkov
Thank you, Steven. I'll look into this. On Fri, Apr 20, 2012 at 12:43 AM, Steven A Rowe wrote: > Hi Vladimir, > >> The most uncomfortable part of the new behaviour for me is that in the past I used >> to search by subdomain like bbb.com and have results displayed >> with www.bbb.com, aaa.bbb.com

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
On Thu, Apr 19, 2012 at 5:10 PM, Robert Muir wrote: > On Thu, Apr 19, 2012 at 5:05 PM, Benson Margulies > wrote: >> On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir wrote: >>> On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies >>> wrote: On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir wrote: >>

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 5:05 PM, Benson Margulies wrote: > On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir wrote: >> On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies >> wrote: >>> On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir wrote: On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies wr

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir wrote: > On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies > wrote: >> On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir wrote: >>> On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies >>> wrote: I am trying to solve a problem using DisjunctionMaxQuer

Re: Two questions on RussianAnalyzer

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 4:51 PM, Vladimir Gubarkov wrote: > So it's now impossible to find this document with query: "site.com". > I have an RSS subscription for that search, and now it's broken. > Just to point out, it's not impossible; as I suggested before, if you were happy with the old tok

Re: Two questions on RussianAnalyzer

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 4:51 PM, Vladimir Gubarkov wrote: > Hmmm... I know this and I reindexed! > I'll try to explain the problem (fortunately, already solved by using > LUCENE_30) once again: > When indexing with the new analyzer, the whole lexeme "some.cool.site.com" > goes to the index, not 4 lexemes "

Re: Two questions on RussianAnalyzer

2012-04-19 Thread Vladimir Gubarkov
Thank you, Robert, for the detailed reply. On Fri, Apr 20, 2012 at 12:37 AM, Robert Muir wrote: > On Thu, Apr 19, 2012 at 7:26 AM, Vladimir Gubarkov wrote: >> New analyzer: >> [aaa.bbb.com, , a, b, c, d'e, f, g, h, i, j, k, l_m, n, o, p, q, >> r, s, t, u, v, z, y, z] >> Old analyzer: >> [aaa, bbb,

RE: Two questions on RussianAnalyzer

2012-04-19 Thread Steven A Rowe
Hi Vladimir, > The most uncomfortable part of the new behaviour for me is that in the past I used > to search by subdomain like bbb.com and have results displayed > with www.bbb.com, aaa.bbb.com and so on. Now I have 0 > results. About domain names, see my response to a similar question today on

Re: Two questions on RussianAnalyzer

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 7:26 AM, Vladimir Gubarkov wrote: > New analyzer: > [aaa.bbb.com, , a, b, c, d'e, f, g, h, i, j, k, l_m, n, o, p, q, > r, s, t, u, v, z, y, z] > Old analyzer: > [aaa, bbb, com, , a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, > q, r, s, t, u, v, z, y, z] > > Please
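To make the difference between the two token lists concrete, here is a crude regex approximation; it is an assumption for illustration, not Lucene's actual tokenizer code, and the class and pattern names are hypothetical. The old analyzer behaves like "split on anything that is not a letter or digit", while the new one keeps letter runs joined by '.', an apostrophe or '_' together. That is enough to show why a query for bbb.com no longer matches a document containing aaa.bbb.com.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough regex stand-ins (NOT Lucene's real tokenizers) for the old and new behaviour in the thread. */
class TokenizerSketch {
    // Old style: every run of letters/digits is a token; '.', apostrophe and '_' all split.
    static final Pattern OLD_STYLE = Pattern.compile("[\\p{L}\\p{N}]+");
    // New style: letter/digit runs joined by '.', apostrophe or '_' stay one token.
    static final Pattern NEW_STYLE = Pattern.compile("[\\p{L}\\p{N}]+(?:[._'][\\p{L}\\p{N}]+)*");

    static List<String> tokenize(String text, Pattern tokenPattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = tokenPattern.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** True if every query token occurs among the document tokens (a crude AND match). */
    static boolean matches(String doc, String query, Pattern tokenPattern) {
        return tokenize(doc, tokenPattern).containsAll(tokenize(query, tokenPattern));
    }

    public static void main(String[] args) {
        String doc = "aaa.bbb.com d'e l_m";
        System.out.println(tokenize(doc, OLD_STYLE)); // [aaa, bbb, com, d, e, l, m]
        System.out.println(tokenize(doc, NEW_STYLE)); // [aaa.bbb.com, d'e, l_m]
        System.out.println(matches("aaa.bbb.com", "bbb.com", OLD_STYLE)); // true
        System.out.println(matches("aaa.bbb.com", "bbb.com", NEW_STYLE)); // false
    }
}
```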

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies wrote: > On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir wrote: >> On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies >> wrote: >>> I am trying to solve a problem using DisjunctionMaxQuery. >>> >>> >>> Consider a query like: >>> >>> a:b OR c:d OR e:

Re: Two questions on RussianAnalyzer

2012-04-19 Thread Vladimir Gubarkov
On Thu, Apr 19, 2012 at 7:57 PM, Uwe Schindler wrote: >> My questions are: 1) is this change by design (not a mistake), and >> 2) is the only way to get the old behaviour to use >> Version.LUCENE_30 when creating the analyzer? > > This is why this option is there! Right, and it's great, but thi

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
Turning on disableCoord for a nested boolean query does not seem to change the overall maxCoord term as displayed in explain. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-
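One way to read this observation, sketched under the assumption of Lucene 3.x DefaultSimilarity (coord = overlap / maxOverlap, computed separately by each BooleanQuery over its own non-prohibited clauses): a nested BooleanQuery is a single clause of its parent, so its disableCoord flag changes the nested query's own sub-score but not the parent's maxOverlap. Apart from the 22 top-level clauses mentioned in the thread, the clause counts and scores below are hypothetical, as are the class and method names.

```java
/**
 * Toy model of coord in a two-level BooleanQuery (coord = overlap / maxOverlap per query).
 * Assumption: each BooleanQuery counts only its own clauses, so a nested query is one parent clause.
 */
class NestedCoordSketch {

    static double coord(int overlap, int maxOverlap, boolean disableCoord) {
        return disableCoord ? 1.0 : (double) overlap / maxOverlap;
    }

    /** Nested BooleanQuery score: sum of its matching clause scores times its own coord. */
    static double nestedScore(double[] matchingScores, int clauseCount, boolean disableCoord) {
        double sum = 0.0;
        for (double s : matchingScores) {
            sum += s;
        }
        return sum * coord(matchingScores.length, clauseCount, disableCoord);
    }

    public static void main(String[] args) {
        int topClauses = 22;            // top-level clause count from the thread; the nested query is one of them
        int topMatches = 5;             // hypothetical: 4 TermQueries plus the nested query match
        double[] nestedMatches = {0.8}; // hypothetical: 1 nested clause matches, with score 0.8
        int nestedClauses = 4;          // hypothetical size of the nested query

        double topCoord = coord(topMatches, topClauses, false); // identical in both runs
        for (boolean nestedDisable : new boolean[] {false, true}) {
            double nested = nestedScore(nestedMatches, nestedClauses, nestedDisable);
            System.out.println("nested disableCoord=" + nestedDisable
                    + ": nested score=" + nested + ", top-level coord=" + topCoord);
        }
    }
}
```

In this model only the nested score moves (0.2 vs 0.8) when the nested flag is toggled; the top-level coord stays at 5/22.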

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir wrote: > On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies > wrote: >> I am trying to solve a problem using DisjunctionMaxQuery. >> >> >> Consider a query like: >> >> a:b OR c:d OR e:f OR ... >> name:richard OR name:dick OR name:dickie OR name:rich ..

Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies wrote: > I am trying to solve a problem using DisjunctionMaxQuery. > > > Consider a query like: > > a:b OR c:d OR e:f OR ... > name:richard OR name:dick OR name:dickie OR name:rich ... > > At most, one of the richard names matches. So the match sco

DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
I am trying to solve a problem using DisjunctionMaxQuery. Consider a query like: a:b OR c:d OR e:f OR ... name:richard OR name:dick OR name:dickie OR name:rich ... At most, one of the richard names matches. So the match score gets dragged down by the long list of things that don't match, as the
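A toy coord calculation for this query shape, assuming DefaultSimilarity's coord factor (overlap / maxOverlap) and hypothetical clause counts; tf, idf and norms are ignored, and the class name is invented. Every unmatched name variant enlarges maxOverlap, whereas wrapping the variants in a single DisjunctionMaxQuery clause makes the whole group count once.

```java
/**
 * Toy coord arithmetic for a query of the form
 *   a:b OR c:d OR e:f OR name:richard OR name:dick OR name:dickie OR name:rich
 * Clause counts are hypothetical; coord = overlap / maxOverlap as in DefaultSimilarity.
 */
class CoordSketch {

    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        int otherClauses = 3;   // a:b, c:d, e:f
        int nameVariants = 4;   // richard, dick, dickie, rich
        int otherMatches = 3;   // the document matches all non-name clauses...
        int nameMatches = 1;    // ...and at most one name variant

        // Flat BooleanQuery: each name variant is its own SHOULD clause.
        double flat = coord(otherMatches + nameMatches, otherClauses + nameVariants);
        // Variants wrapped in one DisjunctionMaxQuery clause: the group counts once.
        double grouped = coord(otherMatches + (nameMatches > 0 ? 1 : 0), otherClauses + 1);

        System.out.println("flat coord    = " + flat);    // 4/7
        System.out.println("grouped coord = " + grouped); // 4/4
    }
}
```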

RE: Two questions on RussianAnalyzer

2012-04-19 Thread Uwe Schindler
> My questions are: 1) is this change by design (not a mistake), and > 2) is the only way to get the old behaviour to use > Version.LUCENE_30 when creating the analyzer? This is why this option is there!