Re: Scoring over Multiple Indexes

2015-10-22 Thread McKinley, James T
Hi Scott, I don't know your reasons for splitting your index up, but assuming you want to do that and then merge the search results back together I think you could re-unify the term document frequencies across all your indexes and then extend IndexSearcher and override termStatistics and collec
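
The effect of re-unifying document frequencies can be illustrated without any Lucene classes. The following is only a sketch under stated assumptions: `GlobalIdfSketch`, `ShardStats`, and the sample numbers are hypothetical, and the idf formula is assumed to be the classic TF-IDF form 1 + ln(numDocs / (docFreq + 1)). It shows that the same term gets very different weights when each index uses its own statistics, and a single shared weight once the statistics are summed.

```java
import java.util.Arrays;
import java.util.List;

class GlobalIdfSketch {
    /** Hypothetical per-index statistics for one term (not a Lucene class). */
    static final class ShardStats {
        final long docCount; // documents in this index
        final long docFreq;  // documents in this index containing the term

        ShardStats(long docCount, long docFreq) {
            this.docCount = docCount;
            this.docFreq = docFreq;
        }
    }

    /** Classic TF-IDF style idf; assumed here for illustration. */
    static double idf(long docCount, long docFreq) {
        return 1.0 + Math.log(docCount / (double) (docFreq + 1));
    }

    /** Sums the statistics over all indexes before computing idf. */
    static double globalIdf(List<ShardStats> shards) {
        long docCount = 0;
        long docFreq = 0;
        for (ShardStats s : shards) {
            docCount += s.docCount;
            docFreq += s.docFreq;
        }
        return idf(docCount, docFreq);
    }

    public static void main(String[] args) {
        ShardStats a = new ShardStats(1000, 10); // term is rare in index A
        ShardStats b = new ShardStats(10, 5);    // term is common in index B
        System.out.println("local idf A = " + idf(a.docCount, a.docFreq));
        System.out.println("local idf B = " + idf(b.docCount, b.docFreq));
        System.out.println("global idf  = " + globalIdf(Arrays.asList(a, b)));
    }
}
```

With per-index statistics, hits from index A and index B are weighted on different scales, which is why merged result lists can come back in an unexpected order.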

Re: Scoring over Multiple Indexes

2015-10-22 Thread Erick Erickson
bq: Given that the content loaded for these indexes represents individually curated terminologies, I think we can argue to our users that what comes from combined queries over the latter is as meaningful in its own right as those run over the monolithic index If one assumes that the individually

Re: Scoring over Multiple Indexes

2015-10-22 Thread Bauer, Herbert S. (Scott)
Thanks for your reply. We've recently moved from a single large index to multiple indexes. Given that the content loaded for these indexes represents individually curated terminologies, I think we can argue to our users that what comes from combined queries over the latter is as meaningful in its

Re: ConjunctionScorer access

2015-10-22 Thread Alan Woodward
If you're using 5.3, you can wrap everything with a PayloadScoreQuery. Before that you'll need to use PayloadTermQuery or PayloadNearQuery, but I'd advise upgrading as you'll get better performance and slightly more sane APIs. Alan Woodward www.flax.co.uk On 22 Oct 2015, at 16:53, Sheng wrote

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
Alan, Thanks - that indeed sounds promising. I can use a `SpanPayloadCheckQuery` to wrap around a `SpanTermQuery`. Now I still want the payload scoring to work, so I should use a Payload Query of some sort. Is there a way that I can wrap a `SpanPayloadCheckQuery` into a payload query? I think I sh

Re: Scoring over Multiple Indexes

2015-10-22 Thread Erick Erickson
In a word, no. At least not that I've heard of. "normalizing scores" is one of those things that sounds reasonable on the surface, but is really meaningless. Scores don't really _tell_ you anything about the abstract "goodness" of a doc, they just tell you that doc1 is likely better than doc2 _with

Scoring over Multiple Indexes

2015-10-22 Thread Bauer, Herbert S. (Scott)
We have a test case that boosts a set of terms. Something along the lines of “term1^2 AND term2^3 AND term3^4”, and this query runs over two content-distinct indexes. Our expectation is that the terms would be returned to us as term3, term2 and term1. Instead we get something along the lines

Re: ConjunctionScorer access

2015-10-22 Thread Alan Woodward
Maybe instead of hacking BooleanWeight, you should use a version of SpanPayloadCheckQuery? There isn't anything that combines checking and scoring for payloads at the moment, but I don't think it would be too difficult to write one. Alan Woodward www.flax.co.uk On 22 Oct 2015, at 16:21, Shen

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
Uwe, Problem is how can we get an instance of `ConjunctionScorer` in the first place. Yes I can use reflection to get an instance of it. However one of the constructor parameters for this class is an array of subscorers which comes from the `BooleanWeight`. Like I said, if I hack that as well, the

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
Péteri, The problem is that whether A or B should be in "excluded_field" is a posteriori rather than a priori knowledge. I want the search A `and` B not to return the document as long as one of the terms has score 0, but before the search happens, I don't know if any of them should be "excluded" at all. This i

RE: ConjunctionScorer access

2015-10-22 Thread Uwe Schindler
Hi, How about using delegate.getChildren() [delegate is the ConjunctionScorer you wrap] in your FilterScorer? By that you get a List of all ChildScorer instances with "MUST" as type and a reference to the Scorer itself. You can use those in the FilterScorer's score() method to get subscores fo
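
The delegator idea described here can be modelled in plain Java. This is a sketch of the pattern only, not Lucene code: `SimpleScorer`, `leaf`, `conjunction`, and `zeroVeto` are hypothetical stand-ins, and the summing conjunction is an assumption made for the example. The wrapper inspects the children of the scorer it wraps and returns 0 as soon as any child scored 0.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ZeroVetoScorerSketch {
    /** Hypothetical stand-in for a scorer; this is not Lucene's Scorer class. */
    interface SimpleScorer {
        float score();
        List<SimpleScorer> children();
    }

    /** Leaf with a fixed score, standing in for a payload-scored term. */
    static SimpleScorer leaf(final float value) {
        return new SimpleScorer() {
            public float score() { return value; }
            public List<SimpleScorer> children() {
                return Collections.<SimpleScorer>emptyList();
            }
        };
    }

    /** Conjunction stand-in that sums the scores of its sub-scorers. */
    static SimpleScorer conjunction(final SimpleScorer... subs) {
        final List<SimpleScorer> children = Arrays.asList(subs);
        return new SimpleScorer() {
            public float score() {
                float sum = 0f;
                for (SimpleScorer s : children) {
                    sum += s.score();
                }
                return sum;
            }
            public List<SimpleScorer> children() { return children; }
        };
    }

    /** Delegator: 0 if any child of the delegate scored 0, else the delegate's score. */
    static SimpleScorer zeroVeto(final SimpleScorer delegate) {
        return new SimpleScorer() {
            public float score() {
                for (SimpleScorer child : delegate.children()) {
                    if (child.score() == 0f) {
                        return 0f;
                    }
                }
                return delegate.score();
            }
            public List<SimpleScorer> children() { return delegate.children(); }
        };
    }

    public static void main(String[] args) {
        System.out.println(zeroVeto(conjunction(leaf(2f), leaf(3f))).score());
        System.out.println(zeroVeto(conjunction(leaf(2f), leaf(0f))).score());
    }
}
```

The sketch only changes the score. Dropping zero-scored hits from the results would be a separate step that is not shown.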

Re: ConjunctionScorer access

2015-10-22 Thread András Péteri
Going by the example, it looks like you could do something like this: 1) Use the existing field for adding terms with payloads as before ("payload_field"); 2) Introduce another field ("excluded_field"), adding only those terms where you expect a score of zero to be returned (based on the payload);

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
That's the problem, right - none of them are public, and neither is the constructor of `ConjunctionScorer`. Moreover, `ConjunctionScorer` needs access to the list of sub-scorers to emit the doc and score. Information like this has to come from the `BooleanWeight`, which is another hack if I want to

Re: ConjunctionScorer access

2015-10-22 Thread Alan Woodward
You should be able to use a FilterScorer that wraps a ConjunctionScorer and overrides score(). Alan Woodward www.flax.co.uk On 22 Oct 2015, at 13:43, Sheng wrote: > Thanks for the reply and suggestion. If I search for term A and term B with > a BooleanQuery in Lucene, normally Lucene returns d

RE: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Uwe Schindler
> What is the meaning of "the Unicode Policeman" ? Robert Muir :-) Uwe > Thanks, > Ahmet > > On Thursday, October 22, 2015 2:59 PM, Uwe Schindler > wrote: > > > > Hi, > > > > >> Setting aside the fact that Character.toLowerCase is already > > >> dubious in some locales (e.g. Turkish), > >

Re: ConjunctionScorer access

2015-10-22 Thread Sheng
Thanks for the reply and suggestion. If I search for term A and term B with a BooleanQuery in Lucene, normally Lucene returns documents that have a match of both A and B. Now I am using payload to vary the scores w.r.t search of term A and search of term B, so it is possible for example a document

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Ahmet Arslan
Hi Uwe, What is the meaning of "the Unicode Policeman" ? Thanks, Ahmet On Thursday, October 22, 2015 2:59 PM, Uwe Schindler wrote: Hi, > >> Setting aside the fact that Character.toLowerCase is already dubious > >> in some locales (e.g. Turkish), > > > > This is not true. Character.toLower

Looking up multiple words in multiple fields, each word matching in at least one field

2015-10-22 Thread Clemens Wyss DEV
Say our index has (documents with) three fields "f1", "f2" and "f3" and I want to find all documents matching "foo" and "bar" in any combination of the three fields. 1)The more words that match, the higher its ranking. So it is not really a strict AND-query... 2)The more words that match in a si

RE: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Uwe Schindler
Hi, > >> Setting aside the fact that Character.toLowerCase is already dubious > >> in some locales (e.g. Turkish), > > > > This is not true. Character.toLowerCase() works locale-independent. > > It is only String.toLowerCase that works using default locale. So you mean the opposite. You wanted t

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Dawid Weiss
> LowerCaseFilter will not handle that. So whereas it is "safe" for > English hard-coded strings, it isn't safe for all fields you might > index in general. This filter is a "safe" fallback that works identically regardless of the locale you have on your computer (or on the server). This, I believ

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Trejkaz
On Thu, Oct 22, 2015 at 7:05 PM, Uwe Schindler wrote: > Hi, > >> Setting aside the fact that Character.toLowerCase is already dubious in some >> locales (e.g. Turkish), > > This is not true. Character.toLowerCase() works locale-independent. > It is only String.toLowerCase that works using default

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Dawid Weiss
Well, practice says there are no such cases... for (int cp = Character.MIN_CODE_POINT; cp < Character.MAX_CODE_POINT; cp++) { int c1 = Character.charCount(cp); int c2 = Character.charCount(Character.toUpperCase(cp)); int c3 = Character.charCount(Characte
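
Because the snippet above is cut off, here is a self-contained variant of the same kind of scan. It is a sketch, not the original code: the class name `CaseLengthScan`, the comparison, and the reporting are assumptions. It counts code points whose upper- or lowercase form needs a different number of `char`s than the code point itself.

```java
class CaseLengthScan {
    /** True if case conversion changes the number of UTF-16 chars needed for cp. */
    static boolean changesLength(int cp) {
        int c1 = Character.charCount(cp);
        int c2 = Character.charCount(Character.toUpperCase(cp));
        int c3 = Character.charCount(Character.toLowerCase(cp));
        return c1 != c2 || c1 != c3;
    }

    /** Scans every code point and counts those for which changesLength is true. */
    static int countLengthChanging() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (changesLength(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The result depends on the Unicode tables shipped with the running JDK.
        System.out.println("code points whose char count changes: " + countLengthChanging());
    }
}
```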

Re: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Dawid Weiss
I think the issue here is what happens if an "uppercase" codepoint requires a surrogate pair and the lowercase counterpart does not -- then the index variable would indeed be screwed. Dawid On Thu, Oct 22, 2015 at 10:05 AM, Uwe Schindler wrote: > Hi, > > > Setting aside the fact that Character.

RE: ConjunctionScorer access

2015-10-22 Thread Uwe Schindler
Hi, Those are internal classes and not to be extended (not only the constructor is pkg-private, the whole class is: https://goo.gl/5WyLYz)! Scorers follow the delegator pattern. If you want to modify the behaviour of a Scorer, create a delegator scorer (e.g. some Filtering Scorer) and change it

RE: Dubious stuff spotted in LowerCaseFilter

2015-10-22 Thread Uwe Schindler
Hi, > Setting aside the fact that Character.toLowerCase is already dubious in some > locales (e.g. Turkish), This is not true. Character.toLowerCase() works locale-independent. It is only String.toLowerCase that works using default locale. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213
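
The distinction drawn here can be checked with a few lines of Java. This is a small illustration with a hypothetical class name, `TurkishLowerCaseDemo`: lowercasing per code point with `Character.toLowerCase` ignores the locale, while `String.toLowerCase(Locale)` applies Turkish rules when given a Turkish locale and maps `I` to the dotless `ı` (U+0131).

```java
import java.util.Locale;

class TurkishLowerCaseDemo {
    /** Locale-sensitive lowercasing via String.toLowerCase(Locale). */
    static String lowerWithLocale(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    /** Locale-independent lowercasing, one code point at a time. */
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lowerPerCodePoint("TITLE"));
        System.out.println(lowerWithLocale("TITLE", Locale.ROOT));
        System.out.println(lowerWithLocale("TITLE", turkish));
    }
}
```

The no-argument `String.toLowerCase()` uses the JVM's default locale, so passing an explicit locale as above keeps the result independent of the machine's settings.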