Hi Scott,
I don't know your reasons for splitting your index up, but assuming you want to
do that and then merge the search results back together, I think you could
re-unify the term document frequencies across all your indexes and then extend
IndexSearcher and override termStatistics and collec
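Here is a minimal sketch of what "re-unifying" the statistics amounts to. It is plain Java with no Lucene classes. It assumes each index can report its document count and its per-term document frequencies. It sums these into one global view and computes IDF with the classic Lucene TF-IDF formula `1 + ln(numDocs / (docFreq + 1))`. The names `IndexStats` and `GlobalStats` are hypothetical. In a real setup, the merged numbers would be what the overridden `IndexSearcher` statistics methods hand back.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one index's statistics; not a Lucene class.
class IndexStats {
  final long numDocs;
  final Map<String, Long> docFreq;

  IndexStats(long numDocs, Map<String, Long> docFreq) {
    this.numDocs = numDocs;
    this.docFreq = docFreq;
  }
}

// Hypothetical merged ("re-unified") statistics across all indexes.
class GlobalStats {
  long numDocs;
  final Map<String, Long> docFreq = new HashMap<>();

  // Sum document counts and per-term document frequencies over all indexes.
  static GlobalStats merge(List<IndexStats> indexes) {
    GlobalStats global = new GlobalStats();
    for (IndexStats s : indexes) {
      global.numDocs += s.numDocs;
      for (Map.Entry<String, Long> e : s.docFreq.entrySet()) {
        global.docFreq.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return global;
  }

  // IDF in the style of Lucene's classic TF-IDF similarity.
  static double idf(long docFreq, long numDocs) {
    return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
  }

  // IDF of a term computed from the merged statistics.
  double idf(String term) {
    return idf(docFreq.getOrDefault(term, 0L), numDocs);
  }
}
```

With merged statistics, a term that is rare in one index but common in another gets the same IDF in every index. That is what makes scores from the separate searches comparable enough to merge.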
bq: Given that the content loaded for these indexes
represents individually curated terminologies, I think we can argue to our
users that what comes from combined queries over the latter is as
meaningful in its own right as those run over the monolithic index
If one assumes that the individually
Thanks for your reply. We've recently moved from a single large index to
multiple indexes. Given that the content loaded for these indexes
represents individually curated terminologies, I think we can argue to our
users that what comes from combined queries over the latter is as
meaningful in its
If you're using 5.3, you can wrap everything with a PayloadScoreQuery. Before
that you'll need to use PayloadTermQuery or PayloadNearQuery, but I'd advise
upgrading as you'll get better performance and slightly more sane APIs.
Alan Woodward
www.flax.co.uk
On 22 Oct 2015, at 16:53, Sheng wrote:
Alan,
Thanks - that indeed sounds promising. I can use a `SpanPayloadCheckQuery`
to wrap around a `SpanTermQuery`. Now I still want the payload scoring to
work, so I should use a payload query of some sort. Is there a way I
can wrap a `SpanPayloadCheckQuery` in a payload query? I think I sh
In a word, no. At least not that I've heard of. "Normalizing scores" is one of
those things that sounds reasonable on the surface but is really meaningless.
Scores don't really _tell_ you anything about the abstract "goodness" of a doc;
they just tell you that doc1 is likely better than doc2 _with
We have a test case that boosts a set of terms, something along the lines of
"term1^2 AND term2^3 AND term3^4", and this query runs over two distinct content
indexes. Our expectation is that the terms would be returned to us as
term3, term2 and term1. Instead we get something along the lines
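Here is a back-of-the-envelope sketch of why boosts alone may not fix the order. It assumes Lucene's classic TF-IDF similarity. With tf and length norms equal, a matching term contributes roughly `boost * idf^2`, since queryNorm is the same for every clause. So a rare term with a small boost can outweigh a common term with a larger boost. The document frequencies below are invented for illustration. Per-index statistics will also differ between the two indexes unless they are shared.

```java
// Hypothetical helper; the numbers fed to it are made up.
class BoostedTermWeights {
  // IDF in the style of Lucene's classic TF-IDF similarity.
  static double idf(long docFreq, long numDocs) {
    return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
  }

  // Relative weight of a matching term when tf, norms and queryNorm are equal:
  // boost * idf^2 under classic TF-IDF.
  static double weight(double boost, long docFreq, long numDocs) {
    double idf = idf(docFreq, numDocs);
    return boost * idf * idf;
  }

  public static void main(String[] args) {
    long numDocs = 10_000;
    // Invented document frequencies: term1 is rare, term3 is very common.
    System.out.printf("term1^2: %.2f%n", weight(2, 5, numDocs));
    System.out.printf("term2^3: %.2f%n", weight(3, 500, numDocs));
    System.out.printf("term3^4: %.2f%n", weight(4, 5_000, numDocs));
  }
}
```

With these numbers, term1^2 ends up weighted well above term3^4. That is the reverse of what the boosts alone suggest.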
Maybe instead of hacking BooleanWeight, you should use a version of
SpanPayloadCheckQuery? There isn't anything that combines checking and scoring
for payloads at the moment, but I don't think it would be too difficult to
write one.
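Here is a rough model of a query that combines checking and scoring for payloads. It is written in plain Java rather than as a Lucene `SpanQuery`, and the class and method are hypothetical. It assumes the payloads for each occurrence of the term in a document have already been decoded to floats. It rejects the document if any payload fails the check. Otherwise it scores the document by the average payload, loosely in the spirit of Lucene's `AveragePayloadFunction`.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.function.DoublePredicate;

// Hypothetical model: each element of `payloads` is the decoded payload of one
// occurrence of the query term in a document. Nothing here is a Lucene class.
class PayloadCheckAndScore {
  // Empty result = the document should not match; otherwise the average payload.
  static OptionalDouble score(List<Double> payloads, DoublePredicate check) {
    if (payloads.isEmpty()) {
      return OptionalDouble.empty(); // the term does not occur in the document
    }
    double sum = 0;
    for (double p : payloads) {
      if (!check.test(p)) {
        return OptionalDouble.empty(); // one payload failed the check
      }
      sum += p;
    }
    return OptionalDouble.of(sum / payloads.size());
  }
}
```

Whether one failing occurrence should reject the whole document, or merely be skipped, is a design choice. This sketch rejects the document.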
Alan Woodward
www.flax.co.uk
On 22 Oct 2015, at 16:21, Shen
Uwe,
The problem is how we can get an instance of `ConjunctionScorer` in the first
place. Yes, I can use reflection to get an instance of it. However, one of
the constructor parameters for this class is an array of subscorers, which
comes from the `BooleanWeight`. Like I said, if I hack that as well, the
Péteri,
The problem is that whether A or B should be in "excluded_field" is
a posteriori rather than a priori knowledge.
I want the search A `and` B not to return the document as long as one of
the terms has a score of 0, but before the search happens, I don't know whether
either of them should be "excluded" at all.
This i
Hi,
How about using delegate.getChildren() [delegate is the ConjunctionScorer you
wrap] in your FilterScorer? By that you get a List of all ChildScorer instances
with "MUST" as type and a reference to the Scorer itself. You can use those in
the FilterScorer's score() method to get subscores fo
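Here is a plain-Java model of that delegator idea. The types are hypothetical stand-ins that only mirror the shape of Lucene's `Scorer`, `FilterScorer` and `getChildren()`; they are not the real API. The wrapper asks the wrapped conjunction for its children. It returns 0 when any required child scores 0, and otherwise returns the wrapped score.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a scorer positioned on the current document.
interface SimpleScorer {
  float score();

  // Sub-scorers, if any (mirrors the idea of getChildren()).
  default Collection<SimpleScorer> children() {
    return List.of();
  }
}

// A fixed-score leaf, e.g. one term's payload-based score for the current doc.
class ConstantScorer implements SimpleScorer {
  private final float score;

  ConstantScorer(float score) {
    this.score = score;
  }

  @Override
  public float score() {
    return score;
  }
}

// Sums its required (MUST) children, like a conjunction would.
class SumScorer implements SimpleScorer {
  private final List<SimpleScorer> subs;

  SumScorer(List<SimpleScorer> subs) {
    this.subs = subs;
  }

  @Override
  public float score() {
    float sum = 0f;
    for (SimpleScorer s : subs) {
      sum += s.score();
    }
    return sum;
  }

  @Override
  public Collection<SimpleScorer> children() {
    return subs;
  }
}

// Delegator: wraps another scorer and forces 0 if any child scores 0.
class ZeroIfAnyChildZeroScorer implements SimpleScorer {
  private final SimpleScorer in;

  ZeroIfAnyChildZeroScorer(SimpleScorer in) {
    this.in = in;
  }

  @Override
  public float score() {
    for (SimpleScorer child : in.children()) {
      if (child.score() == 0f) {
        return 0f;
      }
    }
    return in.score();
  }

  @Override
  public Collection<SimpleScorer> children() {
    return in.children();
  }
}
```

This only changes the score. A document whose score is forced to 0 is presumably still collected as a hit, so actually dropping it would need the match side to be filtered too.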
Going by the example, it looks like you could do something like this:
1) Use the existing field for adding terms with payloads as before
("payload_field");
2) Introduce another field ("excluded_field"), adding only those terms
where you expect a score of zero to be returned (based on the payload);
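Here is an in-memory sketch of the two-field idea. It assumes that the terms in "excluded_field" are then used as prohibited clauses at query time; that step is an assumption, as are the class names. A document matches A and B through "payload_field" and is dropped if either term also appears in "excluded_field".

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical document model: "payload_field" holds every term,
// "excluded_field" only the terms whose payload would score 0.
class TwoFieldDoc {
  final String id;
  final Set<String> payloadField;
  final Set<String> excludedField;

  TwoFieldDoc(String id, Set<String> payloadField, Set<String> excludedField) {
    this.id = id;
    this.payloadField = payloadField;
    this.excludedField = excludedField;
  }
}

class TwoFieldSearch {
  // Same logic as the query-parser expression
  //   +payload_field:a +payload_field:b -excluded_field:a -excluded_field:b
  static List<String> search(List<TwoFieldDoc> docs, String a, String b) {
    return docs.stream()
        .filter(d -> d.payloadField.contains(a) && d.payloadField.contains(b))
        .filter(d -> !d.excludedField.contains(a) && !d.excludedField.contains(b))
        .map(d -> d.id)
        .collect(Collectors.toList());
  }
}
```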
That's the problem, right? None of them are public, and neither is the
constructor of `ConjunctionScorer`. Moreover, `ConjunctionScorer` needs
access to the list of sub-scorers to emit the doc and score. Information like
this has to come from the `BooleanWeight`, which is another hack if I want
to
You should be able to use a FilterScorer that wraps a ConjunctionScorer and
overrides score().
Alan Woodward
www.flax.co.uk
On 22 Oct 2015, at 13:43, Sheng wrote:
> Thanks for the reply and suggestion. If I search for term A and term B with
> a BooleanQuery in Lucene, normally Lucene returns d
> What is the meaning of "the Unicode Policeman" ?
Robert Muir :-)
Uwe
> Thanks,
> Ahmet
>
> On Thursday, October 22, 2015 2:59 PM, Uwe Schindler wrote:
>
> Hi,
>
> > >> Setting aside the fact that Character.toLowerCase is already
> > >> dubious in some locales (e.g. Turkish),
> >
Thanks for the reply and suggestion. If I search for term A and term B with
a BooleanQuery in Lucene, normally Lucene returns documents that have a
match of both A and B. Now I am using payloads to vary the scores w.r.t.
the search for term A and the search for term B, so it is possible, for example, that a
document
Hi Uwe,
What is the meaning of "the Unicode Policeman" ?
Thanks,
Ahmet
On Thursday, October 22, 2015 2:59 PM, Uwe Schindler wrote:
Hi,
> >> Setting aside the fact that Character.toLowerCase is already dubious
> >> in some locales (e.g. Turkish),
> >
> > This is not true. Character.toLower
Say our index has (documents with) three fields "f1", "f2" and "f3" and I want
to find all documents matching "foo" and "bar" in any combination of the three
fields.
1) The more words that match, the higher its ranking. So it is not really a
strict AND-query...
2) The more words that match in a si
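Here is one way to express requirement 1) as a sketch. It is plain Java with a hypothetical document model that maps each field name to its set of words. Documents that match any query word in any field are kept, and those that match more distinct words rank higher. This is roughly what an OR query combined with Lucene's classic coordination factor produces. Requirement 2) is not modelled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a document maps field names ("f1", "f2", "f3") to words.
class MultiFieldCoordRanking {
  // Number of query words that occur in at least one field of the document.
  static int matchedWords(Map<String, Set<String>> doc, List<String> words) {
    int matched = 0;
    for (String w : words) {
      for (Set<String> fieldTerms : doc.values()) {
        if (fieldTerms.contains(w)) {
          matched++;
          break;
        }
      }
    }
    return matched;
  }

  // Keeps documents matching at least one word (not a strict AND) and sorts
  // them so that documents matching more words come first.
  static List<String> rank(Map<String, Map<String, Set<String>>> docs, List<String> words) {
    List<String> hits = new ArrayList<>();
    for (Map.Entry<String, Map<String, Set<String>>> e : docs.entrySet()) {
      if (matchedWords(e.getValue(), words) > 0) {
        hits.add(e.getKey());
      }
    }
    hits.sort(Comparator.comparingInt((String id) -> matchedWords(docs.get(id), words)).reversed());
    return hits;
  }
}
```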
Hi,
> >> Setting aside the fact that Character.toLowerCase is already dubious
> >> in some locales (e.g. Turkish),
> >
> > This is not true. Character.toLowerCase() works locale-independent.
> > It is only String.toLowerCase that works using default locale.
So you mean the opposite. You wanted t
> LowerCaseFilter will not handle that. So whereas it is "safe" for
> English hard-coded strings, it isn't safe for all fields you might
> index in general.
This filter is a "safe" fallback that works identically regardless of
the locale you
have on your computer (or on the server). This, I believ
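Here is a small illustration of the difference, using only the JDK. `String.toLowerCase(Locale)` with a Turkish locale maps `I` to dotless `ı` (U+0131). `Character.toLowerCase`, applied code point by code point, gives the same result in every locale. That is why it makes a "safe" fallback, and also why it does not produce Turkish-correct text. The helper names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helpers contrasting locale-sensitive and per-code-point lowercasing.
class LowerCaseLocaleDemo {
  // Locale-sensitive: the result depends on the locale passed in.
  static String lowerWithLocale(String s, Locale locale) {
    return s.toLowerCase(locale);
  }

  // Locale-independent: Character.toLowerCase on each code point,
  // the same regardless of the default locale of the machine.
  static String lowerPerCodePoint(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    s.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
    return sb.toString();
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr-TR");
    System.out.println(lowerWithLocale("TITLE", turkish)); // Turkish rules: dotless i
    System.out.println(lowerWithLocale("TITLE", Locale.ROOT));
    System.out.println(lowerPerCodePoint("TITLE"));
  }
}
```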
On Thu, Oct 22, 2015 at 7:05 PM, Uwe Schindler wrote:
> Hi,
>
>> Setting aside the fact that Character.toLowerCase is already dubious in some
>> locales (e.g. Turkish),
>
> This is not true. Character.toLowerCase() works locale-independent.
> It is only String.toLowerCase that works using default
Well, practice says there are no such cases...
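for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
  // UTF-16 length of the code point and of its upper- and lower-case forms
  int c1 = Character.charCount(cp);
  int c2 = Character.charCount(Character.toUpperCase(cp));
  int c3 = Character.charCount(Character.toLowerCase(cp));
  if (c1 != c2 || c1 != c3) {
    System.out.println("charCount changes for U+" + Integer.toHexString(cp));
  }
}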
I think the issue here is what happens if an "uppercase" codepoint requires
a surrogate pair and the lowercase counterpart does not -- then the index
variable would indeed be screwed.
Dawid
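Here is a sketch of the kind of in-place loop where Dawid's concern would bite, written against a plain `char[]`. The class is hypothetical, modelled on how lowercasing token filters typically walk a UTF-16 buffer. Each lowercased code point is written back at the position of the original, and the index advances by the number of chars written. That is only correct if the lowercased form needs the same number of chars. The guard makes this assumption explicit instead of silently corrupting the buffer.

```java
// Hypothetical helper; lowercases buffer[offset, limit) in place.
class InPlaceLowerCase {
  static void toLowerCase(char[] buffer, int offset, int limit) {
    for (int i = offset; i < limit; ) {
      int cp = Character.codePointAt(buffer, i, limit);
      int lower = Character.toLowerCase(cp);
      // The loop writes `lower` where `cp` was and advances by the chars
      // written. If the two needed a different number of UTF-16 chars (one vs.
      // a surrogate pair), `i` would drift out of step with the input.
      if (Character.charCount(lower) != Character.charCount(cp)) {
        throw new IllegalStateException("char count changed at index " + i);
      }
      i += Character.toChars(lower, buffer, i);
    }
  }
}
```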
On Thu, Oct 22, 2015 at 10:05 AM, Uwe Schindler wrote:
> Hi,
>
> > Setting aside the fact that Character.
Hi,
Those are internal classes and not to be extended (not only the constructor is
pkg-private, the whole class is: https://goo.gl/5WyLYz)! Scorers follow the
delegator pattern. If you want to modify the behaviour of a Scorer, create a
delegator scorer (e.g. some Filtering Scorer) and change it
Hi,
> Setting aside the fact that Character.toLowerCase is already dubious in some
> locales (e.g. Turkish),
This is not true. Character.toLowerCase() works locale-independent. It is only
String.toLowerCase that works using default locale.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213