Hi,
this is the normal way to do this: use a filter or constant score query
to do the matching and use disjunctive scoring as a long chain of
"should" clauses.
Uwe
On 21.07.2023 at 02:35, Marc D'Mello wrote:
Hi all,
I'm an engineer on Amazon Product Search and I've recently come upon a
situation where I need conjunctive matching but disjunctive scoring.
As a concrete example, let's say I have a query like this:
(+title:"a" +title:"b" +title:"
You're saying that you're storing the type of token as part of the term
frequency. This doesn't sound like something that would play well with
dynamic pruning, so I wonder if this is the reason why you are seeing
slower queries. But since you mentioned custom term queries, maybe you
never actually took advantage of dynamic pruning?
On Tue, Jun 20, 2023 at 10:30 AM Vimal Jain wrote:
> Ok, sorry, I realized that I need to provide more context.
> So we used to create a lucene query which consisted of custom term queries
> for different fields and, based on the type of field, we used to assign a
> boost that would be used in scoring.
> Now we want to get rid of the different fields and, instead of creating
> multiple term queries, we create only
Note - I am using Lucene 7.7.3.
*Thanks and Regards,*
*Vimal Jain*
On Tue, Jun 20, 2023 at 12:26 PM Vimal Jain wrote:
Hi,
I want to understand whether fetching the term frequency of a term during
scoring is a relatively CPU-bound operation.
Context - I am storing a custom term frequency during indexing and later
using it for scoring during query execution time (in Scorer's score()
method). I noticed a performance
Note that Lucene already disables scoring automatically when scores are not
needed. E.g. queries that compute the top-k hits by score will definitely
compute scores, but if you are just counting the number of matches of a
query, or running aggregations, then Lucene skips scoring entirely.
Is there
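As an illustration of the counting case (a sketch; IndexSearcher#count
exists since the Lucene 5.x line, "reader" stands for an already opened
IndexReader, and the field/term are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

IndexSearcher searcher = new IndexSearcher(reader);
// Counting never asks the Scorer for a score, so no similarity
// computation happens at all.
int hits = searcher.count(new TermQuery(new Term("body", "lucene")));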
I'd rather agree with Uwe, but you can plug in BooleanSimilarity just to
check it out.
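For reference, plugging it in is a one-liner (a sketch; BooleanSimilarity
lives in org.apache.lucene.search.similarities and scores every matching
document with just the query boost; "reader" is assumed to be an open
IndexReader):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.similarities.BooleanSimilarity;

IndexSearcher searcher = new IndexSearcher(reader);
// No tf/idf or length normalization is computed for these searches.
searcher.setSimilarity(new BooleanSimilarity());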
On Mon, Jul 11, 2022 at 6:01 PM Mohammad Kasaei
wrote:
> Hello
>
> I have a question. Is it possible to completely disable scoring in lucene?
>
> Detailed description:
> I have an index
No, that's the only way to do it. The function call does not add overhead
because it is optimized away by the runtime.
Uwe
On 10.07.2022 at 11:34, Mohammad Kasaei wrote:
Hello
I have a question. Is it possible to completely disable scoring in lucene?
Detailed description:
I have an index in Elasticsearch and it contains big shards (every shard has
about 500m docs), so a nanosecond of time spent on scoring every document
in any shard causes a few seconds of delay in the
commits, and if you are indexing across multiple threads. We
found this can help reduce the number of segments, and the variability
in the number of segments. I don't know if that is truly a root cause
of your performance problems here though.
Regarding scoring costs - I don't think creating a dummy Weight and
Scorer will do what you think: Scorers are in fact doing matching as
well as scoring. You won't get any results if you don't have any real
Scorer.
I *think* that setting needsScores() to false should
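For context, this is roughly what a non-scoring collector looks like
against the Collector API of that era (Lucene 5.x-7.x, where the
Collector itself exposes needsScores(); the counting logic is just a
placeholder):

import java.io.IOException;
import org.apache.lucene.search.SimpleCollector;

// Tells Lucene up front that scores are never needed, so the
// BulkScorer/Scorer can skip score computation entirely.
public class CountingCollector extends SimpleCollector {
  private int count;

  @Override
  public void collect(int doc) throws IOException {
    count++;
  }

  @Override
  public boolean needsScores() {
    return false;
  }

  public int getCount() {
    return count;
  }
}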
, performance is significantly better.
When we turn on realtime updates, due to accumulation of segments - CPU
utilization by lucene goes up by at least *3X* [based on profiling].
b) A profile shows that the vast majority of time is being spent in
scoring methods even though we are setting *needsScores() to
same number of count each?
That would basically be a cosine similarity between the two documents, I think.
TK
On 5/28/21 6:27 PM, Robert Muir wrote:
See https://cwiki.apache.org/confluence/display/LUCENE/ScoresAsPercentages
which has some broken nabble links, but is still valid.
TLDR: Scoring just doesn't work the way you think. Don't try to
interpret it as an absolute value, it is a relative one.
On Fri, May 28, 2021 at 1:36
I'd like to have suggestions on changing the scoring algorithm
of MoreLikeThis.
When I feed the identical string as the content of a document in the index
to MoreLikeThis.like("field", new StringReader(docContent)),
I get a score less than 1.0 (0.944 in one of my test cases) that
I think you'll need a SpanQuery with the inOrder flag set:
https://lucene.apache.org/core/8_8_1/core/org/apache/lucene/search/spans/SpanNearQuery.html
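A minimal sketch of that query (field name and terms are placeholders;
this is the SpanNearQuery(SpanQuery[], int slop, boolean inOrder)
constructor from that javadoc):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

SpanQuery blue = new SpanTermQuery(new Term("text", "blue"));
SpanQuery apple = new SpanTermQuery(new Term("text", "apple"));
// slop = 0 and inOrder = true: "blue" must be immediately followed by "apple".
SpanQuery exactPhrase = new SpanNearQuery(new SpanQuery[] { blue, apple }, 0, true);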
Charlie
On 17/03/2021 10:30, Vlad Smirnovskiy wrote:
Hello!
I'd like to do something like this: when I add a document and some of its
text is wrapped in (e.g.) quotes, that should mean this text has to appear
exactly as given in the query. Better with an example -
text: green "blue apple" juice
query : blue apple - result: hit.
query : blue apple juice - result: h
just the most
frequent unique fuzzy match in each document.
Ideally I'd like to use a built-in mechanism for achieving this, but if
it's not available, a way to extend the BooleanQuery, BooleanWeight, and/or
BooleanScorer classes to have slightly different scoring logic but
otherwise function exactly the same would also work, but all of those are
either final classes or have no public constructor, effectively making it
impossible.
, you're likely just inflating those title
matches even more (since a title match is probably highly correlated with a
body match). (The DisjunctionMaxQuery also has an optional
"tieBreakerMultiplier" property that you can use to weight the scoring
somewhere between pure max and pure sum -
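A small sketch of that knob (field names and the 0.3 tie-breaker are
placeholders; 0.0 gives pure max, while values close to 1.0 behave almost
like a plain sum):

import java.util.Arrays;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

Query title = new TermQuery(new Term("title", "apple"));
Query body = new TermQuery(new Term("body", "apple"));
// Score = max(title, body) + 0.3 * (scores of the other matching clauses).
DisjunctionMaxQuery dismax = new DisjunctionMaxQuery(Arrays.asList(title, body), 0.3f);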
Hi,
I have a question regarding how Lucene computes document similarities from
field similarities.
Lucene's scoring documentation mentions that scoring works on fields and
combines the results to return documents. I'm assuming fields are given
scores, and those scores are simply a
^0.56]
Thanks
On 6/26/19 10:44 AM, baris.ka...@oracle.com wrote:
Yes, I know that feature but so far it did not help me much and
I am still looking into that.
Thanks
On 6/26/19 2:41 AM, Adrien Grand wrote:
You can use IndexSearcher#explain to see how scores are computed.
On Wed, Jun 26, 201
You can use IndexSearcher#explain to see how scores are computed.
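For example (a sketch; "searcher" and "query" stand for the IndexSearcher
and Query already in use):

import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

TopDocs top = searcher.search(query, 10);
for (ScoreDoc sd : top.scoreDocs) {
  // The Explanation tree breaks the score down into its tf, idf and
  // norm/boost contributions for that particular document.
  Explanation explanation = searcher.explain(query, sd.doc);
  System.out.println(sd.doc + ": " + explanation);
}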
On Wed, Jun 26, 2019 at 12:48 AM wrote:
Hi,-
I really want to know why the scoring works this way: the search String is
either MAINO or MAINS: MAIN appears as the 276th entry in the results.
NEW HAMPSHIRE in results: city="NASHUA" municipality="HILLSBOROUGH"
region="NEW HAMPSHIRE" country="UNITED STATES" in the 0th result
g/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
> Best regards
>
>
> On 7/17/18 1:01 PM, baris.ka...@oracle.com wrote:
> > Hi,-
> >
> > is there a way to diminish the tf(t in d) component to 1? I don't want
> > the number of ti
You could use IndexSearcher#explain, which tells you how the score of a
document is computed.
On Tue, Jul 17, 2018 at 19:06, wrote:
Hi,-
how can I check the contributions from the different indexed fields to the
hit doc's score?
Best regards
Hi,-
is there a way to diminish the tf(t in d) component to 1? I don't want
the number of times a word appears to affect the scoring for my app.
Best regards
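One common way to do that (not from this thread, just a sketch) is to
subclass the TF-IDF similarity and flatten its tf() component; this is
written against ClassicSimilarity as shipped in Lucene 6/7 (older
versions call it DefaultSimilarity), and it has to be set both on the
IndexWriterConfig and on the IndexSearcher so index-time norms and
search-time scoring agree:

import org.apache.lucene.search.similarities.ClassicSimilarity;

// Repeated occurrences of a term in a document no longer increase its score.
public class FlatTfSimilarity extends ClassicSimilarity {
  @Override
  public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
  }
}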
modified but the order of results is pretty much the same.
what happens is that when part of the search string is found on both fields
then those entries are hit first, since Lucene scoring treats the number of
occurrences as dominant.
But I want the search string to be fully-matched with the
Thanks a lot!
On Mon, Nov 20, 2017 at 11:22 PM, Adrien Grand wrote:
Hi Vadim,
On Thu, Nov 16, 2017 at 18:09, Vadim Gindin wrote:
> 1. I would like to use my custom scoring algorithm. Does it make sense to
> use Lucene with another scoring algorithm? What is the best way to do that -
> implement a Similarity and my own Queries?
>
It really depends what y
Hello
1. I would like to use my custom scoring algorithm. Does it make sense to use
Lucene with another scoring algorithm? What is the best way to do that -
implement a Similarity and my own Queries?
2. I'm researching Elasticsearch/Lucene capabilities. Elasticsearch
contains a request parameter "
duplicate documents can
sometimes report score values that differ considerably for the supposedly
duplicate content?
Searching through some of the older Lucene mail archives I did notice what
I believe to be discussions concerning development test failures having to
do with unexpected scoring
Okay, say I need the distance for filtering purposes. And then again I need
the distance for scoring purposes. I also need the distance for display
purposes and I display some 100 results. So are you saying it's still okay
to compute the distance twice here, once for scoring and once for display
Sorry I just saw your other message that has a bit more information.
Actually you do not need the distance for displaying purposes but both for
filtering and custom scoring. That said, I think recomputing the distances
is still the way to go. Geo-distance filters have optimizations that allow
them
I am using a custom score provider for scoring Lucene documents manually. I am
doing many calculations in the custom score provider to calculate the score. For
example, one of them is distance. So now, once the scoring is done, I would like
to know that distance as well. Instead of computing it again, can't I
fwiw https://issues.apache.org/jira/browse/LUCENE-5867 is going to be
released soon.
On Mon, Jan 9, 2017 at 2:17 PM, Rajnish kamboj
wrote:
> My application does not require scoring/ranking. All data is equally
> important for me.
>
> Search query can return any documents mat
In most cases, it should.
Test it and find out and report back :)
Mike McCandless
http://blog.mikemccandless.com
On Mon, Jan 9, 2017 at 10:07 AM, Rajnish kamboj
wrote:
Thanks for quick responses..
I will try the approach..
Does bypassing scoring increase search performance also?
Regards
Rajnish
On Monday, January 9, 2017, Ian Lea wrote:
> oal.search.ConstantScoreQuery?
>
> "A query that wraps another query and simply returns a constant sc
Hi,
What about writing your own scoring that just gives a value of 1 to all the
documents that are hits?
On Mon, Jan 9, 2017 at 12:17 PM, Rajnish kamboj
wrote:
My application does not require scoring/ranking. All data is equally
important for me.
Search query can return any documents matching search criteria.
So, is there a way to completely disable scoring/ranking altogether?
Or is there a better solution to it?
Regards
Rajnish
Waiting for an explanation for my query. Thank you very much.
On Tue, Dec 20, 2016 at 10:51 PM, Dwaipayan Roy
wrote:
https://doi.org/10.3115/981574.981579
On 12/20/2016 12:21 PM, Dwaipayan Roy wrote:
Hello,
Can anyone help me understand the scoring function in the
LMJelinekMercerSimilarity class?
The scoring function in LMJelinekMercerSimilarity is shown below:
float score = stats.getTotalBoost() *
(float)Math.log(1 + ((1 - lambda
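For reference, the truncated expression continues roughly as follows
(paraphrased from memory of the Lucene sources of that era, so treat it
as a sketch rather than a verbatim copy); lambda is the Jelinek-Mercer
smoothing parameter and getCollectionProbability() is the term's
probability in the whole collection:

float score = stats.getTotalBoost() *
    (float)Math.log(1 + ((1 - lambda) * freq / docLen) /
        (lambda * ((LMStats)stats).getCollectionProbability()));

In other words: score = boost * log(1 + ((1 - lambda) * tf / docLen) /
(lambda * P(term | collection))), which is the standard Jelinek-Mercer
smoothed language model.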
I have a simple setup with IndexSearcher, QueryParser, SimpleAnalyzer.
Running some queries, I noticed that a query with more than one term
returns a different ScoreDoc[i].score than is shown by the explain
statement. Apparently it is the score shown in explain divided by the
number of search terms.
ers
Doug
On Saturday, November 21, 2015, Victor Makarenkov wrote:
Hi everybody!
I would appreciate it if you could refer me to some *example* or explanation
of how to change the scoring function of Lucene.
I would expect 2 options:
1. changing some configuration, so the ranking function becomes, say, Okapi
BM25 instead of the standard similarity
2. Is there any
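A sketch of option 1 against a recent Lucene (in 4.x the IndexWriterConfig
constructor also takes a Version argument, and "analyzer"/"reader" stand
for the analyzer and IndexReader already in use); the similarity should be
set both at index time and at search time so the norms written to the
index match what the scorer expects:

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.similarities.BM25Similarity;

IndexWriterConfig config = new IndexWriterConfig(analyzer);
config.setSimilarity(new BM25Similarity());   // k1 = 1.2, b = 0.75 by default

IndexSearcher searcher = new IndexSearcher(reader);
searcher.setSimilarity(new BM25Similarity());

(Since Lucene 6.0, BM25 is the default similarity anyway.)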
I'm fairly new to Elasticsearch and Lucene. I quickly went through the
Elasticsearch definitive guide and was able to understand how the scoring is
calculated for boolean, term and multi term queries. The basic weighting is
TF-IDF and scoring is based on custom VSM. Depending on query constru
ering from the low statistics problem Erick
described. We use an FST (see org.apache.lucene.util.fst.Builder) to hold the
stats in memory so that the lookups are fast.
Jim
From: Erick Erickson
Sent: 22 October 2015 15:15
To: java-user
Subject: Re: Scoring
We have a test case that boosts a set of terms. Something along the
lines of "term1^2 AND term2^3 AND term3^4", and this query runs over two
content-distinct indexes. Our expectation is that the terms would be
returned to us as term3, term2 and term1. Instead we get something
along the lines of term3, term1 and term2. I realize from a number of
postings that this is the result of the scoring method's action taking
place within an individual index rather than against several indexes. At
the same time I don't see a lot of solutions offered. Is there an out of
the box solution to
Hi all,
I want to take into account the absolute position of the term for the
score calculation.
I found many threads that deal with this issue, and the answer is often:
"use SpanFirstQuery".
The problem with this approach is that it is too "boolean" for me (the
document matches the spanfirstq
really be removed and the following
document-specific score should be added to the document score after the
term-scoring part (unless I am missing some background scoring that is
going on in Lucene):
+ queryLen * Math.log(mu / (docLen + mu))
Therefore, my question is as follows:
Where in lucene
tin query of this sort in Lucene. I've searched for solutions; this issue
has been asked about before. I used the approach suggested
here
http://stackoverflow.com/questions/28565090/scoring-results-of-automatonquery
<http://stackoverflow.com/questions/2631206/lucene-query-bla-match-words-that-start-wi
Hi all,
I'm doing some analytics with a custom Collector on a fairly large number
of search results (+-100,000, all the hits that return from a query). I need
to retrieve them by a query (so using search), but I don't need any scoring,
nor do I need to keep the documents in any order.
When profiling the application, I saw that for my tests, my entire search
takes about 2.4 seconds, and BulkScorer takes 0.4 seconds. So I figured
that without scoring, I would
Sent: Monday, January 26, 2015 11:49 AM
To: java-user@lucene.apache.org
Subject: Re: Absolute term position in scoring
Hello!
I'd like to ask if this approach - construct a complex query consisting of a
boosted "specialized" part and an "ordinary" part with no boost - doesn't
t of the document.
Mike McCandless
http://blog.mikemccandless.com
On Sun, Jan 25, 2015 at 5:44 PM, Luis A Lastras wrote:
Thanks I didn't know about SpanFirstQuery. I can likely get something going
with that. I was still hoping that we could affect the scoring formula with
the position itself, but maybe this is not feasible.
Maybe SpanFirstQuery?
Mike McCandless
http://blog.mikemccandless.com
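A minimal sketch of that suggestion (field name, term and the position
limit of 50 are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Matches "important" only when it occurs within the first 50 positions
// of the "body" field; used as a SHOULD clause next to the regular query,
// it boosts documents that mention the term early without excluding others.
SpanQuery term = new SpanTermQuery(new Term("body", "important"));
SpanQuery nearTop = new SpanFirstQuery(term, 50);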
On Sat, Jan 24, 2015 at 9:34 PM, Luis A Lastras wrote:
Is it possible to incorporate in Lucene's scoring function the position of
a matching term (say, as measured from the top of the document)? The
scenario is, if the set of documents tends to talk about the most important
stuff at the beginning of the document, then we would like to give
preferen
Dear Lucene users,
we are using the Lucene (4.6) MultiReader for different indexes, and for
performance reasons I am going to replace it with a normal Reader.
But we need to keep the scoring similar to MultiReader, and as
expected, when we switch to a normal Reader the scoring for each result is not
I have an idea for something I'm calling grouped scoring, and I want to
know if anybody has already done anything like this.
The idea comes from the problem that in your search results you'd like
to show only one or a small number of items from each group: for example
on google.com
Hi;
The TFIDFSimilarity class documentation says this about the return value of
scorePayload():
*An implementation dependent float to be used as a scoring factor*
However when I read here:
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
I don'
With multiple fields of the same name vs a single field I doubt you'd
be able to tell the difference in performance or matching or scoring
in normal use. There may be some matching/ranking effect if you are
looking at, say, span queries across the multiple fields.
Try it out and see what ha
the
same name?
The other question is whether the scoring of results differs between the use
of a single field vs multiple fields of the same name?
For results ranking, I am guessing there is an effect based on
<https://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_search_over_multiple_fields.3F>
Hi,
I know it is recommended to disable the coordination factor when using models
other than the default TFIDFSimilarity. Out of curiosity I'd like to know the
motivation behind it, but it is not explained anywhere, not even in
LUCENE-2959, the patches, wiki, PDFs or whatever. So, anyone here
;, "t"));
> > indexSearcher.search(prefixQuery, prefixFilter, collector);
> >
> > This returns about 5000 hits on my index.
> >
> > But then I discovered that it works just as well without the filter:
> >
> > QueryParser queryParser = new QueryParser(Ver
);
>
> Why, I don't know. Seems like this would get expanded out into 5000
> BooleanQueries and since my max clause count is still set to the default 1024
> I should get the exception. But I didn't. So maybe I don't need the filter
> after all?
>
> Next, I need s
refixQuery, collector);
>
> Why, I don't know. Seems like this would get expanded out into 5000
> BooleanQueries and since my max clause count is still set to the default 1024
> I should get the exception. But I didn't. So maybe I don't need the filter
> after all
count is still set to the default 1024 I
should get the exception. But I didn't. So maybe I don't need the filter
after all?
Next, I need scoring to work. I read that with wildcard queries all scores are
set to 1.0 by default. But I read you can use the
QueryParser.setMultiTe
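The method being referenced is presumably QueryParser#setMultiTermRewriteMethod.
A sketch of how that is wired up (field name and analyzer are placeholders;
in older releases the constant is called SCORING_BOOLEAN_QUERY_REWRITE
rather than SCORING_BOOLEAN_REWRITE, and the classic QueryParser
constructor used to take a Version argument as well):

import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.MultiTermQuery;

QueryParser parser = new QueryParser("contents", analyzer);
// Rewrite wildcard/prefix queries into a scoring BooleanQuery instead of
// the default constant-score rewrite, so the expanded terms get real
// tf/idf scores (still subject to the max clause count limit).
parser.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_REWRITE);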
Hi,
TF-IDF is just the default (and fast) scoring scheme. You can modify that (the
"Similarity") as you want (since Lucene 4.0):
http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/search/similarities/package-summary.html
There are already various other ones available, like
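If none of the bundled similarities fit, a custom scheme can be plugged in
by extending SimilarityBase; the sketch below uses the float-based
score(BasicStats, float, float) signature of the 4.x-7.x API (Lucene 8+
switched it to doubles), and the formula itself is made up purely for
illustration:

import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.SimilarityBase;

public class MySimilarity extends SimilarityBase {
  @Override
  protected float score(BasicStats stats, float freq, float docLen) {
    // Toy weighting: log-damped term frequency times a smoothed idf.
    float idf = (float) Math.log(1 + (stats.getNumberOfDocuments() + 1.0)
        / (stats.getDocFreq() + 1.0));
    return stats.getTotalBoost() * (float) Math.log(1 + freq) * idf;
  }

  @Override
  public String toString() {
    return "MySimilarity";
  }
}

The instance then has to be set via both IndexWriterConfig#setSimilarity and
IndexSearcher#setSimilarity.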
Hi,
In the Lucene docs it mentions that Lucene implements a tf-idf weighting
scheme for scoring. Is there any way to modify Lucene to implement a custom
weighting scheme for the VSM?
Thank you.
ZP
Hi Otis,
they are generally processed in docId order. The special case "out-of-order"
processing is only used for BooleanScorer1, in which the document IDs can be
reported to the Collector out-of-order (because BooleanScorer scores documents
in buckets). If you don’t allow out-of-ord
Hi,
When Lucene scores matching documents, what is the order in which
documents are processed/scored and can that be changed? I'm guessing
it scores matches in whichever order they are stored in the index/on
disk, which means by increasing docIDs?
I do see some out of order scoring is pos
P.S: Instead of creating a new question, I used your question because I
believe that the reason should be the same.
Hi,
Can anyone help me understand the scoring function in the LMDirichletSimilarity
class?
The scoring function in LMDirichletSimilarity is shown below:
---
float score = stats.getTotalBoost() * (float
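For reference, in the Lucene sources of that era the expression continues
roughly as below (paraphrased from memory, so treat it as a sketch rather
than a verbatim copy); mu is the Dirichlet smoothing parameter and
getCollectionProbability() is the term's probability in the collection:

float score = stats.getTotalBoost() * (float)(Math.log(1 + freq /
    (mu * ((LMStats)stats).getCollectionProbability())) +
    Math.log(mu / (docLen + mu)));

That is, score = boost * (log(1 + tf / (mu * P(term | collection))) +
log(mu / (docLen + mu))); the second log term is the document-length
dependent part also discussed earlier in this list.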
AM, lucas van overberghe
wrote:
> Hi,
>
> We are currently using Hibernate Search but had some questions
> regarding scoring. We are implementing a quick-search engine in our
> webapp but want to customize the scoring a bit.
>
> Let's say, you have a User named Peter, and