To: java-user@lucene.apache.org
Subject: Re: Overriding Lucene's term weights computation
ok, thanks Yuval. I'll take a look.
Could you (or anyone) please elaborate on why payloads "seem like a worse
fit"?
TX, Naama
On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein wrote:
> Naama, maybe you could use the new flexible indexing mechanism.
> Some information is in this lecture:
> http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
> Alternatively, you may use payloads, but they seem like a worse fit.
> Good Luck,
> Yuval
Google keeps most of the implementation of its search index secret.
Based on publicly available information, the indexes are quite different -
Google uses its BigTable and MapReduce technologies to efficiently distribute
the index.
There are similar efforts in the Lucene ecosystem - Sol
A pluggable scoring model that can incorporate BM25, TF/IDF and other variants
of scoring.
-----Original Message-----
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
Sent: Wednesday, February 24, 2010 3:42 PM
To: java-user@lucene.apache.org
Subject: If you could have
level IDF' for BM25f.
- Joaquin tried to bypass this by using the IDF of the field having the longest
  average length instead of the document's IDF.
- This introduces some bias into the scoring formula, but maybe it is not too
  large...
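
To make the workaround above concrete, here is a minimal sketch in plain
Python. It is not Lucene code and not Joaquin's patch. The function names,
field names and all statistics are hypothetical, and the IDF form
(Robertson/Sparck Jones) is assumed only because it is commonly paired with
BM25.

```python
import math

def bm25_idf(doc_freq, num_docs):
    """Robertson/Sparck Jones IDF, a form commonly paired with BM25."""
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def idf_from_longest_field(doc_freq_by_field, avg_len_by_field, num_docs):
    """Pick the field with the longest average length and use its IDF."""
    longest = max(avg_len_by_field, key=avg_len_by_field.get)
    return longest, bm25_idf(doc_freq_by_field[longest], num_docs)

# Hypothetical statistics for one term in a 10,000-document index.
num_docs = 10_000
avg_len_by_field = {"title": 8.0, "body": 300.0}
doc_freq_by_field = {"title": 40, "body": 900}
doc_level_doc_freq = 920  # assumed: docs containing the term in any field

field, field_idf = idf_from_longest_field(
    doc_freq_by_field, avg_len_by_field, num_docs
)
doc_idf = bm25_idf(doc_level_doc_freq, num_docs)
print(field, round(field_idf, 3), round(doc_idf, 3))
```

The difference between the two printed IDF values is the bias mentioned
above. With these made-up numbers it is small, because most documents that
contain the term contain it in the longest field.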
On Thu, Feb 18, 2010 at 3:45 AM, Yuval Feinstein wr
We could solve this by saying we only incorporate BM25F into Lucene.
This is a field-based scoring method, so it saves us the need to deal with
documents.
Building on Joaquin's work, the extra parts needed IMO are:
a. Support for storing average length per field during indexing. I think I saw
som
This is very interesting and much friendlier than a flame war.
My practical question for Robert is:
How can we modify the BM25 patch so that it:
a) Becomes part of Lucene contrib.
b) Is easier to use (preventing mistakes such as Ivan's use of the BM25
similarity during indexing).
c) Proceeds towar
Thanks Ian and Andrzej.
You solved a mystery for us.
-- Yuval
From: Andrzej Bialecki [...@getopt.org]
Sent: Thursday, February 11, 2010 6:53 PM
To: java-user@lucene.apache.org
Subject: Re: Do deleted documents affect scores?
On 2010-02-11 17:35, Ian Lea wr
I want to narrow down my previous question.
Say we have two Lucene indexes: A and B.
Index A contains documents a and b.
Index B used to contain documents a, b and c,
but c was deleted.
All documents share some vocabulary.
If we search using terms common to documents b and c,
Can we get a different score
2010 at 2:26 PM, Yuval Feinstein wrote:
> We are running a large sharded Lucene-based application.
> Our configuration supports near real-time updates by incrementally
> updating documents (using delete then add) on the shards.
> Every shard is replicated to several machines in order t
We are running a large sharded Lucene-based application.
Our configuration supports near real-time updates by incrementally
updating documents (using delete then add) on the shards.
Every shard is replicated to several machines in order to improve performance.
We replicate the shard by sending the