RE: Overriding Lucene's term weights computation

2010-06-24 Thread Yuval Feinstein
To: java-user@lucene.apache.org Subject: Re: Overriding Lucene's term weights computation OK, thanks Yuval. I'll take a look. Could you (or anyone) please elaborate on why payloads "seem like a worse fit"? TX, Naama On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein wrote: >

RE: Overriding Lucene's term weights computation

2010-06-23 Thread Yuval Feinstein
Naama, Maybe you could use the new flexible indexing mechanism. Some information is in this lecture: http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf Alternatively, you may use payloads, but they seem like a worse fit. Good Luck, Yuval

RE: A question about Google search index?

2010-06-10 Thread Yuval Feinstein
Most of the implementation of Google's search index is kept secret by Google. Based on publicly available information, the indexes are quite different - Google uses its BigTable and MapReduce technologies to efficiently distribute the index. There are similar efforts in the Lucene ecosystem - Sol

RE: If you could have one feature in Lucene...

2010-02-24 Thread Yuval Feinstein
A pluggable scoring model that can incorporate BM25, TF/IDF and other variants of scoring. -----Original Message----- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Wednesday, February 24, 2010 3:42 PM To: java-user@lucene.apache.org Subject: If you could have

RE: BM25 Scoring Patch

2010-02-18 Thread Yuval Feinstein
level IDF' for BM25F. Joaquin tried to bypass this by using the IDF of the field having the longest average length instead of the document's IDF. This introduces some bias into the scoring formula, but maybe it is not too large... On Thu, Feb 18, 2010 at 3:45 AM, Yuval Feinstein wr

RE: BM25 Scoring Patch

2010-02-18 Thread Yuval Feinstein
We could solve this by saying we only incorporate BM25F into Lucene. This is a field-based scoring method, so it saves us the need to deal with documents. Building on Joaquin's work, the extra parts needed IMO are: a. Support for storing average length per field during indexing. I think I saw som
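
A rough illustration of why BM25F needs an average length per field: the sketch below uses the commonly published "simple BM25F" form, in which each field's term frequency is length-normalized and boosted before a single saturation step. It is not taken from Joaquin's patch; the function name, the parameter names and the default values k1=1.2 and b=0.75 are assumptions made for this example, and the IDF is passed in by the caller because the thread leaves open which IDF to use.

```python
def bm25f_term_score(tf_by_field, len_by_field, avg_len_by_field,
                     boost_by_field, idf, k1=1.2, b=0.75):
    """Score one query term against one document with a simple BM25F form.

    All names and defaults here are illustrative assumptions, not Lucene API.
    """
    pseudo_tf = 0.0
    for field, tf in tf_by_field.items():
        # Length normalization needs the average field length,
        # which is why it has to be stored at indexing time.
        norm = 1.0 - b + b * len_by_field[field] / avg_len_by_field[field]
        pseudo_tf += boost_by_field[field] * tf / norm
    # One saturation step over the combined pseudo-frequency.
    return idf * pseudo_tf / (k1 + pseudo_tf)


avg_len = {"title": 5.0, "body": 100.0}
boosts = {"title": 2.0, "body": 1.0}
tfs = {"title": 1, "body": 3}

# Document whose fields have exactly the average lengths.
score_average = bm25f_term_score(
    tfs, {"title": 5, "body": 100}, avg_len, boosts, idf=1.0)

# Same term counts, but a body twice as long as average.
score_long_body = bm25f_term_score(
    tfs, {"title": 5, "body": 200}, avg_len, boosts, idf=1.0)
```

With these numbers the document with the longer body gets the lower score, since the same three body occurrences are spread over more text.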

RE: BM25 Scoring Patch

2010-02-17 Thread Yuval Feinstein
This is very interesting and much friendlier than a flame war. My practical question for Robert is: How can we modify the BM25 patch so that it: a) Becomes part of Lucene contrib. b) Becomes easier to use (preventing mistakes such as Ivan's using the BM25 similarity during indexing). c) Proceeds towar

RE: Do deleted documents affect scores?

2010-02-11 Thread Yuval Feinstein
Thanks Ian and Andrzej. You solved a mystery for us. -- Yuval From: Andrzej Bialecki [...@getopt.org] Sent: Thursday, February 11, 2010 6:53 PM To: java-user@lucene.apache.org Subject: Re: Do deleted documents affect scores? On 2010-02-11 17:35, Ian Lea wr

Do deleted documents affect scores?

2010-02-10 Thread Yuval Feinstein
I want to focus my previous question. Say we have two Lucene indexes: A and B. Index A contains documents a and b. Index B used to contain documents a, b and c, but c was deleted. All documents share some vocabulary. If we search using terms common to documents b and c, can we get a different score
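
One way to picture how a deleted document could change scores is through the IDF statistics. The sketch below rests on two labeled assumptions, since the replies in this listing are truncated: that IDF follows the classic form 1 + ln(maxDoc / (docFreq + 1)), and that a deleted document keeps counting toward both statistics until its segment is merged. The function and variable names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_doc):
    """Assumed classic IDF form: 1 + ln(max_doc / (doc_freq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# A term that occurs in all of a, b and c.

# Index A holds a and b only.
idf_a = classic_idf(doc_freq=2, max_doc=2)               # about 0.59

# Index B holds a and b plus the deleted c.
# Assumption: c is still counted until segments are merged.
idf_b_before_merge = classic_idf(doc_freq=3, max_doc=3)  # about 0.71

# Once c is physically removed, B's statistics match A's.
idf_b_after_merge = classic_idf(doc_freq=2, max_doc=2)
```

Under these assumptions the two indexes score the same query differently until index B is merged. For a term found only in b and c, this particular formula happens to give the same value in both indexes, so the sketch uses a term shared by all three documents.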

RE: Different replicas return different scores

2010-02-09 Thread Yuval Feinstein
2010 at 2:26 PM, Yuval Feinstein wrote: > We are running a large sharded Lucene-based application. > Our configuration supports near real-time updates, by incrementally > updating documents (using delete then add) on the shards. > Every shard is replicated to several machines in order t

Different replicas return different scores

2010-02-09 Thread Yuval Feinstein
We are running a large sharded Lucene-based application. Our configuration supports near real-time updates, by incrementally updating documents (using delete then add) on the shards. Every shard is replicated to several machines in order to improve performance. We replicate the shard by sending the