Re: Using IndexReader in the web environment

2010-05-05 Thread Ivan Liu
You may look at this: private static IndexSearcher indexSearcher = null; public synchronized IndexSearcher newIndexSearcher() { try { if (null == indexSearcher) { Directory directory = FSDirectory.open(new File(Config.DB_DIR+"/rssindex")); indexSearcher = new IndexSearcher(IndexReader.
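One thing worth checking in the snippet above: the field is `static`, but `newIndexSearcher()` is a `synchronized` instance method, so it locks on `this`. If more than one instance of the enclosing class exists, two threads can each see `null` and open two searchers. Declaring the method `static synchronized` is the smallest fix. The sketch below shows the same lazy-initialization idea with a single shared lock, in plain Java with no Lucene types; `LazyShared` and the factory are hypothetical names, and in a web app the `Supplier` would be the code that opens the `IndexSearcher`.

```java
import java.util.function.Supplier;

/**
 * Lazily creates one shared instance (e.g. an IndexSearcher) on first use.
 * The lock is the holder itself, so every caller contends on the same monitor
 * no matter how many instances of the surrounding class exist.
 */
final class LazyShared<T> {
    private final Supplier<T> factory; // hypothetical: would open the index
    private T instance;                // guarded by "this"

    LazyShared(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T get() {
        if (instance == null) {
            instance = factory.get(); // runs once, unless it throws or returns null
        }
        return instance;
    }
}

// Usage sketch (SearchService and openSearcher() are hypothetical names):
// class SearchService {
//     private static final LazyShared<IndexSearcher> SEARCHER =
//             new LazyShared<>(SearchService::openSearcher);
// }
```

Keeping the holder in a `static final` field means there is exactly one lock and one instance per class loader.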

Re: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread 张志田
Thank you Mike. Garry - Original Message - From: "Michael McCandless" Sent: Wednesday, May 05, 2010 8:24 PM Subject: Re: How can I merge .cfx and .cfs into a single cfs file? Lucene considers an index with a single .cfx and a single .cfs as optimized. Also, note that how Lucene s

Re: problem in Lucene's ranking function

2010-05-05 Thread Yonik Seeley
2010/5/5 José Ramón Pérez Agüera : [...] > The consequence is that a document > matching a single query term over several fields could score much > higher than a document matching several query terms in one field only, One partial workaround that people use is DisjunctionMaxQuery (used by "dismax"
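The DisjunctionMaxQuery scoring rule (a document gets the score of its best-matching subquery, plus a tie-breaker multiple of the other matching subqueries' scores) can be seen with plain arithmetic. This sketch uses made-up per-field scores for a single query term and compares the result with simply summing the fields, which is roughly how a BooleanQuery of SHOULD clauses adds up (ignoring coord and query normalization). `DisMaxArithmetic` is a hypothetical helper, not a Lucene class.

```java
/**
 * Arithmetic of DisjunctionMaxQuery-style scoring for one document:
 * max subquery score + tieBreaker * (sum of the other matching scores).
 */
final class DisMaxArithmetic {
    /** Plain sum of per-field scores (what adding the fields together gives). */
    static double sum(double... fieldScores) {
        double total = 0.0;
        for (double s : fieldScores) {
            total += s;
        }
        return total;
    }

    /** Best field wins; the other matching fields only add a tie-breaker share. */
    static double disMax(double tieBreaker, double... fieldScores) {
        if (fieldScores.length == 0) {
            return 0.0; // no matching subquery
        }
        double max = Double.NEGATIVE_INFINITY;
        double total = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            total += s;
        }
        return max + tieBreaker * (total - max);
    }
}
```

With three fields each scoring 1.0, summing gives 3.0, while dismax with a tie-breaker of 0.1 gives 1.0 + 0.1 * 2.0 = 1.2, so repeating one term across many fields is rewarded much less. Solr's dismax parser builds one such query per query term across the configured fields, which is why it only partially addresses the problem.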

Re: problem in Lucene's ranking function

2010-05-05 Thread José Ramón Pérez Agüera
Hi Robert, I will be very happy to see this problem fixed :-) I cannot imagine what reasons people have to use software with bugs; I guess that other bugs in Lucene are removed. Anyway, if you are finally going to fix the problem, that is good news :-) Thank you very much for your time. jose O

Re: problem in Lucene's ranking function

2010-05-05 Thread Robert Muir
2010/5/5 José Ramón Pérez Agüera > Hi Robert, > > the problem is not the linear combination of fields, the problem is to > apply the boost factor per field after the term frequency saturation > function and then make the linear combination of fields. Every system > that implements BM25F, including

Re: problem in Lucene's ranking function

2010-05-05 Thread José Ramón Pérez Agüera
Hi Robert, the problem is not the linear combination of fields, the problem is to apply the boost factor per field after the term frequency saturation function and then make the linear combination of fields. Every system that implements BM25F, including Terrier, takes care of that, because if you do
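To make the point about ordering concrete, here is a toy comparison for one query term in three equally weighted fields, with k1 = 1.0 and no length normalization or IDF (all simplifying assumptions). `saturateThenCombine` applies the saturation per field and then the weighted sum, as described for Lucene above; `combineThenSaturate` sums the weighted raw frequencies first and saturates once, which is the BM25F order. The saturation used is `x / (k1 + x)`; BM25 multiplies it by the constant `k1 + 1`, which does not change rankings.

```java
/**
 * Toy comparison of two ways to combine one term's frequencies across fields.
 * k1 and the field weights are illustrative parameters, not Lucene defaults.
 */
final class FieldCombination {
    /** BM25-style saturation: grows quickly at first, then flattens towards 1. */
    static double saturate(double x, double k1) {
        return x / (k1 + x);
    }

    /** Saturate each field separately, then apply the boosts and sum. */
    static double saturateThenCombine(double[] tf, double[] weight, double k1) {
        double score = 0.0;
        for (int f = 0; f < tf.length; f++) {
            score += weight[f] * saturate(tf[f], k1);
        }
        return score;
    }

    /** BM25F order: weighted sum of raw frequencies first, then one saturation. */
    static double combineThenSaturate(double[] tf, double[] weight, double k1) {
        double pseudoTf = 0.0;
        for (int f = 0; f < tf.length; f++) {
            pseudoTf += weight[f] * tf[f];
        }
        return saturate(pseudoTf, k1);
    }
}
```

Document A (one occurrence in each of three fields) scores 1.5 against 0.75 for document B (three occurrences in one field) under per-field saturation, while both score 0.75 under the BM25F ordering: saturating per field lets every field contribute its own high-value "first occurrence".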

Re: problem in Lucene's ranking function

2010-05-05 Thread Robert Muir
2010/5/5 José Ramón Pérez Agüera > Hi Robert, > > thank you very much for your quick response, I have a couple of questions: > > did you read the papers that I mentioned in my e-mail? > Yes. > do you think that Lucene's ranking function could have this problem? > > I know it does. > My concern i

Re: problem in Lucene's ranking function

2010-05-05 Thread José Ramón Pérez Agüera
Hi Robert, thank you very much for your quick response. I have a couple of questions: did you read the papers that I mentioned in my e-mail? Do you think that Lucene's ranking function could have this problem? My concern is not about how to implement different kinds of ranking functions for Lucene, I

Re: problem in Lucene's ranking function

2010-05-05 Thread Robert Muir
José, you might want to watch LUCENE-2392. In this issue, we are proposing adding additional flexibility to the scoring mechanism including: * controlling scoring on a per-field basis * the ability to compute and use aggregate statistics (average field length, total TF across all docs) * fine-grai
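The aggregate statistics listed in the issue are easy to compute over a toy collection; what LUCENE-2392 proposes is making them available to scoring. A sketch with each document's field modeled as a list of already-analyzed tokens (`FieldStats` is a hypothetical helper, not a Lucene API), plus the BM25-style length normalization that needs the average field length (`b` is an illustrative parameter):

```java
import java.util.List;

/**
 * Collection-level statistics for one field: average field length and
 * total term frequency across all documents.
 */
final class FieldStats {
    /** Average number of tokens in the field over all documents. */
    static double averageFieldLength(List<List<String>> fieldValues) {
        if (fieldValues.isEmpty()) {
            return 0.0;
        }
        long totalTokens = 0;
        for (List<String> tokens : fieldValues) {
            totalTokens += tokens.size();
        }
        return (double) totalTokens / fieldValues.size();
    }

    /** Total occurrences of a term in the field, summed over all documents. */
    static long totalTermFreq(List<List<String>> fieldValues, String term) {
        long count = 0;
        for (List<String> tokens : fieldValues) {
            for (String t : tokens) {
                if (t.equals(term)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** BM25-style length-normalized tf: tf / ((1 - b) + b * len / avgLen). */
    static double normalizedTf(int tf, int fieldLength, double avgLength, double b) {
        return tf / ((1 - b) + b * fieldLength / avgLength);
    }
}
```

With b = 0.75, a single occurrence in a field half the average length counts as 1.6, and in a field of exactly average length as 1.0.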

problem in Lucene's ranking function

2010-05-05 Thread José Ramón Pérez Agüera
Hi all, We have realized that there is a bug in Lucene's ranking function. Most ranking functions use a non-linear method to saturate the computation of term frequencies. This is because the information gained on observing a term the first time is greater than the information gained on subs
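The saturation idea can be checked numerically by looking at the gain from each additional occurrence of a term. For reference, Lucene's DefaultSimilarity in 2.x/3.x uses `tf(freq) = sqrt(freq)`, which already has diminishing returns but is unbounded; the BM25-style `freq / (k1 + freq)` is bounded by 1. A small sketch in plain Java (`TfSaturation` is a hypothetical helper and k1 an illustrative parameter):

```java
/**
 * Marginal score contributed by the n-th occurrence of a term under two
 * concave tf functions: sqrt(freq) and the bounded freq / (k1 + freq).
 */
final class TfSaturation {
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    static double bm25Tf(int freq, double k1) {
        return freq / (k1 + freq);
    }

    /** Gain from going from n - 1 to n occurrences under sqrt. */
    static double sqrtGain(int n) {
        return sqrtTf(n) - sqrtTf(n - 1);
    }

    /** Gain from going from n - 1 to n occurrences under the BM25-style curve. */
    static double bm25Gain(int n, double k1) {
        return bm25Tf(n, k1) - bm25Tf(n - 1, k1);
    }
}
```

Under both curves the first occurrence is worth the most (1.0 for sqrt, 0.5 for k1 = 1.0) and later ones less; the difference the thread is about is where per-field boosts are applied relative to this curve, not whether a curve exists.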

Re: Relevancy Practices

2010-05-05 Thread Avi Rosenschein
On Wed, May 5, 2010 at 5:08 PM, Grant Ingersoll wrote: > > On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote: > > > On 4/30/10, Grant Ingersoll wrote: > >> > >> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: > >>> Also, tuning the algorithms to the users can be very important. For > >>> ins

Re: Relevancy Practices

2010-05-05 Thread Peter Keegan
The feedback came directly from customers and customer-facing support folks. Here is an example of a query with keywords: nurse, rn, nursing, hospital. The top 2 hits have scores of 26.86348 and 26.407215. To the customer, both results were equally relevant because all of their keywords were in the

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
Thanks, Peter. Can you share what kind of evaluations you did to determine that the end user believed the results were equally relevant? How formal was that process? -Grant On May 3, 2010, at 11:08 AM, Peter Keegan wrote: > We discovered very soon after going to production that Lucene's score

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote: > On 4/30/10, Grant Ingersoll wrote: >> >> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote: >>> Also, tuning the algorithms to the users can be very important. For >>> instance, we have found that in a basic search functionality, the default

Re: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread Michael McCandless
Lucene considers an index with a single .cfx and a single .cfs as optimized. Also, note that how Lucene stores files in the index is an impl detail -- it can change from release to release -- so relying on any of these details is dangerous. That said, with recent Lucene versions, if you really wa

Re: Using IndexReader in the web environment

2010-05-05 Thread Ian Lea
You could tell the searching part of your app, via some notification or messaging call. Or call IndexReader.isCurrent() from time to time, or even on every search, and reopen() if necessary. See the javadocs and don't forget to close the old reader when you do call reopen. -- Ian. On Wed, May
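Ian's suggestion (check `isCurrent()`, call `reopen()` if needed, and close the old reader) can be sketched as below. To keep it runnable without Lucene, `ReopenableReader` is a hypothetical interface that only mirrors those three method names, and `FakeReader` is a toy test double; with Lucene 3.0 the same check would run against a real `IndexReader`, whose `reopen()` returns the same instance when the index has not changed. The sketch does not protect searches still running on the old reader, which a real web app has to handle before closing it.

```java
/**
 * Hypothetical interface that only mirrors the three IndexReader methods named
 * in the reply; it is NOT the Lucene API.
 */
interface ReopenableReader {
    boolean isCurrent();       // has the index changed since this reader was opened?
    ReopenableReader reopen(); // returns this when nothing changed
    void close();
}

/** Holds the reader shared by all searches and refreshes it on demand. */
final class ReaderRefresher {
    private ReopenableReader current;

    ReaderRefresher(ReopenableReader initial) {
        this.current = initial;
    }

    /**
     * Call before each search (or periodically). If the index changed, swap in
     * the reopened reader and close the old one so its files are released.
     * Caveat: searches still running on the old reader are not tracked here.
     */
    synchronized ReopenableReader acquire() {
        if (!current.isCurrent()) {
            ReopenableReader fresh = current.reopen();
            if (fresh != current) {
                current.close();
                current = fresh;
            }
        }
        return current;
    }
}

/** Toy test double: an int[1] stands in for the on-disk index generation. */
final class FakeReader implements ReopenableReader {
    final int[] indexGeneration;
    final int generation;
    boolean closed;

    FakeReader(int[] indexGeneration) {
        this.indexGeneration = indexGeneration;
        this.generation = indexGeneration[0];
    }

    public boolean isCurrent() {
        return generation == indexGeneration[0];
    }

    public ReopenableReader reopen() {
        return isCurrent() ? this : new FakeReader(indexGeneration);
    }

    public void close() {
        closed = true;
    }
}
```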

Re: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread 张志田
Uwe, thank you very much. What is the mechanism by which Lucene merges these two kinds of files? Sometimes I find there is only one .cfs file, but at other times there may be one .cfs and one .cfx. I understand the .cfx is used to store the term vectors etc., but why does the index result not seem to be

Re: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread Uwe Goetzke
Index all into a directory and determine the size of all files in it. From http://lucene.apache.org/java/3_0_1/fileformats.html Starting with Lucene 2.3, doc store files (stored field values and term vectors) can be shared in a single set of files for more than one segment. When compound file
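Uwe's suggestion of indexing into a directory and adding up the file sizes can be done with `java.nio.file` (Java 7+). A minimal sketch, assuming the index lives in one flat directory (as with FSDirectory), so subdirectories are skipped rather than recursed into; `IndexSize` is a hypothetical helper name:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Total on-disk size of an index, whatever mix of .cfs/.cfx/other files it holds. */
final class IndexSize {
    /** Sums the sizes of the regular files directly inside dir. */
    static long totalBytes(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }
}
```

Measuring the whole directory avoids depending on which file types a given Lucene release happens to write, which the reply above warns is an implementation detail.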