You could look at this:
private static IndexSearcher indexSearcher = null;
public synchronized IndexSearcher newIndexSearcher() {
    try {
        if (null == indexSearcher) {
            Directory directory = FSDirectory.open(
                    new File(Config.DB_DIR + "/rssindex"));
            indexSearcher = new IndexSearcher(IndexReader.
Thank you Mike.
Garry
- Original Message -
From: "Michael McCandless"
To:
Sent: Wednesday, May 05, 2010 8:24 PM
Subject: Re: How can I merge .cfx and .cfs into a single cfs file?
Lucene considers an index with a single .cfx and a single .cfs as optimized.
Also, note that how Lucene s
2010/5/5 José Ramón Pérez Agüera :
[...]
> The consequence is that a document
> matching a single query term over several fields could score much
> higher than a document matching several query terms in one field only,
One partial workaround that people use is DisjunctionMaxQuery (used by
"dismax"
Hi Robert,
I will be very happy to see this problem fixed :-) I cannot imagine
what reasons people have to use software with bugs; I guess that
other bugs in Lucene are removed. Anyway, if you are finally going to
fix the problem, this is good news :-) Thank you very much for your
time.
jose
O
2010/5/5 José Ramón Pérez Agüera
> Hi Robert,
>
> the problem is not the linear combination of fields; the problem is
> applying the boost factor per field after the term frequency saturation
> function and then making the linear combination of fields. Every system
> that implements BM25F, including
Hi Robert,
the problem is not the linear combination of fields; the problem is
applying the boost factor per field after the term frequency saturation
function and then making the linear combination of fields. Every system
that implements BM25F, including Terrier, takes care of that, because if
you do
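The ordering issue described here can be illustrated with a small, self-contained sketch. It is not Lucene's or Terrier's scoring code. It assumes a BM25-style saturation tf * (k1 + 1) / (tf + k1) with k1 = 1.2, no length normalization, and an IDF of 1 for every term; the class name, method names, and the boosts in the usage note are hypothetical.

```java
/** Illustrative only: two ways of combining per-field term frequencies. */
class FieldCombinationSketch {
    static final double K1 = 1.2; // assumed saturation parameter

    /** BM25-style saturation (assumed form, no length normalization). */
    static double saturate(double tf) {
        return tf * (K1 + 1.0) / (tf + K1);
    }

    /** Saturate each field separately, then apply the boost and sum. */
    static double saturateThenCombine(double[][] tf, double[] boost) {
        double score = 0.0;
        for (double[] termTf : tf) {               // one row per query term
            for (int f = 0; f < boost.length; f++) {
                score += boost[f] * saturate(termTf[f]);
            }
        }
        return score;
    }

    /** BM25F-style: boost and sum the raw frequencies, then saturate once per term. */
    static double combineThenSaturate(double[][] tf, double[] boost) {
        double score = 0.0;
        for (double[] termTf : tf) {
            double weightedTf = 0.0;
            for (int f = 0; f < boost.length; f++) {
                weightedTf += boost[f] * termTf[f];
            }
            score += saturate(weightedTf);
        }
        return score;
    }

    public static void main(String[] args) {
        double[] boost = {3.0, 1.0};               // hypothetical boosts: title, body
        double[][] docA = {{1, 1}, {0, 0}};        // one query term, in both fields
        double[][] docB = {{0, 1}, {0, 1}};        // two query terms, body only
        System.out.println("saturate-then-combine: A="
                + saturateThenCombine(docA, boost) + " B=" + saturateThenCombine(docB, boost));
        System.out.println("combine-then-saturate: A="
                + combineThenSaturate(docA, boost) + " B=" + combineThenSaturate(docB, boost));
    }
}
```

With these made-up numbers, saturating per field ranks the single-term document A above the two-term document B, while saturating after the weighted sum ranks B above A.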
2010/5/5 José Ramón Pérez Agüera
> Hi Robert,
>
> thank you very much for your quick response. I have a couple of questions:
>
> did you read the papers that I mentioned in my e-mail?
>
Yes.
> do you think that Lucene's ranking function could have this problem?
>
>
I know it does.
> My concern i
Hi Robert,
thank you very much for your quick response. I have a couple of questions:
did you read the papers that I mentioned in my e-mail?
do you think that Lucene's ranking function could have this problem?
My concern is not about how to implement different kinds of ranking
functions for Lucene, I
José, you might want to watch LUCENE-2392.
In this issue, we are proposing adding additional flexibility to the scoring
mechanism including:
* controlling scoring on a per-field basis
* the ability to compute and use aggregate statistics (average field length,
total TF across all docs)
* fine-grai
Hi all,
We have realized that there is a bug in Lucene's ranking function. Most
ranking functions use a non-linear method to saturate the computation
of the frequencies.
This is due to the fact that the information gained on observing a
term the first time is greater than the information gained on
subs
On Wed, May 5, 2010 at 5:08 PM, Grant Ingersoll wrote:
>
> On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote:
>
> > On 4/30/10, Grant Ingersoll wrote:
> >>
> >> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
> >>> Also, tuning the algorithms to the users can be very important. For
> >>> ins
The feedback came directly from customers and customer facing support folks.
Here is an example of a query with keywords: nurse, rn, nursing, hospital.
The top 2 hits have scores of 26.86348 and 26.407215. To the customer, both
results were equally relevant because all of their keywords were in the
Thanks, Peter.
Can you share what kind of evaluations you did to determine that the end user
believed the results were equally relevant? How formal was that process?
-Grant
On May 3, 2010, at 11:08 AM, Peter Keegan wrote:
> We discovered very soon after going to production that Lucene's score
On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote:
> On 4/30/10, Grant Ingersoll wrote:
>>
>> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
>>> Also, tuning the algorithms to the users can be very important. For
>>> instance, we have found that in a basic search functionality, the default
Lucene considers an index with a single .cfx and a single .cfs as optimized.
Also, note that how Lucene stores files in the index is an impl detail
-- it can change from release to release -- so relying on any of these
details is dangerous.
That said, with recent Lucene versions, if you really wa
You could tell the searching part of your app, via some notification
or messaging call. Or call IndexReader.isCurrent() from time to time,
or even on every search, and reopen() if necessary. See the javadocs
and don't forget to close the old reader when you do call reopen.
--
Ian.
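The check-and-reopen pattern described above could be sketched roughly as follows. To keep the example self-contained, the nested Reader interface is a hypothetical stand-in that mirrors only the three IndexReader calls named in the message (isCurrent(), reopen(), close()); it is not Lucene's class, and the holder class and its method name are made up for illustration. It assumes reopen() hands back the same instance when nothing has changed.

```java
import java.io.IOException;

/** Illustrative holder that refreshes a reader before each search. */
class ReaderRefreshSketch {

    /** Hypothetical stand-in for the IndexReader methods named in the message. */
    interface Reader {
        boolean isCurrent() throws IOException;
        Reader reopen() throws IOException;
        void close() throws IOException;
    }

    private Reader current;

    ReaderRefreshSketch(Reader initial) {
        this.current = initial;
    }

    /** Returns an up-to-date reader, closing the old one if it was replaced. */
    synchronized Reader acquireFresh() throws IOException {
        if (!current.isCurrent()) {
            Reader newer = current.reopen();
            if (newer != current) {   // a new instance means the index changed
                current.close();      // don't forget to close the old reader
                current = newer;
            }
        }
        return current;
    }
}
```

This sketch closes the old reader immediately, which is only safe if no other thread is still searching with it; an application with concurrent searches would need some form of reference counting before closing.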
On Wed, May
Uwe, thank you very much.
What is the mechanism by which Lucene merges these two kinds of files? Sometimes I
found there was only one .cfs file, but at other times there may be one .cfs
and one .cfx. I understand the .cfx is used to store the term vectors etc., but why
does the index result not seem to be
Index all into a directory and determine the size of all files in it.
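A minimal sketch of the measuring step, using only the Java standard library. It assumes the index directory is flat (no subdirectories need to be descended into); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Illustrative helper: sums the sizes of the files directly inside a directory. */
class IndexSizeSketch {

    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the index directory to measure
        System.out.println(totalBytes(Paths.get(args[0])) + " bytes");
    }
}
```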
From http://lucene.apache.org/java/3_0_1/fileformats.html
Starting with Lucene 2.3, doc store files (stored field values and term
vectors) can be shared in a single set of files for more than one segment. When
compound file