Try playing with the Similarity class/subclasses; it might help.
For example, you may adjust the coord to increase the chance (not necessarily
a guarantee) that ORed results will come after the ANDed results; adjust the
sloppy factor to favor phrases, etc.
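To make the coord idea concrete, here is a minimal standalone sketch in plain Java. It does not use Lucene itself: linearCoord mirrors the default overlap / maxOverlap factor, while steepCoord (cubing the fraction) and the raw scores are hypothetical values chosen only for illustration. In a real setup the same arithmetic would go into an override of coord() in a Similarity subclass.

```java
class CoordSketch {
    // Fraction of query terms the document matched (the default-style coord).
    static float linearCoord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    // Hypothetical steeper variant: cubing the fraction punishes partial
    // matches much harder, so OR-only hits sink further down the list.
    static float steepCoord(int overlap, int maxOverlap) {
        float f = (float) overlap / maxOverlap;
        return f * f * f;
    }

    public static void main(String[] args) {
        // Assumed raw scores: the partial match (1 of 2 terms) happens to
        // have a high raw score, the full match (2 of 2 terms) a lower one.
        float partialRaw = 3.0f;
        float fullRaw = 1.0f;

        System.out.println("linear: partial=" + partialRaw * linearCoord(1, 2)
                + " full=" + fullRaw * linearCoord(2, 2));
        System.out.println("steep:  partial=" + partialRaw * steepCoord(1, 2)
                + " full=" + fullRaw * steepCoord(2, 2));
    }
}
```

With these assumed numbers the partial match scores 1.5 against 1.0 under the linear coord and still wins, but drops to 0.375 against 1.0 under the steeper one. A large enough raw score could still lift a partial match above a full one, which is why this raises the chance rather than guaranteeing the order.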
Xiaocheng
Sajid Khan <[EMAIL PROTECTED]> wro
Hi,
We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index
has approximately 2 million documents and the physical size of it is
about 10 GB. We run it as a tomcat web application on a Fedora Core 4
server with dual Xeon 3.2GHz processors and 4GB RAM.
We receive about 46500 web sear
Hi All,
Is it possible to get scores per field in the result document, instead of
getting scores per document?
If this feature does not exist, what are the possible ways of implementing
it?
Thanks,
Sunil
Thanks Grant and Erik for your suggestions. I will try both of them and
let you know if I see a marked increase in speed.
Tom
-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 07, 2006 1:24 PM
To: java-user@lucene.apache.org
Subject: Re: Readin
Well, the performance isn't bad considering you're executing the *search*
around 1,000 times...
One of the characteristics of a Hits object is that it's optimized for
getting the top 100 docs or so. To get the next 100 docs it re-executes the
query. Repeatedly. I'd try using a HitCollector o
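The cost difference described above can be sketched with a toy model in plain Java. Nothing here is Lucene code: ToySearcher, PagedHits and the Collector interface are hypothetical stand-ins, and the window policy (start at 100 docs, double on each miss) is an assumption made for illustration. The sketch only counts how often the "query" runs when every result is read through a cached window versus a single-pass callback.

```java
import java.util.ArrayList;
import java.util.List;

class HitsVsCollectorSketch {
    /** Hypothetical callback that receives each matching doc exactly once. */
    interface Collector { void collect(int doc, float score); }

    /** Toy searcher over docs 0..total-1 that counts how often the query runs. */
    static class ToySearcher {
        final int total;
        int executions = 0;
        ToySearcher(int total) { this.total = total; }

        /** Runs the query and returns the first n doc ids. */
        List<Integer> topDocs(int n) {
            executions++;
            List<Integer> docs = new ArrayList<>();
            for (int d = 0; d < Math.min(n, total); d++) docs.add(d);
            return docs;
        }

        /** Runs the query once and hands every match to the collector. */
        void search(Collector c) {
            executions++;
            for (int d = 0; d < total; d++) c.collect(d, 1.0f);
        }
    }

    /** Toy result window that re-runs the query when asked past its cache. */
    static class PagedHits {
        final ToySearcher searcher;
        List<Integer> cache = new ArrayList<>();
        int window = 100; // assumed initial window size
        PagedHits(ToySearcher s) { searcher = s; }

        int doc(int i) {
            if (i < 0 || i >= searcher.total) throw new IndexOutOfBoundsException();
            while (i >= cache.size()) {
                cache = searcher.topDocs(window); // the query runs again
                window *= 2;                      // assumed growth policy
            }
            return cache.get(i);
        }
    }

    static int executionsWithPaging(int total) {
        ToySearcher s = new ToySearcher(total);
        PagedHits hits = new PagedHits(s);
        for (int i = 0; i < total; i++) hits.doc(i);
        return s.executions;
    }

    static int executionsWithCollector(int total) {
        ToySearcher s = new ToySearcher(total);
        List<Integer> seen = new ArrayList<>();
        s.search((doc, score) -> seen.add(doc));
        return s.executions;
    }

    public static void main(String[] args) {
        System.out.println("paged:     " + executionsWithPaging(1000));
        System.out.println("collector: " + executionsWithCollector(1000));
    }
}
```

The exact number of repeats depends entirely on the assumed window policy, so treat the counts as illustrative. The point is that the callback version runs the query once no matter how many documents are read.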
Thanks for your help, guys. I'll test that query parser.
Marcelo
On Dec 6, 2006, at 11:37 PM, Renaud Waldura wrote:
Read my own complaints about QueryParser here:
http://marc.theaimsgroup.com/?l=lucene-user&m=116069469827270&w=2
You're in for a surprise. As alluded to by Erick, the stock QP doesn
Have you done any profiling to identify hotspots in Lucene versus
your application?
You might look into the FieldSelector code (used in IndexReader) in
the trunk version of Lucene, which could be used to load only the fields
you are interested in when getting the document from disk. This can be
us
Howdy all,
I have a question about reading many documents and the time it takes.
I have a loop over the Hits object that reads a record, then writes it to a
file. When there is only 1 user on the IndexSearcher, reading, say,
100,000 records takes around 3 seconds. This is slow, but can
Ariel Isaac Romero Cartaya wrote:
Hi everybody:
I am having a problem during the indexing process. I am indexing large
amounts of text, most of it in PDF format, using PDFBox version 0.6.
The space on the hard disk before the indexing process begins is around 120
GB, but incredibly even