Hi Peter,
As I said in my earlier email, changing the
mergeFactor and minMergeDocs properties in IndexWriter
did help, but the result is still not what I would like it to be. I
then tried what you suggested, RAMDirectory-based disk
indexing, and it has worked SUPERBLY for me. I was able
to reduce the processing t
Hi,
> : the app using JProfiler and found out that 90% of
> time
> : is spent in the IndexWriter.addDocument call. As
>
> what analyzer are you using?
I am using the StandardAnalyzer (I tried using
SimpleAnalyzer too, but it had little effect on
performance).
> : My machine: Pentium 4 CPU 2.40 GHz
> :
Hi all:
Did you try increasing IndexWriter.mergeFactor? I tried increasing
it to 1000, and indexing is about 10 times faster than with the default of 10.
Regards
Che Dong
http://www.chedong.com/
Aalap Parikh wrote:
My machine is pretty good and fairly new. The disk for
sure is not slow and also I am not
: the app using JProfiler and found out that 90% of time
: is spent in the IndexWriter.addDocument call. As
what analyzer are you using?
: My machine: Pentium 4 CPU 2.40 GHz
: RAM 1 GB
what JVM args are you using? (in particular: how much ram are you telling
the JVM to use) ... what
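
A quick way to answer the heap question is to ask the JVM itself. The following is a minimal sketch, not part of the original thread; the class name HeapCheck is made up for illustration. It relies only on the standard Runtime.maxMemory() call, which reports the largest heap the JVM will attempt to use.

```java
public class HeapCheck {
    /** Maximum heap the running JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx512m HeapCheck` should report a value of roughly 512 MiB; running it without -Xmx shows the default the indexing job is getting.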
Hi,
Thanks for your suggestion. I haven't yet tried your
technique but I did try something similar by tweaking
some IndexWriter properties like mergeFactor and
minMergeDocs, and it certainly did speed up the process
a lot. I am sure the same can be achieved with what
you suggest because it is essen
My machine is pretty good and fairly new. The disk for
sure is not slow and also I am not indexing large
Documents; 27 fields with each field value being a
string no more than 15-20 characters long.
I tried setting the maxFieldLength value of the
IndexWriter to a low value but that didn't hel
On some systems I have seen big speed increases by indexing to a
RAMDirectory and periodically "merging" into an on disk directory
every X number of docs. May or may not help in this case. The
first time I used this, it took indexing down from a few hours to 30
minutes for a few million docume
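
The batching idea above can be sketched without any Lucene dependency. This is only an illustration of the pattern (buffer in memory, hand over to the slow destination every X items), not the poster's code: BatchingSketch, BatchingBuffer and flushedBatchSizes are hypothetical names. In a real setup the in-memory buffer would be an IndexWriter on a RAMDirectory and the flush step would merge it into the on-disk index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingSketch {

    /** Buffers items in memory and hands them to a slower sink in batches. */
    public static final class BatchingBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> sink; // stands in for "merge into the on-disk index"
        private final List<T> buffer = new ArrayList<>();

        public BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
            this.sink = sink;
        }

        /** Adds one item; flushes automatically once batchSize items are buffered. */
        public void add(T item) {
            buffer.add(item);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Passes a copy of the buffered items to the sink and empties the buffer. */
        public void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    /** Feeds docCount dummy documents through the buffer and records each flushed batch size. */
    public static List<Integer> flushedBatchSizes(int docCount, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        BatchingBuffer<String> buf = new BatchingBuffer<>(batchSize, batch -> sizes.add(batch.size()));
        for (int i = 0; i < docCount; i++) {
            buf.add("doc-" + i);
        }
        buf.flush(); // do not forget the final partial batch
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(flushedBatchSizes(10, 4)); // prints [4, 4, 2]
    }
}
```

The final flush() matters: without it, the last partial batch never reaches the destination.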
That sounds way too long, unless you have veeery slow disks, veeery
large Documents (long fields that you analyze, index, and store in
Lucene), or some such.
If you have very long fields you could try setting
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#
Hi,
I have similar issues in indexing time.
I am doing a SELECT from database and getting back
10,000 rows. I then start indexing each row and hence
would have 10,000 documents in my Lucene index. Each
doc has 27 fields.
I added some timing code to my indexing process. The
DB select call takes a
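
For anyone wanting to add similar timing code, here is a minimal sketch using only the standard library. PhaseTimer and the phase names are made up for illustration, and the sleeps in main merely stand in for the real DB select and addDocument calls. It accumulates wall-clock time per named phase, so the share spent in the select versus the indexing can be compared.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    // Accumulated wall-clock nanoseconds per named phase, in first-seen order.
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work and adds its elapsed time to the given phase. */
    public void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a phase, or 0 if it was never timed. */
    public long totalNanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    /** One line per phase: name, milliseconds, and share of the total. */
    public String report() {
        long total = 0;
        for (long n : nanosByPhase.values()) {
            total += n;
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            double pct = total == 0 ? 0.0 : 100.0 * e.getValue() / total;
            sb.append(String.format("%-12s %8d ms %5.1f%%%n",
                    e.getKey(), e.getValue() / 1_000_000, pct));
        }
        return sb.toString();
    }

    // Placeholder for real work (DB query, addDocument call, ...).
    private static void pretendWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        timer.time("db-select", () -> pretendWork(20));
        for (int i = 0; i < 10; i++) {
            timer.time("addDocument", () -> pretendWork(2));
        }
        System.out.print(timer.report());
    }
}
```

Coarse timings like this only show which phase dominates; a profiler is still the better tool for finding out why.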
Hi,
The best way to determine bottlenecks is profiling. (JProfiler is a very
good tool for that. It's a commercial product with a free evaluation.)
I was indexing 1.5 million documents in 45 minutes.
Before optimizing, it took much more time to index. The optimization was done
through 'select' query changin
On Wednesday 20 April 2005 04:07, Mufaddal
Khumri wrote:
> The 2 products I mentioned are 2 rows. I get the products in
> bulk by using a limit clause.
>
> I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine.
Maybe your session-level cache in hibernate grow
>
> Where could I be slowing down the indexing process? Is it because I am
> indexing the longDescription as a Field.Text? (longDescription could
> have a fair amount of text averaging about 4000 words).
>
> Any ideas?
>
> Thanks,
> Mufaddal.
>
> -Original Message
wing down the indexing process? Is it because I am
indexing the longDescription as a Field.Text? (longDescription could
have a fair amount of text averaging about 4000 words).
Any ideas?
Thanks,
Mufaddal.
-Original Message-
From: Daniel Herlitz [mailto:[EMAIL PROTECTED]
Sent: Tuesday, A
Agree. We run an index with about 2.5 million documents and around 30
fields. The indexing itself of 2 items should only take a few
seconds on a reasonably fast machine.
/D
Kevin L. Cobb wrote:
I think your bottleneck is most likely the DB hit. I assume by 2
products you mean 2 disti
I think your bottleneck is most likely the DB hit. I assume by 2
products you mean 2 distinct entries into the Lucene Index, i.e.
2 rows in the DB to select from.
I index about 1.5 million rows from a SQL Server 2000 database with
several fields for each entry and it finishes in about