Re: Lucene bulk indexing

2005-04-22 Thread Aalap Parikh
Hi Peter, As I said in my earlier email, changing the mergeFactor and minMergeDocs properties in IndexWriter did help, but performance still wasn't where I would like it to be. I then tried what you suggested, RAMDirectory-based disk indexing, and it has worked SUPERBLY for me. I was able to reduce the processing t

Re: Lucene bulk indexing

2005-04-22 Thread Aalap Parikh
Hi, > : the app using JProfiler and found out that 90% of time > : is spent in the IndexWriter.addDocument call. As > > what analyzer are you using? I am using the StandardAnalyzer (I tried SimpleAnalyzer too, but it had little effect on performance). > : My machine: Pentium 4 CPU 2.40 GHz > :

Increase IndexWriter.mergeFactor if you have enough memory Re: Lucene bulk indexing

2005-04-21 Thread Che Dong
Hi all: did you try increasing IndexWriter.mergeFactor? I increased it to 1000 and indexing was about 10 times faster than with the default (10). Regards Che Dong http://www.chedong.com/ Aalap Parikh wrote: My machine is pretty good and fairly new. The disk for sure is not slow and also I am not
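To see why a larger mergeFactor can help, here is a simplified, hypothetical model of a log-structured merge policy: documents are flushed to disk in segments of minMergeDocs, and whenever mergeFactor segments accumulate at one level they are merged (rewritten) into a single segment at the next level. This is not Lucene's actual merge code and it will not reproduce the 10x figure above; it only counts how many document writes the merges cost. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical merge-cost model; NOT Lucene's real IndexWriter logic.
final class MergeCostModel {
    /**
     * Total documents written to disk: each flush writes minMergeDocs docs, and each
     * merge rewrites every doc in the segments it combines.
     * Assumes numDocs is a multiple of minMergeDocs and mergeFactor >= 2.
     */
    static long docsWritten(int numDocs, int minMergeDocs, int mergeFactor) {
        List<Integer> levelCounts = new ArrayList<>(); // number of segments waiting at each level
        long written = 0;
        int flushes = numDocs / minMergeDocs;
        for (int f = 0; f < flushes; f++) {
            written += minMergeDocs;          // flush one level-0 segment
            long segSize = minMergeDocs;
            int level = 0;
            increment(levelCounts, level);
            // Cascade: mergeFactor segments at one level become one segment one level up.
            while (levelCounts.get(level) == mergeFactor) {
                levelCounts.set(level, 0);
                segSize *= mergeFactor;
                written += segSize;           // the merged segment rewrites all its docs
                level++;
                increment(levelCounts, level);
            }
        }
        return written;
    }

    private static void increment(List<Integer> counts, int level) {
        if (counts.size() == level) counts.add(0);
        counts.set(level, counts.get(level) + 1);
    }
}
```

With 10,000 documents and minMergeDocs = 10, the model writes 40,000 documents at mergeFactor = 10 (three merge levels) but only 20,000 at mergeFactor = 1000 (one level). The cost it ignores is that a large mergeFactor keeps many more segments around at once, which is why the advice is conditioned on having enough memory.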

Re: Lucene bulk indexing

2005-04-21 Thread Chris Hostetter
: the app using JProfiler and found out that 90% of time : is spent in the IndexWriter.addDocument call. As what analyzer are you using? : My machine: Pentium 4 CPU 2.40 GHz : RAM 1 GB what JVM args are you using? (in particular: how much ram are you telling the JVM to use) ... what
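One quick way to answer the "how much RAM are you telling the JVM to use" question from inside the indexing app is to log the heap ceiling the JVM actually received; that ceiling is set with the `-Xmx` flag (for example `java -Xmx512m ...`). A minimal sketch using the standard `Runtime.maxMemory()` call; the class name is hypothetical:

```java
// Hypothetical helper: report the maximum heap this JVM will try to use.
final class HeapCheck {
    /** Max heap in megabytes, as governed by -Xmx (or the JVM's default if -Xmx is absent). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```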

Re: Lucene bulk indexing

2005-04-21 Thread Aalap Parikh
Hi, Thanks for your suggestion. I haven't yet tried your technique, but I did try something similar by tweaking some IndexWriter properties like mergeFactor and minMergeDocs, and it certainly sped up the process a lot. I am sure the same can be achieved with what you suggest because it is essen

Re: Lucene bulk indexing

2005-04-21 Thread Aalap Parikh
My machine is pretty good and fairly new. The disk is certainly not slow, and I am not indexing large Documents: 27 fields, each field value being a string of no more than 15-20 characters. I tried setting the maxFieldLength value of the IndexWriter to a low value but that didn't hel

Re: Lucene bulk indexing

2005-04-21 Thread Peter A. Daly
On some systems I have seen big speed increases by indexing to a RAMDirectory and periodically "merging" into an on-disk directory every X number of docs. It may or may not help in this case. The first time I used this, it took indexing down from a few hours to 30 minutes for a few million docume
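The mail does not show Peter's code, so the following is only a sketch of the buffering pattern he describes: add documents to an in-memory buffer and merge that buffer into the on-disk index every X documents, plus once more at the end. With Lucene of that era the merge step is commonly done by calling `IndexWriter.addIndexes(Directory[])` on the disk writer with the RAMDirectory, then starting a fresh RAMDirectory. Here `DiskIndex` and `BufferedIndexer` are hypothetical plain-Java stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the on-disk index; it only records each merged batch. */
final class DiskIndex {
    private final List<List<String>> batches = new ArrayList<>();

    void merge(List<String> batch) {
        batches.add(new ArrayList<>(batch)); // copy, since the caller reuses its buffer
    }

    int mergeCount() { return batches.size(); }

    int docCount() {
        int n = 0;
        for (List<String> b : batches) n += b.size();
        return n;
    }
}

/** Buffers documents in memory and merges them into the disk index every batchSize docs. */
final class BufferedIndexer {
    private final DiskIndex disk;
    private final int batchSize;
    private final List<String> ramBuffer = new ArrayList<>();

    BufferedIndexer(DiskIndex disk, int batchSize) {
        this.disk = disk;
        this.batchSize = batchSize;
    }

    void add(String doc) {
        ramBuffer.add(doc);                          // cheap in-memory step
        if (ramBuffer.size() >= batchSize) flush();  // one disk merge per full batch
    }

    /** Merges whatever is buffered; call once more at the end for the partial last batch. */
    void flush() {
        if (ramBuffer.isEmpty()) return;
        disk.merge(ramBuffer);
        ramBuffer.clear();
    }
}
```

Indexing 2,500 documents with a batch size of 1,000 touches the disk index three times (two full batches and the final partial one) instead of once per document; the batch size trades memory for fewer disk merges.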

Re: Lucene bulk indexing

2005-04-20 Thread Otis Gospodnetic
That sounds way too long, unless you have veeery slow disks, veeery large Documents (long fields that you analyze, index, and store in Lucene), or some such. If you have very long fields, you could try setting http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#
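IndexWriter's maxFieldLength caps how many terms are indexed per field (10,000 by default in Lucene of that era); terms beyond the cap are dropped. The sketch below mimics that effect with a whitespace split standing in for a real Analyzer, which is a simplification, and the class name is hypothetical. It also suggests why lowering the cap didn't help for 27 fields of 15-20 characters each: such fields hold only a few terms, so the cap never triggers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a per-field term cap; whitespace split is a stand-in for an Analyzer.
final class FieldLengthCap {
    /** Returns the terms that would survive a cap of maxTerms per field. */
    static List<String> indexedTerms(String text, int maxTerms) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return Collections.emptyList();
        String[] tokens = trimmed.split("\\s+");
        return Arrays.asList(tokens).subList(0, Math.min(tokens.length, maxTerms));
    }
}
```

A 4,000-word description stays entirely under the default cap of 10,000, while a cap of 3 on "a b c d e" keeps only "a b c".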

Re: Lucene bulk indexing

2005-04-20 Thread Aalap Parikh
Hi, I have similar issues with indexing time. I am doing a SELECT from the database and getting back 10,000 rows. I then index each row, so I end up with 10,000 documents in my Lucene index. Each doc has 27 fields. I added some timing code to my indexing process. The DB select call takes a
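Timing code like that is what separates a slow database query from slow indexing. A minimal sketch of a per-phase timer; `PhaseTimer` is a hypothetical name, and in real use the two phases would wrap the SELECT and the `addDocument` loop (for example `rows = PhaseTimer.time("DB select", () -> fetchRows())`, where `fetchRows` is a hypothetical placeholder).

```java
import java.util.function.Supplier;

// Hypothetical helper for timing individual phases of an indexing run.
final class PhaseTimer {
    /** Runs the task, prints its wall-clock duration in milliseconds, and returns its result. */
    static <T> T time(String label, Supplier<T> task) {
        long start = System.nanoTime();           // monotonic clock, suited to elapsed time
        T result = task.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms");
        return result;
    }
}
```

`System.nanoTime()` is used rather than `System.currentTimeMillis()` because it is not affected by wall-clock adjustments during the run.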

Re: Lucene bulk indexing

2005-04-20 Thread Volodymyr Bychkoviak
Hi, The best way to determine bottlenecks is profiling. (JProfiler is a very good tool for that. It's a commercial product with a free evaluation.) I was indexing 1.5 million documents in 45 minutes; before optimizing, indexing took much longer. The optimization was done through 'select' query changin

Re: Lucene bulk indexing

2005-04-19 Thread skoptelov
On 20 April 2005 04:07, Mufaddal Khumri wrote: > The 2 products I mentioned are 2 rows. I get the products in > bulk by using a limit clause. > > I am using hibernate with MySQL server on a 2.8GHz, 1.00GB Ram machine. Maybe your session-level cache in hibernate grow
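The suspicion here is that Hibernate's session-level cache keeps a reference to every row loaded during the run, so memory use grows with the result set. Hibernate's own batch-processing guidance is to call `session.flush()` and `session.clear()` every N rows. The sketch below does not use Hibernate: `SessionCacheModel` is a hypothetical plain-Java model of such a cache, showing that clearing it periodically bounds its peak size by the batch size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a session-level cache: every loaded row stays referenced until clear().
final class SessionCacheModel {
    private final Map<Long, String> cache = new HashMap<>();
    private int peakSize = 0;

    void load(long id, String row) {
        cache.put(id, row);
        peakSize = Math.max(peakSize, cache.size());
    }

    void clear() { cache.clear(); }

    int peakSize() { return peakSize; }

    /** Loads numRows rows, clearing the cache every batchSize rows (batchSize 0 = never clear). */
    static int peakWithPeriodicClear(int numRows, int batchSize) {
        SessionCacheModel session = new SessionCacheModel();
        for (long id = 1; id <= numRows; id++) {
            session.load(id, "row-" + id);
            // ... the row would be turned into a Lucene Document and indexed here ...
            if (batchSize > 0 && id % batchSize == 0) session.clear();
        }
        return session.peakSize();
    }
}
```

Without clearing, the model's cache peaks at one entry per row; clearing every 500 rows caps it at 500 regardless of how many rows are indexed.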

Re: Lucene bulk indexing

2005-04-19 Thread Chris Lamprecht
> > Where could I be slowing down the indexing process? Is it because I am > indexing the longDescription as a Field.Text? (longDescription could > have a fair amount of text averaging about 4000 words). > > Any ideas? > > Thanks, > Mufaddal. > > -Original Message

RE: Lucene bulk indexing

2005-04-19 Thread Mufaddal Khumri
wing down the indexing process? Is it because I am indexing the longDescription as a Field.Text? (longDescription could have a fair amount of text averaging about 4000 words). Any ideas? Thanks, Mufaddal. -Original Message- From: Daniel Herlitz [mailto:[EMAIL PROTECTED] Sent: Tuesday, A

Re: Lucene bulk indexing

2005-04-19 Thread Daniel Herlitz
Agree. We run an index with about 2.5 million documents and around 30 fields. The indexing itself of 2 items should only take a few seconds on a reasonably fast machine. /D Kevin L. Cobb wrote: I think your bottleneck is most likely the DB hit. I assume by 2 products you mean 2 disti

RE: Lucene bulk indexing

2005-04-19 Thread Kevin L. Cobb
I think your bottleneck is most likely the DB hit. I assume by 2 products you mean 2 distinct entries into the Lucene Index, i.e. 2 rows in the DB to select from. I index about 1.5 million rows from a SQL Server 2000 database with several fields for each entry and it finishes in about