On some systems I have seen big speed increases by indexing into a RAMDirectory and periodically merging into an on-disk directory every X number of docs. It may or may not help in this case. The first time I used this, it took indexing down from a few hours to about 30 minutes for a few million documents on a Windows 2000 desktop machine. A rough sketch of the pattern is below.
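This is only a minimal sketch of the RAM-then-merge idea, assuming the Lucene 1.4-era API (RAMDirectory, FSDirectory.getDirectory, and IndexWriter.addIndexes(Directory[])); the batch size, the index path, and the raw Iterator of Documents are placeholders for whatever your app actually does:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamToDiskIndexer {

    private static final int BATCH_SIZE = 10000; // hypothetical; tune for your heap

    public static void index(Iterator docs) throws IOException {
        // The on-disk index that everything eventually ends up in.
        Directory diskDir = FSDirectory.getDirectory("/path/to/index", true);
        IndexWriter diskWriter = new IndexWriter(diskDir, new StandardAnalyzer(), true);

        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        int count = 0;

        while (docs.hasNext()) {
            ramWriter.addDocument((Document) docs.next());
            if (++count % BATCH_SIZE == 0) {
                // Flush the in-memory index to disk and start a fresh one.
                ramWriter.close();
                diskWriter.addIndexes(new Directory[] { ramDir });
                ramDir = new RAMDirectory();
                ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
            }
        }

        // Merge whatever is left in RAM, then finish up.
        ramWriter.close();
        diskWriter.addIndexes(new Directory[] { ramDir });
        diskWriter.optimize();
        diskWriter.close();
    }
}
```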
http://www.budget-ha.com/lucene/ram-to-disk/

-Pete

On 4/20/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> That sounds way too long, unless you have very slow disks, very large
> Documents (long fields that you analyze, index, and store in Lucene),
> or some such.
> If you have very long fields you could try setting
> http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
> to a very small number and see if that changes performance drastically.
> There are other IndexWriter knobs you can fiddle with.
>
> I've seen Hibernate 2.* get sluggish once its Session gets filled up
> with a lot of objects.
>
> Otis
>
> --- Aalap Parikh <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have similar issues with indexing time.
> >
> > I am doing a SELECT from the database and getting back 10,000 rows.
> > I then index each row, so I end up with 10,000 documents in my
> > Lucene index. Each doc has 27 fields.
> >
> > I added some timing code to my indexing process. The DB select call
> > takes around 23 seconds and the indexing process takes 567 seconds.
> > I also profiled the app using JProfiler and found that 90% of the
> > time is spent in the IndexWriter.addDocument call. As expected,
> > there were 10,000 invocations of that method (one per doc), and the
> > profiler showed that it accounted for 90% of the processing time.
> >
> > I am concerned that it takes around 9.5 minutes for 10,000 docs,
> > and I expect to have around 600,000 docs to index. That would take
> > about 570 minutes (9-10 hours), which is HUGE!!!
> >
> > My machine: Pentium 4 CPU 2.40 GHz, 1 GB RAM
> >
> > Any help appreciated.
> >
> > Thanks,
> > Aalap.
> >
> > --- [EMAIL PROTECTED] wrote:
> > > In a message on Wednesday, 20 April 2005 at 04:07, Mufaddal
> > > Khumri wrote:
> > > > The 20000 products I mentioned are 20000 rows. I get the
> > > > products in bulk by using a LIMIT clause.
> > > >
> > > > I am using Hibernate with MySQL Server on a 2.8 GHz, 1.00 GB
> > > > RAM machine.
> > >
> > > Maybe your session-level cache in Hibernate is growing enormously.
> > > Do you call Session.clear() occasionally while indexing? Here's a
> > > link about batching and Hibernate:
> > > http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
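For reference, the IndexWriter knobs Otis mentions are public fields in the Lucene 1.4-era API; a minimal sketch with purely illustrative values:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class WriterTuning {

    public static IndexWriter openTunedWriter(String path) throws IOException {
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);

        // Truncate each document field after this many terms; a small value
        // makes very long fields cheap to index, at the cost of ignoring
        // everything past the cutoff.
        writer.maxFieldLength = 1000;

        // Buffer more documents in memory before flushing a segment, and
        // merge segments less often; both cut disk I/O during bulk indexing.
        writer.minMergeDocs = 1000;
        writer.mergeFactor = 50;

        return writer;
    }
}
```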
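And a minimal sketch of the periodic Session.clear() idiom from the batching advice above, assuming a Hibernate 2.1-style API (net.sf.hibernate, ScrollableResults); the batch size, the HQL query, and the commented-out indexing call are hypothetical placeholders:

```java
import net.sf.hibernate.HibernateException;
import net.sf.hibernate.ScrollableResults;
import net.sf.hibernate.Session;
import net.sf.hibernate.SessionFactory;

public class BatchIndexer {

    private static final int BATCH_SIZE = 500; // hypothetical; tune as needed

    public static void indexAll(SessionFactory factory) throws HibernateException {
        Session session = factory.openSession();
        try {
            // Stream the rows instead of materializing all of them at once.
            ScrollableResults rows = session.createQuery("from Product").scroll();
            int count = 0;
            while (rows.next()) {
                Object row = rows.get(0);
                // addToLuceneIndex(row); // build a Document and call IndexWriter.addDocument

                if (++count % BATCH_SIZE == 0) {
                    // Detach everything read so far, so the session-level
                    // cache does not grow with every row we touch.
                    session.clear();
                }
            }
        } finally {
            session.close();
        }
    }
}
```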