Hi,

Thanks for your suggestion. I haven't tried your technique yet, but I did try something similar by tweaking some IndexWriter settings, namely mergeFactor and minMergeDocs, and that certainly sped up the process a lot. I am sure the same can be achieved with what you suggest, since it is essentially doing the same thing: buffering more of the index in memory before writing it out to disk. Rough sketches of both approaches are below (the RAMDirectory one comes after the quoted messages).
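What I changed was roughly along these lines. This is a simplified, untested sketch against the Lucene 1.4 API rather than my actual code; the index path, the example fields, and the values 50 and 1000 are only illustrative.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class TunedIndexer {
    public static void main(String[] args) throws Exception {
        // Create a new index at an example path.
        IndexWriter writer =
                new IndexWriter("/tmp/product-index", new StandardAnalyzer(), true);

        // Both default to 10. Raising them buffers more documents in memory
        // before segments are written and merged on disk, so each
        // addDocument() call does far less I/O (at the cost of extra RAM).
        writer.mergeFactor = 50;
        writer.minMergeDocs = 1000;

        // Stand-in for the real loop over the 10,000 database rows.
        for (int i = 0; i < 10000; i++) {
            Document doc = new Document();
            doc.add(Field.Keyword("id", String.valueOf(i)));
            doc.add(Field.Text("body", "row contents would go here"));
            writer.addDocument(doc);
        }

        writer.optimize();
        writer.close();
    }
}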
Thanks,
Aalap.

--- "Peter A. Daly" <[EMAIL PROTECTED]> wrote:

> On some systems I have seen big speed increases by indexing to a
> RAMDirectory and periodically "merging" into an on-disk directory every
> X number of docs. May or may not help in this case. In the first case I
> used this, it took indexing down from a few hours to 30 minutes for a
> few million documents on a Windows 2k desktop machine.
>
> http://www.budget-ha.com/lucene/ram-to-disk/
>
> -Pete
>
> On 4/20/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > That sounds way too long, unless you have veeery slow disks, veeery
> > large Documents (long fields that you analyze, index, and store in
> > Lucene), or some such. If you have very loooong fiiiiieeeelds you
> > could try setting
> > http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength
> > to a very small number and see if that changes performance
> > drastically. There are other IndexWriter knobs you can fiddle with.
> >
> > I've seen Hibernate 2.* get sluggish once its Session gets filled up
> > with a lot of objects.
> >
> > Otis
> >
> > --- Aalap Parikh <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I have similar issues with indexing time.
> > >
> > > I am doing a SELECT from the database and getting back 10,000 rows.
> > > I then start indexing each row, so I end up with 10,000 documents
> > > in my Lucene index. Each doc has 27 fields.
> > >
> > > I added some timing code to my indexing process. The DB select call
> > > takes around 23 seconds and the indexing process takes 567 seconds.
> > > I also profiled the app with JProfiler and found that 90% of the
> > > time is spent in the IndexWriter.addDocument call. As expected,
> > > there were 10,000 invocations of that method (one per doc), and the
> > > profiler showed that it accounted for 90% of the processing time.
> > >
> > > I am concerned that it is taking around 9.5 minutes for 10,000 docs
> > > when I am expecting to index around 600,000 docs. That would take
> > > 570 minutes (9-10 hours), which is HUGE!!!
> > >
> > > My machine: Pentium 4 CPU 2.40 GHz, 1 GB RAM
> > >
> > > Any help appreciated.
> > >
> > > Thanks,
> > > Aalap.
> > >
> > > --- [EMAIL PROTECTED] wrote:
> > > > In a message of Wednesday, 20 April 2005 04:07, Mufaddal Khumri
> > > > wrote:
> > > > > The 20000 products I mentioned are 20000 rows. I get the
> > > > > products in bulk by using a limit clause.
> > > > >
> > > > > I am using Hibernate with MySQL server on a 2.8 GHz, 1.00 GB
> > > > > RAM machine.
> > > >
> > > > Maybe your session-level cache in Hibernate grows incredibly. Do
> > > > you call Session.clear() sometimes while doing the indexing?
> > > >
> > > > Here's a link about batching & Hibernate:
> > > > http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/
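If I do get around to trying the RAMDirectory approach, I would expect it to look roughly like the sketch below. This is untested and based only on Pete's description and the Lucene 1.4 javadoc; the index path, the example fields, the document count, and the 10,000-document flush interval are placeholders, not recommendations.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamToDiskIndexer {

    // How many documents to accumulate in RAM before merging to disk (a guess).
    private static final int FLUSH_EVERY = 10000;

    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // The long-lived on-disk index.
        Directory diskDir = FSDirectory.getDirectory("/tmp/product-index", true);
        IndexWriter diskWriter = new IndexWriter(diskDir, analyzer, true);

        // A small in-memory index, merged into the disk index periodically.
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);

        for (int i = 0; i < 600000; i++) {   // stand-in for the real row loop
            Document doc = new Document();
            doc.add(Field.Keyword("id", String.valueOf(i)));
            doc.add(Field.Text("body", "row contents would go here"));
            ramWriter.addDocument(doc);

            // Every FLUSH_EVERY docs, merge the in-memory segments into the
            // on-disk index and start over with a fresh RAMDirectory.
            if ((i + 1) % FLUSH_EVERY == 0) {
                ramWriter.close();
                diskWriter.addIndexes(new Directory[] { ramDir });
                ramDir = new RAMDirectory();
                ramWriter = new IndexWriter(ramDir, analyzer, true);
            }
        }

        // Merge whatever is left over, then optimize and close.
        ramWriter.close();
        diskWriter.addIndexes(new Directory[] { ramDir });
        diskWriter.optimize();
        diskWriter.close();
    }
}

The main knob is how many documents to buffer before each merge: too few and you are back to doing lots of small disk writes, too many and the JVM heap becomes the limit.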
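On the Hibernate side, the session-clearing idea from the last quoted mail would look something like the following. Again, this is just a rough, untested sketch against the Hibernate 2.x (net.sf.hibernate) API; the "Product" entity, the HQL query, and the batch size of 1,000 are made up for the example.

import java.util.Iterator;

import net.sf.hibernate.Session;
import net.sf.hibernate.SessionFactory;
import net.sf.hibernate.cfg.Configuration;

public class BatchedFetchForIndexing {
    public static void main(String[] args) throws Exception {
        SessionFactory factory =
                new Configuration().configure().buildSessionFactory();
        Session session = factory.openSession();

        int count = 0;
        // iterate() loads one entity at a time instead of materializing
        // the whole result set up front.
        Iterator products = session.iterate("from Product");
        while (products.hasNext()) {
            Object product = products.next();

            // ... hand the entity to the Lucene indexing code here ...

            if (++count % 1000 == 0) {
                // Evict everything loaded so far so the first-level cache
                // does not keep one object per row for the whole run.
                session.clear();
            }
        }

        session.close();
        factory.close();
    }
}

Clearing the session every so often keeps the first-level cache from growing with every row fetched, which seems to be what makes long indexing runs get slower and slower.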