Re: Optimizing index takes too long

2007-11-12 Thread Lucene User
What type of documents are you indexing? Regards, Gaurav On 11/11/07, Barry Forrest <[EMAIL PROTECTED]> wrote: > > Hi, > > Optimizing my index of 1.5 million documents takes days and days. > > I have a collection of 10 million documents that I am trying to index > with Lucene. I've divided the colle

Re: Optimizing index takes too long

2007-11-12 Thread Barry Forrest
On Nov 12, 2007 1:15 PM, J.J. Larrea <[EMAIL PROTECTED]> wrote: > > 2. Since the full document and its longer bibliographic subfields are > being indexed but not stored, my guess is that the large size of the index > segments is due to the inverted index rather than the stored data fields. > But

Re: Optimizing index takes too long

2007-11-12 Thread Eric Louvard
You could have a look at this thread: http://www.gossamer-threads.com/lists/lucene/java-user/29354 Regards. Barry Forrest wrote: > Hi, > > Optimizing my index of 1.5 million documents takes days and days. > > I have a collection of 10 million documents that I am trying to index > with Lucene.

Re: Optimizing index takes too long

2007-11-12 Thread Michael McCandless
> I am using the 2.3-dev version only because LUCENE-843 suggested > that this might be a path to faster indexing. I started out using > 2.2 and can easily go back. I am using default MergePolicy and > MergeScheduler. Did you note any indexing or optimize speed differences between 2.2 & 2.3-dev?

Re: Optimizing index takes too long

2007-11-11 Thread Barry Forrest
Thanks very much for all your suggestions. I will work through these to see what works. Appreciate that indexing takes many hours, so it will take me a few days. Working with a subset isn't really indicative, since the problems only manifest with larger indexes. (Note that this might be a solut

Re: Optimizing index takes too long

2007-11-11 Thread Grant Ingersoll
Not sure the numbers are off w/ documents that big, although I imagine you are hitting the token limit w/ docs that big. Is this all on one machine as you described, or are you saying you have a couple of these? If one, have you tried having just one index? Since you are using 2.3 (note t

Re: Optimizing index takes too long

2007-11-11 Thread J.J. Larrea
Hi. Here are a couple of thoughts: 1. Your problem description would be a little easier to parse if you didn't use the word "stored" to refer to fields which are not, in a Lucene sense, stored, only indexed. For example, one doesn't "store" stemmed and unstemmed versions, since stemming has ab

Re: Optimizing index takes too long

2007-11-11 Thread Mark Miller
For a start, I would lower the merge factor quite a bit. A high merge factor is overrated :) You will build the index faster, but searches will be slower and an optimize takes much longer. Essentially, the time you save when indexing is paid when optimizing anyway. You might as well amortize t
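
The trade-off described above can be illustrated with a toy model. The sketch below is not Lucene's actual merge policy; it assumes a simplified scheme in which every flush produces one equal-sized segment and, whenever `merge_factor` segments accumulate at one level, they are merged into a single segment at the next level. The function name `simulate` and the unit of "flushes" are hypothetical and chosen only for illustration.

```python
def simulate(num_flushes, merge_factor):
    """Toy level-based merge model (an assumption, not Lucene's real policy).

    Returns (segments_left, units_copied), where units_copied counts how many
    flush-sized units were rewritten by merges while indexing.
    """
    levels = []   # levels[i] = number of segments currently at level i
    copied = 0
    for _ in range(num_flushes):
        if not levels:
            levels.append(0)
        levels[0] += 1                      # a new flush-sized segment
        level = 0
        # Cascade: a full level is merged into one segment one level up.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            copied += merge_factor ** (level + 1)   # size of the merged segment
            level += 1
    return sum(levels), copied


if __name__ == "__main__":
    for mf in (10, 50):
        segments, copied = simulate(1000, mf)
        print(f"merge_factor={mf}: {segments} segments left, {copied} units copied")
```

In this model, 1000 flushes with a merge factor of 10 leave 1 segment after copying 3000 units during indexing, while a merge factor of 50 copies only 1000 units but leaves 20 segments for searches and for the final optimize to deal with.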

Re: Optimizing index takes too long

2007-11-11 Thread Barry Forrest
Hi, Thanks for your help. I'm using Lucene 2.3. Raw document size is about 138G for 1.5M documents, which is about 250k per document. IndexWriter settings are MergeFactor 50, MaxMergeDocs 2000, RAMBufferSizeMB 32, MaxFieldLength Integer.MAX_VALUE. Each document has about 10 short bibliographic

Re: Optimizing index takes too long

2007-11-11 Thread Grant Ingersoll
Hmmm, something doesn't sound quite right. You have 10 million docs, split into 5 or so indexes, right? And each sub index is 150 gigabytes? How big are your documents? Can you provide more info about what your Directory and IndexWriter settings are? What version of Lucene are you using

Optimizing index takes too long

2007-11-11 Thread Barry Forrest
Hi, Optimizing my index of 1.5 million documents takes days and days. I have a collection of 10 million documents that I am trying to index with Lucene. I've divided the collection into chunks of about 1.5 - 2 million documents each. Indexing 1.5 million documents is fast enough (about 12 hours), but t