Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
> From: Jason Rutherglen > To: java-user@lucene.apache.org > Sent: Wed, January 13, 2010 5:54:38 PM > Subject: Re: Max Segmentation Size when Optimizing Index > > Yes... You could hack LogMergePolicy to do something else.

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
> From: Jason Rutherglen > To: java-user@lucene.apache.org > Sent: Wed, January 13, 2010 5:54:38 PM > Subject: Re: Max Segmentation Size when Optimizing Index > > Yes... You could hack LogMergePolicy to do something else. > > I use optimise(numsegments:5)…

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Otis Gospodnetic
I think Jason meant "15-20GB segments"? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch From: Jason Rutherglen To: java-user@lucene.apache.org Sent: Wed, January 13, 2010 5:54:38 PM Subject: Re: Max Segmentation Size when Optimizing Index…

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Yes... You could hack LogMergePolicy to do something else. I use optimise(numsegments:5) regularly on 80GB indexes that, if optimized to 1 segment, would thrash the IO excessively. This works fine because 15-20GB indexes are plenty large and fast. On Wed, Jan 13, 2010 at 2:44 PM, Trin Chavalittumrong wrote:…
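Jason's partial-optimize approach maps onto the `IndexWriter.optimize(int maxNumSegments)` overload of that era's API. A minimal sketch, assuming Lucene 2.9-era constructors (the index path is illustrative, not from the thread):

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class PartialOptimize {
    public static void main(String[] args) throws Exception {
        // Path and analyzer are example values, not from the thread.
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/path/to/index")),
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Merge down to at most 5 segments instead of 1, as described
        // above; avoids rewriting the whole 80GB index into one segment.
        writer.optimize(5);
        writer.close();
    }
}
```

In Lucene 4.0 and later this call was renamed to `forceMerge(int)`, so the exact method depends on the version in use.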

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
Seems like optimize() only cares about the final number of segments rather than the size of each segment. Is it so? On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote: > There's a different method in LogMergePolicy that performs the > optimize... Right, so normal merging…

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
There's a different method in LogMergePolicy that performs the optimize... Right, so normal merging uses the findMerges method, then there's a findMergesForOptimize (method names could be inaccurate). On Wed, Jan 13, 2010 at 2:29 PM, Trin Chavalittumrong wrote: > Do you mean MergePolicy is only used…
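The "hack LogMergePolicy" idea from earlier in the thread would hook in here: subclass the policy and override the optimize-time merge selection, which is distinct from the findMerges path used during normal indexing. A rough sketch, assuming the Lucene 2.9/3.x method name `findMergesForOptimize` (the signature varies across releases):

```java
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.SegmentInfo;
import org.apache.lucene.index.SegmentInfos;

// Sketch only: method name and signature follow Lucene 2.9/3.x and
// may differ in other releases.
public class SizeCappedMergePolicy extends LogByteSizeMergePolicy {
    @Override
    public MergeSpecification findMergesForOptimize(SegmentInfos infos,
            int maxNumSegments, Set<SegmentInfo> segmentsToOptimize)
            throws IOException {
        // A real implementation could inspect segment byte sizes here
        // and drop candidate merges whose combined size exceeds a cap,
        // instead of simply deferring to the default selection below.
        return super.findMergesForOptimize(infos, maxNumSegments,
                segmentsToOptimize);
    }
}
```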

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
Do you mean MergePolicy is only used during index time and will be ignored by the optimize() process? On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote: > Oh ok, you're asking about optimizing... I think that's a different > algorithm inside LogMergePolicy.

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Oh ok, you're asking about optimizing... I think that's a different algorithm inside LogMergePolicy. I think it ignores the maxMergeMB param. On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong wrote: > Thanks, Jason. > > Is my understanding correct that LogByteSizeMergePolicy.setMaxMergeMB(100)…

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
Thanks, Jason. Is my understanding correct that LogByteSizeMergePolicy.setMaxMergeMB(100) will prevent merging of two segments that are each larger than 100 MB at optimize time? If so, why do you think I would still see segments larger than 200 MB? On Wed, Jan 13, 2010 at 1:43 PM, Jason Rutherglen wrote:…

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Hi Trin, There was recently a discussion about this: the max size applies to the segments before the merge, rather than to the resultant merged segment (if that makes sense). It'd be great if we had a merge policy that limited the resultant merged segment, though that'd be a rough approximation at best. Jason

Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
Hi, I am trying to optimize the index, which merges different segments together. Let's say the index folder is 1 GB in total; I need each segment to be no larger than 200 MB. I tried to use *LogByteSizeMergePolicy* and setMaxMergeMB(100) to ensure no segment after merging would exceed 200 MB. How…
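For reference, the configuration described above might look like the following sketch (Lucene 2.9/3.0-era API assumed; in some 2.x versions the policy's constructor takes the IndexWriter instead). It also illustrates the caveat from the replies: setMaxMergeMB caps the segments *selected as inputs* to a merge, not the merged output, which is why merged segments can exceed the cap.

```java
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LogByteSizeMergePolicy;

public class MergeConfig {
    // Sketch of the setup described in the question. Two ~100MB input
    // segments both pass the setMaxMergeMB(100) check, yet merging them
    // legally produces a ~200MB segment: the cap is on inputs only.
    static void configure(IndexWriter writer) {
        LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
        policy.setMaxMergeMB(100.0); // skip input segments larger than 100MB
        writer.setMergePolicy(policy);
    }
}
```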

Re: Optimizing index takes too long

2007-11-12 Thread Lucene User
What type of documents are you indexing? Regards, Gaurav On 11/11/07, Barry Forrest <[EMAIL PROTECTED]> wrote: > > Hi, > > Optimizing my index of 1.5 million documents takes days and days. > > I have a collection of 10 million documents that I am trying to index > with Lucene. I've divided the collection…

Re: Optimizing index takes too long

2007-11-12 Thread Barry Forrest
On Nov 12, 2007 1:15 PM, J.J. Larrea <[EMAIL PROTECTED]> wrote: > > 2. Since the full document and its longer bibliographic subfields are > being indexed but not stored, my guess is that the large size of the index > segments is due to the inverted index rather than the stored data fields. > But

Re: Optimizing index takes too long

2007-11-12 Thread Eric Louvard
You could have a look at this thread: http://www.gossamer-threads.com/lists/lucene/java-user/29354 Regards. Barry Forrest wrote: > Hi, > > Optimizing my index of 1.5 million documents takes days and days. > > I have a collection of 10 million documents that I am trying to index > with Lucene.

Re: Optimizing index takes too long

2007-11-12 Thread Michael McCandless
> I am using the 2.3-dev version only because LUCENE-843 suggested > that this might be a path to faster indexing. I started out using > 2.2 and can easily go back. I am using default MergePolicy and > MergeScheduler. Did you note any indexing or optimize speed differences between 2.2 & 2.3-dev?

Re: Optimizing index takes too long

2007-11-11 Thread Barry Forrest
Thanks very much for all your suggestions. I will work through these to see what works. Bear in mind that indexing takes many hours, so it will take me a few days. Working with a subset isn't really indicative, since the problems only manifest with larger indexes. (Note that this might be a solution…

Re: Optimizing index takes too long

2007-11-11 Thread Grant Ingersoll
Not sure the numbers are off w/ documents that big, although I imagine you are hitting the token limit w/ docs that big. Is this all on one machine as you described, or are you saying you have a couple of these? If one, have you tried having just one index? Since you are using 2.3 (note…

Re: Optimizing index takes too long

2007-11-11 Thread J.J. Larrea
Hi. Here are a couple of thoughts: 1. Your problem description would be a little easier to parse if you didn't use the word "stored" to refer to fields which are not, in a Lucene sense, stored, only indexed. For example, one doesn't "store" stemmed and unstemmed versions, since stemming has…

Re: Optimizing index takes too long

2007-11-11 Thread Mark Miller
For a start, I would lower the merge factor quite a bit. A high merge factor is overrated :) You will build the index faster, but searches will be slower and an optimize takes much longer. Essentially, the time you save when indexing is paid when optimizing anyway. You might as well amortize…
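Mark's advice translates to writer settings like the sketch below (Lucene 2.3-era setters assumed; the values are illustrative, not prescribed by the thread):

```java
import org.apache.lucene.index.IndexWriter;

public class IndexTuning {
    // A lower merge factor keeps the index closer to optimized while it
    // is being built, so the final optimize() has far less work to do.
    static void tune(IndexWriter writer) {
        writer.setMergeFactor(10);       // the poster had been using 50
        writer.setRAMBufferSizeMB(32.0); // flush by RAM usage, not doc count
    }
}
```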

Re: Optimizing index takes too long

2007-11-11 Thread Barry Forrest
Hi, Thanks for your help. I'm using Lucene 2.3. Raw document size is about 138 GB for 1.5M documents, which is about 250k per document. IndexWriter settings are MergeFactor 50, MaxMergeDocs 2000, RAMBufferSizeMB 32, MaxFieldLength Integer.MAX_VALUE. Each document has about 10 short bibliographic…

Re: Optimizing index takes too long

2007-11-11 Thread Grant Ingersoll
Hmmm, something doesn't sound quite right. You have 10 million docs, split into 5 or so indexes, right? And each sub index is 150 gigabytes? How big are your documents? Can you provide more info about what your Directory and IndexWriter settings are? What version of Lucene are you using?…

Optimizing index takes too long

2007-11-11 Thread Barry Forrest
Hi, Optimizing my index of 1.5 million documents takes days and days. I have a collection of 10 million documents that I am trying to index with Lucene. I've divided the collection into chunks of about 1.5 - 2 million documents each. Indexing 1.5 million documents is fast enough (about 12 hours), but the…

RE: Optimizing Index

2007-02-22 Thread Damien McCarthy
yes I do have around 75 GB of free space on that HDD... I do not invoke any index reader... hence the program only calls IndexWriter to optimize the index, and that's it. I am also perplexed why it tells that it does not have enough disk space to do optimization…

Re: Optimizing Index

2007-02-22 Thread maureen tanuwidjaja
yes I do have around 75 GB of free space on that HDD... I do not invoke any index reader... hence the program only calls IndexWriter to optimize the index, and that's it. I am also perplexed why it tells that it does not have enough disk space to do the optimization... Michael McCandless <[EMAIL PROTECTED]> wrote:…

Re: Optimizing Index

2007-02-22 Thread Michael McCandless
"maureen tanuwidjaja" wrote: > I had an existing index file with the size 20.6 GB... I haven't done any > optimization on this index yet. Now I have a HDD of 100 GB, but apparently > when I create a program to optimize (which simply calls writer.optimize() > on this index file), it gives the error…

Optimizing Index

2007-02-21 Thread maureen tanuwidjaja
Hi, I had an existing index file with the size 20.6 GB... I haven't done any optimization on this index yet. Now I have a HDD of 100 GB, but apparently when I create a program to optimize (which simply calls writer.optimize() on this index file), it gives the error that there is not enough space on…

Re: exception is hit while optimizing index

2007-02-08 Thread Michael McCandless
maureen tanuwidjaja wrote: I would like to know about optimizing an index... The exception is hit due to the disk being full while optimizing the index, and hence the index has not been closed yet. Is the unclosed index dangerous? Can I perform searching on such an index correctly? Is the index built…

exception is hit while optimizing index

2007-02-07 Thread maureen tanuwidjaja
Hi, I would like to know about optimizing an index... The exception is hit due to the disk being full while optimizing the index, and hence the index has not been closed yet. Is the unclosed index dangerous? Can I perform searching on such an index correctly? Is the index built robust yet…

Re: Problem with deleting and optimizing index

2005-07-24 Thread Lokesh Bajaj
Actually, you should probably not let your index grow beyond one-third the size of your disk: a] You start off with your original index. b] During optimize, Lucene will initially write out files in non-compound file format. c] Lucene will then combine the non-compound files into the compound…
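The three states a], b], c] above can coexist on disk, which is where the one-third rule of thumb comes from. A trivial worked example in plain Java (the 3x factor is this post's rule of thumb, not an exact Lucene guarantee), applied to the 20.6GB index from the "Optimizing Index" thread earlier in this digest:

```java
public class OptimizeDiskCheck {
    // During optimize with compound files enabled, the original segments
    // (a), the intermediate non-compound files (b), and the final compound
    // file (c) can all exist at once, so peak usage approaches 3x the
    // index size.
    static double peakDiskGB(double indexSizeGB) {
        return 3.0 * indexSizeGB;
    }

    public static void main(String[] args) {
        // A 20.6GB index can transiently need ~61.8GB of disk,
        // which explains failures even with seemingly ample free space.
        System.out.println(peakDiskGB(20.6) + " GB peak");
    }
}
```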

Problem with deleting and optimizing index

2005-07-21 Thread Peter Kim
Hi all, I have a problem related to index size and deleting and optimizing. From reading various sources online, it seems as though the size of the Lucene index should become no larger than half the size of the disk, since during optimization the size of the index can balloon to double the original…