Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
> From: Jason Rutherglen > To: java-user@lucene.apache.org > Sent: Wed, January 13, 2010 5:54:38 PM > Subject: Re: Max Segmentation Size when Optimizing Index > > Yes... You could hack LogMergePolicy to do something else.

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
> From: Jason Rutherglen > To: java-user@lucene.apache.org > Sent: Wed, January 13, 2010 5:54:38 PM > Subject: Re: Max Segmentation Size when Optimizing Index > > Yes... You could hack LogMergePolicy to do something else. > > I use optimise(numse

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Otis Gospodnetic
I think Jason meant "15-20GB segments"? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch From: Jason Rutherglen To: java-user@lucene.apache.org Sent: Wed, January 13, 2010 5:54:38 PM Subject: Re: Max Segmentation Size when Optimi

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Yes... You could hack LogMergePolicy to do something else. I use optimise(numsegments:5) regularly on 80GB indexes which, if optimized to 1 segment, would thrash the IO excessively. This works fine because 15-20GB indexes are plenty large and fast. On Wed, Jan 13, 2010 at 2:44 PM, Trin Chavalittu

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
It seems optimize() only cares about the final number of segments rather than the size of each segment. Is that so? On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > There's a different method in LogMergePolicy that performs the > optimize... Right, so normal mer

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
There's a different method in LogMergePolicy that performs the optimize... Right, so normal merging uses the findMerges method, then there's a findMergeOptimize (method names could be inaccurate). On Wed, Jan 13, 2010 at 2:29 PM, Trin Chavalittumrong wrote: > Do you mean MergePolicy is only used

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
Do you mean MergePolicy is only used at index time and will be ignored by the optimize() process? On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Oh ok, you're asking about optimizing... I think that's a different > algorithm inside LogMergePolicy.

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Oh ok, you're asking about optimizing... I think that's a different algorithm inside LogMergePolicy. I think it ignores the maxMergeMB param. On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong wrote: > Thanks, Jason. > > Is my understanding correct that LogByteSizeMergePolicy.setMaxMergeMB(10

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
Thanks, Jason. Is my understanding correct that LogByteSizeMergePolicy.setMaxMergeMB(100) will prevent merging of two segments that are each larger than 100 MB at optimize time? If so, why do you think I would still see segments larger than 200 MB? On Wed, Jan 13, 2010 at 1:43 PM, Jaso

Re: Max Segmentation Size when Optimizing Index

2010-01-13 Thread Jason Rutherglen
Hi Trin, There was recently a discussion about this: the max size applies to the segments before they are merged, rather than to the resultant merged segment (if that makes sense). It'd be great if we had a merge policy that limited the resultant merged segment, though that'd be a rough approximation at best. Jas
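
To make the point above concrete, here is a small toy model in plain Java. It is not Lucene code and does not call any Lucene API; the class name MergeSizeSketch, the method mergedSizeMB, and the exact comparison (strictly below the cap) are assumptions made only for illustration. It shows how a cap that is checked against each input segment still allows a merged result far above that cap.

```java
/**
 * Toy model only (hypothetical names, no Lucene dependency).
 * Assumption: a segment may take part in a merge when its own size
 * is below the cap; the size of the merged result is never checked.
 */
final class MergeSizeSketch {

    /** Returns the size of the segment produced by merging every eligible input. */
    public static long mergedSizeMB(long[] segmentSizesMB, long maxMergeMB) {
        long total = 0;
        for (long size : segmentSizesMB) {
            // The cap is applied per input segment, before merging.
            if (size < maxMergeMB) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Three 90 MB segments pass a 100 MB cap; the 150 MB segment is left alone.
        long[] segments = {90, 90, 90, 150};
        long merged = mergedSizeMB(segments, 100);
        System.out.println("merged segment size (MB): " + merged);
    }
}
```

With a 100 MB cap, the three 90 MB inputs are all eligible, so the merged segment in this model comes out at 270 MB, well over both 100 MB and 200 MB.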

Max Segmentation Size when Optimizing Index

2010-01-13 Thread Trin Chavalittumrong
Hi, I am trying to optimize the index, which would merge different segments together. Let's say the index folder is 1 GB in total; I need each segment to be no larger than 200 MB. I tried to use LogByteSizeMergePolicy and setMaxMergeMB(100) to ensure no segment after merging would exceed 200 MB. How