I think Jason meant "15-20GB segments"?
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
From: Jason Rutherglen
To: java-user@lucene.apache.org
Sent: Wed, January 13, 2010 5:54:38 PM
Subject: Re: Max Segmentation Size when Optimizing Index
Yes... You could hack LogMergePolicy to do something else.
I use optimise(numsegments:5) regularly on 80GB indexes that, if
optimized to 1 segment, would thrash the IO excessively. This works
fine because 15-20GB indexes are plenty large and fast.
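For reference, the partial optimize Jason describes is just
IndexWriter.optimize(int maxNumSegments) in the Lucene releases of that era
(renamed forceMerge(int) in 3.5). A minimal sketch against the 3.0 API; the
index path and analyzer below are placeholders, not anything taken from this
thread:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class PartialOptimize {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Merge down to at most 5 segments instead of a single huge one;
        // far cheaper on I/O than a full optimize() of an 80GB index.
        writer.optimize(5);

        writer.close();
        dir.close();
    }
}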
On Wed, Jan 13, 2010 at 2:44 PM, Trin Chavalittumrong wrote:
Seems like optimize() only cares about final number of segments rather than
the size of the segment. Is it so?
On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
There's a different method in LogMergePolicy that performs the
optimize... Right, so normal merging uses the findMerges method, then
there's a findMergeOptimize (method names could be inaccurate).
On Wed, Jan 13, 2010 at 2:29 PM, Trin Chavalittumrong wrote:
Do you mean MergePolicy is only used during index time and will be ignored
by the Optimize() process?
On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
Oh ok, you're asking about optimizing... I think that's a different
algorithm inside LogMergePolicy. I think it ignores the maxMergeMB
param.
On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong wrote:
Thanks, Jason.
Is my understanding correct that LogByteSizeMergePolicy.setMaxMergeMB(100)
will prevent merging of two segments that are each larger than 100 MB at
optimize time?
If so, why do you think I would still see a segment that is larger than 200 MB?
On Wed, Jan 13, 2010 at 1:43 PM, Jason Rutherglen wrote:
Hi Trin,
There was recently a discussion about this: the max size applies to
the before-merge segments, rather than the resultant merged
segment (if that makes sense). It'd be great if we had a merge
policy that limited the resultant merged segment, though that'd
be a rough approximation at best.
Jason
Hi,
I am trying to optimize the index, which merges different segments
together. Let's say the index folder is 1 GB in total; I need each segment
to be no larger than 200 MB. I tried to use LogByteSizeMergePolicy and
setMaxMergeMB(100) to ensure no segment after merging would exceed 200 MB.
However, I still see segments larger than 200 MB after optimizing.
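For context, the configuration Trin describes would look roughly like the
sketch below (written against the later 3.x IndexWriterConfig API for brevity;
in the 2.9/3.0 releases current at the time the merge policy was set directly
on the IndexWriter, but the knob is the same). The 100 MB figure is his;
everything else is a placeholder. As the replies above explain, maxMergeMB
only keeps large segments from being picked for normal background merges; the
optimize path in LogMergePolicy ignores it, so asking for a maximum segment
count, as in the earlier sketch, is the practical way to bound segment size.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MaxMergeMbSketch {
    public static void main(String[] args) throws Exception {
        // maxMergeMB caps the size of segments *selected* for normal merges;
        // it does not cap the size of the merged result, and optimize() /
        // forceMerge() ignores it entirely.
        LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
        mergePolicy.setMaxMergeMB(100.0);

        FSDirectory dir = FSDirectory.open(new File("/path/to/index")); // placeholder
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36,
                new StandardAnalyzer(Version.LUCENE_36));
        config.setMergePolicy(mergePolicy);

        IndexWriter writer = new IndexWriter(dir, config);
        // ... add or update documents here ...
        writer.close();
        dir.close();
    }
}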
What type of documents are you indexing?
regards
gaurav
On 11/11/07, Barry Forrest <[EMAIL PROTECTED]> wrote:
On Nov 12, 2007 1:15 PM, J.J. Larrea <[EMAIL PROTECTED]> wrote:
>
> 2. Since the full document and its longer bibliographic subfields are
> being indexed but not stored, my guess is that the large size of the index
> segments is due to the inverted index rather than the stored data fields.
> But
You could have a look at this thread.
http://www.gossamer-threads.com/lists/lucene/java-user/29354
regards.
Barry Forrest wrote:
> Hi,
>
> Optimizing my index of 1.5 million documents takes days and days.
>
> I have a collection of 10 million documents that I am trying to index
> with Lucene.
> I am using the 2.3-dev version only because LUCENE-843 suggested
> that this might be a path to faster indexing. I started out using
> 2.2 and can easily go back. I am using default MergePolicy and
> MergeScheduler.
Did you note any indexing or optimize speed differences between 2.2 &
2.3-dev?
Thanks very much for all your suggestions.
I will work through these to see what works. Appreciate that indexing takes
many hours, so it will take me a few days. Working with a subset isn't
really indicative, since the problems only manifest with larger indexes.
(Note that this might be a solut
Not sure the numbers are off w/ documents that big, although I imagine
you are hitting the token limit w/ docs that big. Is this all on one
machine as you described, or are you saying you have a couple of
these? If one, have you tried having just one index?
Since you are using 2.3 (note t
Hi. Here are a couple of thoughts:
1. Your problem description would be a little easier to parse if you didn't use
the word "stored" to refer to fields which are not, in a Lucene sense, stored,
only indexed. For example, one doesn't "store" stemmed and unstemmed versions,
since stemming has ab
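To make the stored-versus-indexed distinction concrete, here is a small sketch
against the 2.3-era Field API (TOKENIZED and UN_TOKENIZED became ANALYZED and
NOT_ANALYZED in 2.9). The field names are invented purely for illustration:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class StoredVsIndexed {
    static Document makeDoc(String externalId, String title, String fullText) {
        Document doc = new Document();
        // Indexed (inverted) but NOT stored: searchable, but the text itself is
        // not kept in the index, so it only grows the postings/terms data.
        doc.add(new Field("body", fullText, Field.Store.NO, Field.Index.TOKENIZED));
        // Stored but NOT indexed: retrievable with a hit, not searchable.
        doc.add(new Field("docId", externalId, Field.Store.YES, Field.Index.NO));
        // Stored AND indexed as a single unanalyzed token.
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.UN_TOKENIZED));
        return doc;
    }
}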
For a start, I would lower the merge factor quite a bit. A high merge
factor is overrated :) You will build the index faster, but searches
will be slower and an optimize takes much longer. Essentially, the time
you save when indexing is paid when optimizing anyway. You might as well
amortize that cost during indexing.
Hi,
Thanks for your help.
I'm using Lucene 2.3.
Raw document size is about 138 GB for 1.5M documents, which is about
250 KB per document.
IndexWriter settings are MergeFactor 50, MaxMergeDocs 2000,
RAMBufferSizeMB 32, MaxFieldLength Integer.MAX_VALUE.
Each document has about 10 short bibliographic fields.
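For reference, those settings map onto the 2.3-era IndexWriter roughly as
below. This is a sketch, not Barry's actual code; the path and analyzer are
placeholders, and only the four setters reflect the reported settings. The
mergeFactor of 50 versus the default of 10 is the knob the earlier reply
suggests lowering.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class ReportedWriterSettings {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(new File("/path/to/index"),
                new StandardAnalyzer(), true);       // placeholder path/analyzer
        writer.setMergeFactor(50);                    // reported; default is 10
        writer.setMaxMergeDocs(2000);
        writer.setRAMBufferSizeMB(32);
        writer.setMaxFieldLength(Integer.MAX_VALUE);  // do not truncate long docs
        // ... addDocument() calls, then optionally optimize() ...
        writer.close();
    }
}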
Hmmm, something doesn't sound quite right. You have 10 million docs,
split into 5 or so indexes, right? And each sub index is 150
gigabytes? How big are your documents?
Can you provide more info about what your Directory and IndexWriter
settings are? What version of Lucene are you using?
Hi,
Optimizing my index of 1.5 million documents takes days and days.
I have a collection of 10 million documents that I am trying to index
with Lucene. I've divided the collection into chunks of about 1.5 - 2
million documents each. Indexing 1.5 million documents is fast enough (about
12 hours), but the optimize takes days and days.
To: java-user@lucene.apache.org
Subject: Re: Optimizing Index
Yes, I do have around 75 GB of free space on that HDD... I do not invoke any
index reader; the program only calls IndexWriter to optimize the
index, and that's it.
I am also perplexed why it says that there is not enough disk space to do the
optimization...
Hi,
I had an existing index file with a size of 20.6 GB... I haven't done any
optimization on this index yet. Now I have an HDD of 100 GB, but apparently
when I create a program to optimize (which simply calls writer.optimize() on
this index file), it gives the error that there is not enough space on the
disk.
Hi,
I would like to know about optimizing an index...
The exception is hit due to the disk being full while optimizing the index,
and hence the index has not been closed yet.
Is the unclosed index dangerous? Can I perform searching on such an index
correctly? Is the built index still robust?
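For what it's worth, Lucene only switches to a new segments file after a merge
completes, so a disk-full during optimize() should leave the last committed
index intact and searchable. A small sanity-check sketch (2.x-era API; the
path is a placeholder) that simply opens a reader on the existing index:

import org.apache.lucene.index.IndexReader;

public class CheckIndexReadable {
    public static void main(String[] args) throws Exception {
        // If this opens and reports a sensible doc count, the last committed
        // index survived the failed optimize.
        IndexReader reader = IndexReader.open("/path/to/index"); // placeholder path
        System.out.println("Index opened; it contains " + reader.numDocs() + " docs.");
        reader.close();
    }
}

Newer Lucene releases also ship org.apache.lucene.index.CheckIndex for a
deeper integrity scan.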
Actually, you should probably not let your index grow beyond one-third the size
of your disk.
a] You start off with your original index.
b] During optimize, Lucene will initially write out files in non-compound file
format.
c] Lucene will then combine the non-compound file format into the compound
file format.
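To make that rule of thumb concrete, here is a small sketch (plain java.io;
the path is a placeholder and getUsableSpace needs Java 6+) that sums the size
of the index directory and warns unless roughly twice that amount is still
free, i.e. unless the disk can absorb the worst-case tripling described in
steps a]-c]:

import java.io.File;

public class OptimizeHeadroomCheck {
    public static void main(String[] args) {
        File indexDir = new File("/path/to/index");   // placeholder path
        long indexBytes = 0;
        for (File f : indexDir.listFiles()) {
            if (f.isFile()) {
                indexBytes += f.length();
            }
        }
        long freeBytes = indexDir.getUsableSpace();
        if (freeBytes < 2 * indexBytes) {
            System.out.println("Not enough headroom to optimize: index=" + indexBytes
                    + " bytes, free=" + freeBytes + " bytes");
        } else {
            System.out.println("OK: free space covers about twice the index size.");
        }
    }
}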
Hi all,
I have a problem related to index size and deleting and optimizing. So,
from reading various sources online, it seems as though the size of the
Lucene index should become no larger than half the size of the disk,
since during optimization the size of the index can balloon to double
the original size.