Hi,
I have a large index with a field that contains a very large number of
terms. I knew that searching with a term starting with a wildcard was not
a good idea; looking at WildcardTermEnum(IndexReader, Term) and
IndexReader.terms(Term), I now understand better why. I have been asked
however by m
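A rough sketch of why the leading wildcard is expensive, assuming the Lucene 3.x TermEnum API (the field name and suffix below are illustrative only): with no constant prefix to seek to, the enumeration has to start at the first term of the field and test every term in the dictionary.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class LeadingWildcardScan {
    // Counts terms ending with "suffix", i.e. what a "*suffix" pattern must visit.
    public static int countMatches(IndexReader reader, String field, String suffix)
            throws Exception {
        int matches = 0;
        TermEnum terms = reader.terms(new Term(field, "")); // no prefix to seek to
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break; // left the field
                if (t.text().endsWith(suffix)) matches++;          // test every single term
            } while (terms.next());
        } finally {
            terms.close();
        }
        return matches;
    }
}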
Hi,
I have an index for which I am missing at least one file after hitting a
disk-full situation.
Is there any way I could bypass the error I get when trying to open the
index, to salvage as many docs as I can from the other files?
thanks,
vince
java.io.FileNotFoundException: D:\_2c9kgw.cfs (T
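One possible salvage path, sketched under the assumption that the Lucene 3.x CheckIndex API is available and that losing the documents in the unreadable segments is acceptable; the path is hypothetical and this should only ever be run on a copy of the index.

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SalvageIndex {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("D:/index-copy")); // hypothetical copy of the broken index
        CheckIndex checker = new CheckIndex(dir);
        checker.setInfoStream(System.out);              // report what it finds
        CheckIndex.Status status = checker.checkIndex();
        if (!status.clean) {
            checker.fixIndex(status); // drops unreadable segments (and their docs) so the rest opens again
        }
        dir.close();
    }
}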
hi, I have an index where I add and delete numerous documents that are a
few KB in size.
To keep disk space stable, I have been doing forceMergeDeletes then
forceMerge(5) regularly (after I do a big clean during the night).
I am wondering if forceMergeDeletes would be sufficient for tha
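A minimal sketch of that nightly cleanup, assuming the Lucene 3.5+ IndexWriter API (where optimize/expungeDeletes were renamed forceMerge/forceMergeDeletes); the purge query and the segment count of 5 come from the description above, everything else is illustrative.

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.Query;

public class NightlyCleanup {
    public static void purgeAndReclaim(IndexWriter writer, Query purgeQuery,
                                       boolean alsoCapSegments) throws Exception {
        writer.deleteDocuments(purgeQuery);  // mark the old documents as deleted
        writer.commit();                     // make the deletes durable
        writer.forceMergeDeletes();          // rewrite only segments that hold deletions
        if (alsoCapSegments) {
            writer.forceMerge(5);            // heavier: merge down to at most 5 segments
        }
        writer.commit();
    }
}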
Hi,
On a 256 GB RAM machine, we have half of our IT system running.
Part of it are 2 Lucene applications, each managing an approximately 100 GB
index.
These applications are used to index logging events, and every night there is a
purge, followed by a forceMergeDeletes to reclaim disk space (and
Hi,
Each night I optimize an index that contains 35 million docs. It takes
about 1.5 hours. For maintenance reasons, it may happen that the machine
gets rebooted. In that case, the server gets a chance to shut down gracefully,
but eventually the reboot script will kill the processes that did not
Hi Michael,
I suppose that, as you suggested, if I do a close(false) during an optimize
I should expect the following exception:
java.io.IOException: background merge hit exception: _3ud72:c33936445
_3uqhr:c126349 _3uuf8:c57041 _3v27p:c78599 _3vf2s:c111005 _3vfad:c6574
_3vrcj:c130263 _3
And do I need to do any cleanup once I catch the MergeAbortedException
(such as writer commit or rollback)?
Thanks,
Vincent
v.se...@lombardodier.com
26.01.2011 15:44
To: java-user@lucene.apache.org
Subject: Re: gracefully interrupti
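For context, a sketch of the two-thread shape being discussed, assuming the Lucene 3.x API where IndexWriter.close(false) does not wait for running merges and aborts them instead; what cleanup to do afterwards is exactly the question asked above.

import org.apache.lucene.index.IndexWriter;

public class ShutdownDuringOptimize {
    // Returns a thread suitable for Runtime.getRuntime().addShutdownHook(...).
    public static Thread shutdownHook(final IndexWriter writer) {
        return new Thread(new Runnable() {
            public void run() {
                try {
                    writer.close(false);   // abort in-flight merges, do not wait for them
                } catch (Exception e) {
                    e.printStackTrace();   // the writer may already be closing
                }
            }
        });
    }
}

With such a hook registered, the kill issued by the reboot script triggers close(false) while optimize() is still running, which the thread above suggests then surfaces as the "background merge hit exception" IOException quoted earlier.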
Hi,
I did some tests with the BalancedSegmentMergePolicy, looking specifically
at the optimize. I have an index that is 70 GB in size and contains around
35 million documents.
I duplicated the index 4 times, and I ran 2 optimizes with the default
merge policy, and 2 with the balanced policy.
He
Hi, OK so I will not bother using TieredMergePolicy for now. I will do
some more tests with the contrib balanced merge policy, playing with
optimize(maxNumSegments) to try to decrease the optimize time (which is an
issue for us today). My index contains 35 million documents. The size on
dis
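A sketch of wiring the contrib BalancedSegmentMergePolicy (from the misc contrib jar) into a writer and then optimizing down to a handful of segments, assuming the Lucene 3.1 API; the path, merge factor and maxNumSegments value are illustrative only.

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.BalancedSegmentMergePolicy; // contrib "misc" jar
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BalancedOptimize {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/data/index")); // hypothetical path
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31,
                new StandardAnalyzer(Version.LUCENE_31));
        BalancedSegmentMergePolicy mp = new BalancedSegmentMergePolicy();
        mp.setMergeFactor(10);                  // knob inherited from LogMergePolicy
        conf.setMergePolicy(mp);
        IndexWriter writer = new IndexWriter(dir, conf);
        writer.optimize(5);                     // merge down to at most 5 segments
        writer.close();
    }
}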
Hi,
we developed a real-time logging system. We index 4.5 million
events/day, spread over multiple servers, each with its own index. Every
night we delete events from the index based on a retention policy, then
we optimize. Each server takes between 1 and 2 hours to optimize. Ideally,
we wo
Hi,
I index several million small documents per day. Each day, I remove some
of the older documents to keep the index at a stable number of documents.
After each purge, I commit and then optimize the index. What I found is that
if I keep optimizing with max num segments = 2, then the index keeps
Hi,
here is a concrete example.
I am starting with an index that has 19017236 docs, which takes 58989 Mb
on disk:
21.07.2011 15:21            20 segments.gen
21.07.2011 15:21         2'974 segments_2acy4
21.07.2011 13:58             0 write.lock
16.07.2011 02:21 33'445'798'8
hi,
closing after the 2-segment optimize does not change it.
Also, I am running Lucene 3.1.0.
cheers,
vince
Ian Lea
21.07.2011 17:30
To: java-user@lucene.apache.org
Subject: Re: optimize with num segments > 1 index keeps growin
Hi, thanks for this explanation.
so what is the best solution: merge the large segment (how can I do that?)
or work with many segments (10?) so that I avoid this "large
segment" issue?
thanks,
vince
Vincent Sevel
Lombard Odier Darier Hentsch & Cie
11, rue de la Corraterie - 1204 Genèv
Hi,
this post is quite old, but I would like to share some recent developments.
I applied the recommendation. My process became: expunge deletes and
optimize to 2 segments.
At the time I was on Lucene 3.1 and that solved my issue. Recently I
moved to Lucene 3.3, and I tried playing with the new
Hi, even with setExpungeDeletesPctAllowed(0.0), I could not get docs to
be removed from disk.
After the expunge+commit I print numDeletedDocs again, and it stays
the same.
regards,
vincent
Michael McCandless
09.09.2011 20:53
To: java-user@lucene.apache.org
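For reference, a sketch of the TieredMergePolicy configuration being discussed, assuming the Lucene 3.3 API where the setter is still named setExpungeDeletesPctAllowed (it was renamed in later versions); the max merged segment size is just an illustrative value.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class ExpungeConfig {
    public static IndexWriterConfig build() {
        TieredMergePolicy tmp = new TieredMergePolicy();
        tmp.setExpungeDeletesPctAllowed(0.0);   // expunge even segments with very few deletes
        tmp.setMaxMergedSegmentMB(5 * 1024);    // illustrative: cap merged segments at ~5 GB
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
                new StandardAnalyzer(Version.LUCENE_33));
        conf.setMergePolicy(tmp);
        return conf;
    }
}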
Hi,
here is the code:
writer.commit(); // make sure nothing is buffered
mgr.printIndexState("Expunging deletes using "
        + writer.getConfig().getMergePolicy());
setDirectLogger(); // redirect infoStream toward log4j
writer.expungeDeletes();
OK. that worked.
thanks,
vincent
Michael McCandless
13.09.2011 12:44
To: java-user@lucene.apache.org
Subject: Re: optimize with num segments > 1 index keeps growing
OK thanks for the infoStream output -- it was very helpful!
Hi,
I have an index with 35 million docs in it. Every day I need to delete
some of the oldest docs that meet some criteria.
I can easily do this on the searcher by using search(Query query, int n,
Sort sort),
but there is nothing equivalent for deleteDocuments.
What are my options?
thank
Hi, thanks for your answer. Out of the 35 million docs, I need to delete
1 million...
and unfortunately, the ability to specify a sort and a max number of hits
is not on the query, but passed as arguments to the index searcher.
So I do not see how to do it with deleteDocuments.
regards,
vincent
Ian Lea
Hi,
this was clear actually. I was questioning the performance impact of calling
IndexReader.deleteDocument(int docNum) one million times. Any information
about that?
thanks,
vincent
Ian Lea
14.09.2011 16:20
To: java-user@lucene.apache.org
Hi,
our application is indexing our logging events as documents. When the
index reaches a limit, I want to delete the oldest 1 million events. Since
the number of events per day changes on a day-to-day basis, I cannot just
blindly delete the last 3 days, for instance.
Based on your different inp
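One way to do this, sketched under the assumption that each document stores a unique id field and a timestamp field (the field names "id" and "timestamp" are made up): run the sorted, limited search on the IndexSearcher, collect the ids of the oldest hits, and delete them by term on the IndexWriter.

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

public class PurgeOldest {
    public static void purge(IndexSearcher searcher, IndexWriter writer, int n)
            throws Exception {
        Sort byAge = new Sort(new SortField("timestamp", SortField.LONG)); // oldest first
        TopDocs oldest = searcher.search(new MatchAllDocsQuery(), n, byAge);
        for (ScoreDoc sd : oldest.scoreDocs) {
            String id = searcher.doc(sd.doc).get("id");   // stored unique key
            writer.deleteDocuments(new Term("id", id));   // buffered delete by term
        }
        writer.commit();                                  // make the deletes durable
    }
}

Deleting by term this way avoids the IndexReader.deleteDocument(int docNum) route entirely; the per-document deleteDocuments calls are only buffered until the commit.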
Hi,
I have an application that has an index with 30 million docs in it. Every
day, I add around 1 million docs, and I remove the oldest 1 million, to
keep it stable at 30 million.
For the most part doc fields are indexed and stored. Each doc weighs
from a few KB to 1 MB (a few MB in so
Hi,
I did the following on the existing index:
- expunge deletes
- optimize(5)
- check index
Then from the existing index I exported all docs into a new one; on
the new one I did:
- optimize(5)
- check index
The entire log is at http://dl.dropbox.com/u/47469698/lucene/index.txt
durin
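The post does not say how the export was done; one possible way, sketched under the assumption that the Lucene 3.x addIndexes(IndexReader...) API is used (it copies only the non-deleted documents into the destination); the paths are illustrative.

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RebuildIndex {
    public static void main(String[] args) throws Exception {
        Directory src = FSDirectory.open(new File("/data/index-old")); // hypothetical source
        Directory dst = FSDirectory.open(new File("/data/index-new")); // hypothetical destination
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
                new StandardAnalyzer(Version.LUCENE_33));
        IndexWriter writer = new IndexWriter(dst, conf);
        IndexReader reader = IndexReader.open(src);
        try {
            writer.addIndexes(reader);   // copies only live (non-deleted) documents
            writer.optimize(5);          // then merge the fresh copy down to 5 segments
        } finally {
            reader.close();
            writer.close();
        }
        src.close();
        dst.close();
    }
}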