search for token starting with a wildcard

2012-04-12 Thread v . sevel
Hi, I have a large index with a field that contains an important number of terms. I knew that searching with a term starting with a wildcard was not a good idea; looking at WildcardTermEnum(IndexReader,Term) and IndexReader.terms(Term), I now understand better why. I have been asked however by…
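
A minimal sketch of the situation being described, assuming a Lucene 3.x QueryParser: leading wildcards are rejected by default precisely because they force a scan over every term in the field (which is what WildcardTermEnum and IndexReader.terms make visible). The field name and query below are hypothetical.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class LeadingWildcardSketch {
        public static void main(String[] args) throws Exception {
            QueryParser parser = new QueryParser(Version.LUCENE_35, "body",
                    new StandardAnalyzer(Version.LUCENE_35));
            // Without this, parse("*ception") throws a ParseException:
            // leading wildcards enumerate every term in the field.
            parser.setAllowLeadingWildcard(true);
            Query q = parser.parse("*ception");
            System.out.println(q);
            // A common workaround is to index a reversed copy of the field
            // and rewrite "*ception" into "noitpec*" against that field,
            // turning the leading wildcard into a cheap prefix query.
        }
    }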

recover corrupted index

2013-01-10 Thread v . sevel
Hi, I have an index for which I am missing at least 1 file after hitting a disk-full situation. Is there any way I could bypass the error I get when trying to open the index, to salvage as many docs as I can from the other files? thanks, vince java.io.FileNotFoundException: D:\_2c9kgw.cfs …
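
For reference, a sketch of the usual salvage path with CheckIndex, assuming Lucene 3.x (fixIndex was renamed exorciseIndex in 4.x). Note that -fix/fixIndex drops every segment that references a missing or corrupt file, so the docs in those segments are lost. The index path is hypothetical.

    import java.io.File;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class SalvageIndex {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("D:/index"));
            CheckIndex checker = new CheckIndex(dir);
            checker.setInfoStream(System.out);
            CheckIndex.Status status = checker.checkIndex();
            if (!status.clean) {
                // Rewrites the segments file without the broken segments;
                // docs in those segments cannot be recovered.
                checker.fixIndex(status);   // exorciseIndex(status) in 4.x
            }
            dir.close();
        }
    }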

forceMergeDeletes vs forceMerge(int)

2013-01-11 Thread v . sevel
hi, I have an index where I add and delete numerous documents that are a few KB in size. To keep space on disk stable, I have been doing forceMergeDeletes then forceMerge(5) regularly (after I do a big clean during the night). I am wondering if forceMergeDeletes would be sufficient for that…
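
A sketch of the two calls in question, assuming Lucene 3.5+ (where expungeDeletes/optimize became forceMergeDeletes/forceMerge) and an IndexWriter configured elsewhere. The key difference: forceMergeDeletes only rewrites segments whose deletion percentage exceeds the merge policy's threshold, while forceMerge(5) keeps merging until at most 5 segments remain, rewriting far more data.

    import org.apache.lucene.index.IndexWriter;

    public class ReclaimSpace {
        static void reclaim(IndexWriter writer) throws Exception {
            // Reclaims space held by deleted docs, but only in segments
            // whose deletion percentage exceeds the policy's threshold:
            writer.forceMergeDeletes();
            // Additionally merges down to at most 5 segments; this rewrites
            // (and temporarily roughly doubles on disk) much more data:
            writer.forceMerge(5);
            writer.commit();
        }
    }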

mmap loads the entire index into memory during forceMergeDeletes/forceMerge(int)

2013-01-17 Thread v . sevel
Hi, on a 256 GB RAM machine, we have half of our IT system running. Part of it are 2 Lucene applications, each managing an approximately 100 GB index. These applications are used to index logging events, and every night there is a purge, followed by a forceMergeDeletes to reclaim disk space (and…
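
One knob relevant here is the Directory implementation: with MMapDirectory, merge reads pull the whole index through the OS page cache and mapped address space, which shows up as the index being "loaded into memory". A sketch of choosing NIOFSDirectory explicitly instead of letting FSDirectory.open pick (Lucene 3.x constructors, path hypothetical):

    import java.io.File;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.store.NIOFSDirectory;

    public class DirectoryChoice {
        public static void main(String[] args) throws Exception {
            File path = new File("/data/lucene/logs-index");
            // Default on 64-bit JVMs: fast, but merges stream the entire
            // index through mapped memory, which looks like huge RAM use.
            Directory mmap = new MMapDirectory(path);
            // Plain positional read() I/O; the OS still caches blocks,
            // but nothing is mapped into the process address space.
            Directory nio = new NIOFSDirectory(path);
            mmap.close();
            nio.close();
        }
    }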

gracefully interrupting an optimize

2011-01-21 Thread v . sevel
Hi, each night I optimize an index that contains 35 million docs. It takes about 1.5 hours. For maintenance reasons, it may happen that the machine gets rebooted. In that case, the server gets a chance to gracefully shut down, but eventually the reboot script will kill the processes that did not…
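
A sketch of the graceful-shutdown idea discussed in the replies, assuming Lucene 3.x: close(false) asks in-flight merges to abort instead of waiting for them, and the index stays consistent, just not fully optimized. The method and writer names are hypothetical.

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;

    public class GracefulShutdown {
        // Called from a shutdown hook while an optimize() runs elsewhere.
        static void shutDown(IndexWriter writer) {
            try {
                // false = do not wait for running merges; abort them.
                writer.close(false);
            } catch (IOException e) {
                // An aborted merge typically surfaces as
                // "background merge hit exception" wrapping
                // MergePolicy.MergeAbortedException.
            }
        }
    }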

Re: gracefully interrupting an optimize

2011-01-26 Thread v . sevel
Hi Michael, I suppose that, as you suggested, if I do a close(false) during an optimize I should expect the following exception: java.io.IOException: background merge hit exception: _3ud72:c33936445 _3uqhr:c126349 _3uuf8:c57041 _3v27p:c78599 _3vf2s:c111005 _3vfad:c6574 _3vrcj:c130263 …

Re: gracefully interrupting an optimize

2011-01-26 Thread v . sevel
And do I need to do any cleanup once I catch the MergeAbortedException (such as writer commit or rollback)? Thanks, Vincent
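
A hedged sketch of one answer to that cleanup question, assuming Lucene 3.x: the aborted merge was never committed, so rollback() is enough to discard uncommitted changes and release the write lock.

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;

    public class AbortCleanup {
        static void shutDown(IndexWriter writer) throws IOException {
            try {
                writer.close(false);   // abort in-flight merges
            } catch (IOException e) {
                // Nothing from the aborted merge was committed, so there is
                // nothing to undo; rollback() drops any other uncommitted
                // changes and releases the write lock for a clean exit.
                writer.rollback();
            }
        }
    }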

Re: recurrent IO/CPU peaks

2011-02-22 Thread v . sevel
Hi, I did some tests with the BalancedSegmentMergePolicy, looking specifically at the optimize. I have an index that is 70 GB large and contains around 35 million documents. I duplicated the index 4 times, and I ran 2 optimizes with the default merge policy and 2 with the balanced policy. Here…
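
For context, BalancedSegmentMergePolicy ships in Lucene's contrib/misc jar in the 3.x line; a minimal sketch of plugging it in (analyzer choice and version are assumptions):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.BalancedSegmentMergePolicy;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.Version;

    public class BalancedPolicySetup {
        static IndexWriter open(Directory dir) throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_31,
                    new StandardAnalyzer(Version.LUCENE_31));
            // Aims to keep segment sizes balanced so no single optimize
            // pass has to rewrite one giant segment.
            cfg.setMergePolicy(new BalancedSegmentMergePolicy());
            return new IndexWriter(dir, cfg);
        }
    }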

Re: recurrent IO/CPU peaks

2011-03-01 Thread v . sevel
Hi, OK, so I will not bother using TieredMergePolicy for now. I will do some more tests with the contrib balanced merge policy, playing with optimize(maxNumSegments) to try to decrease the optimize time (which is an issue for us today). My index contains 35 million documents. The size on disk…

Re: recurrent IO/CPU peaks

2011-03-01 Thread v . sevel
Hi, we developed a real-time logging system. We index 4.5 million events/day, spread over multiple servers, each with its own index. Every night we delete events from the index based on a retention policy, then we optimize. Each server takes between 1 and 2 hours to optimize. Ideally, we…
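
A sketch of that nightly retention purge, assuming the events carry a long timestamp indexed as a numeric field; the "timestamp" field name and cutoff argument are hypothetical (Lucene 3.x API):

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.NumericRangeQuery;

    public class NightlyPurge {
        // Deletes every event older than the retention cutoff, then runs
        // the nightly optimize described above to reclaim the space.
        static void purge(IndexWriter writer, long cutoffMillis) throws Exception {
            writer.deleteDocuments(NumericRangeQuery.newLongRange(
                    "timestamp", 0L, cutoffMillis, true, false));
            writer.commit();
            writer.optimize();   // forceMerge in 3.5+
        }
    }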

optimize with num segments > 1 index keeps growing

2011-07-20 Thread v . sevel
Hi, I index several million small documents per day. Each day, I remove some of the older documents to keep the index at a stable number of documents. After each purge, I commit, then I optimize the index. What I found is that if I keep optimizing with max num segments = 2, then the index keeps growing…

Re: optimize with num segments > 1 index keeps growing

2011-07-21 Thread v . sevel
Hi, here is a concrete example. I am starting with an index that has 19017236 docs, which takes 58989 MB on disk:
    21.07.2011 15:21              20  segments.gen
    21.07.2011 15:21           2'974  segments_2acy4
    21.07.2011 13:58               0  write.lock
    16.07.2011 02:21    33'445'798'8…

Re: optimize with num segments > 1 index keeps growing

2011-07-21 Thread v . sevel
hi, closing after the 2-segment optimize does not change it. Also, I am running with Lucene 3.1.0. cheers, vince

Re: optimize with num segments > 1 index keeps growing

2011-07-21 Thread v . sevel
Hi, thanks for this explanation. So what is the best solution: merge the large segment (how can I do that?) or work with many segments (10?) so that I avoid having this "large segment" issue? thanks, vince

RE: optimize with num segments > 1 index keeps growing

2011-09-09 Thread v . sevel
Hi, this post is quite old, but I would like to share some recent developments. I applied the recommendation; my process became: expunge deletes, then optimize to 2 segments. At the time I was on Lucene 3.1 and that solved my issue. Recently I moved to Lucene 3.3, and I tried playing with the new…
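
The process described, as a sketch against the 3.1/3.3 API (the writer is assumed to be configured elsewhere):

    import org.apache.lucene.index.IndexWriter;

    public class ExpungeThenMerge {
        static void compact(IndexWriter writer) throws Exception {
            writer.expungeDeletes();   // drop deleted docs (forceMergeDeletes in 3.5+)
            writer.optimize(2);        // merge down to 2 segments (forceMerge in 3.5+)
            writer.commit();
        }
    }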

Re: optimize with num segments > 1 index keeps growing

2011-09-10 Thread v . sevel
Hi, even with setExpungeDeletesPctAllowed(0.0), I could not get docs removed from disk. After the expunge + commit I print numDeletedDocs again, and it stays the same. regards, vincent

Re: optimize with num segments > 1 index keeps growing

2011-09-12 Thread v . sevel
Hi, here is the code:
    writer.commit(); // make sure nothing is buffered
    mgr.printIndexState("Expunging deletes using " + writer.getConfig().getMergePolicy());
    setDirectLogger(); // redirect infoStream toward log4j
    writer.expungeDeletes();

Re: optimize with num segments > 1 index keeps growing

2011-09-13 Thread v . sevel
OK, that worked. thanks, vincent. (Quoting Michael McCandless: OK thanks for the infoStream output -- it was very helpful!)
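
For anyone hitting the same wall, the infoStream mentioned here is enabled per writer; a minimal sketch assuming Lucene 3.x (a log4j bridge like the setDirectLogger() above would wrap the PrintStream):

    import org.apache.lucene.index.IndexWriter;

    public class MergeDiagnostics {
        static void enable(IndexWriter writer) {
            // Dumps merge-policy decisions (segment sizes, deletion
            // percentages, which merges were selected and why).
            writer.setInfoStream(System.out);
        }
    }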

deleting with sorting and max document

2011-09-14 Thread v . sevel
Hi, I have an index with 35 million docs in it. Every day I need to delete some of the oldest docs that meet some criteria. I can easily do this on the searcher by using search(Query query, int n, Sort sort), but there is nothing equivalent for deleteDocuments. What are my options? thanks

Re: deleting with sorting and max document

2011-09-14 Thread v . sevel
Hi, thanks for your answer. Out of the 35 million docs, I need to delete 1 million... and unfortunately, the ability to specify a sort and a max number of events is not on the query, but as args on the index searcher. So I do not see how to do it with deleteDocuments. regards, vincent

Re: deleting with sorting and max document

2011-09-14 Thread v . sevel
Hi, this was clear actually. I was questioning the performance impact of calling IndexReader.deleteDocument(int docNum) one million times. Any information about that? thanks, vincent

Re: deleting with sorting and max document

2011-09-15 Thread v . sevel
Hi, our application is indexing our logging events as documents. When the index reaches a limit, I want to delete the oldest 1 million events. Since the number of events per day changes on a day-to-day basis, I cannot just blindly delete the last 3 days, for instance. Based on your different inputs…
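
A sketch of the approach the thread converges on: search with a sort on the timestamp to find the n oldest events, then delete them through the writer by a stored unique key. The "timestamp" and "id" field names are hypothetical (Lucene 3.x API):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;

    public class DeleteOldest {
        // Finds the n oldest docs by "timestamp" and deletes them via
        // their unique "id" field, avoiding per-docNum reader deletes.
        static void deleteOldest(IndexSearcher searcher, IndexWriter writer,
                                 int n) throws Exception {
            Sort byAge = new Sort(new SortField("timestamp", SortField.LONG));
            TopDocs oldest = searcher.search(new MatchAllDocsQuery(), n, byAge);
            for (ScoreDoc sd : oldest.scoreDocs) {
                Document doc = searcher.doc(sd.doc);
                writer.deleteDocuments(new Term("id", doc.get("id")));
            }
            writer.commit();
        }
    }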

index bigger than it should be?

2011-10-27 Thread v . sevel
Hi, I have an application that has an index with 30 million docs in it. Every day, I add around 1 million docs and remove the oldest 1 million, to keep it stable at 30 million. For the most part doc fields are indexed and stored. Each doc weighs from a few KB to 1 MB (a few MB in some…

Re: index bigger than it should be?

2011-10-30 Thread v . sevel
Hi, I did the following on the existing index: expunge deletes, optimize(5), check index. Then from the existing index I exported all docs into a new one, and on the new one I did: optimize(5), check index. The entire log is in http://dl.dropbox.com/u/47469698/lucene/index.txt. During…
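
The "export all docs into a new index" step can be done in one call; a sketch using addIndexes with an IndexReader, assuming Lucene 3.x (paths hypothetical). This rewrites every live doc and drops all deleted ones, so the new index shows the "true" size to compare against the original:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class RewriteIndex {
        public static void main(String[] args) throws Exception {
            Directory src = FSDirectory.open(new File("/data/old-index"));
            Directory dst = FSDirectory.open(new File("/data/new-index"));
            IndexWriter writer = new IndexWriter(dst,
                    new IndexWriterConfig(Version.LUCENE_33,
                            new StandardAnalyzer(Version.LUCENE_33)));
            IndexReader reader = IndexReader.open(src);
            // Merges every live doc into the new index; deleted docs
            // are dropped rather than copied.
            writer.addIndexes(reader);
            writer.commit();
            reader.close();
            writer.close();
            src.close();
            dst.close();
        }
    }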