Re: Searching while optimizing

2009-11-27 Thread Michael McCandless
Phew, thanks for testing! It's all explainable... When you have a reader open, it prevents the segments it had opened from being deleted. When you close that reader, the segments could be deleted, however, that won't happen until the writer next tries to delete, which it does only periodically (

Re: Searching while optimizing

2009-11-27 Thread vsevel
Hi, I have done some testing that I would like to share with you. I am starting my tests with an unoptimized 40Mb index. I have 3 test cases: 1) open a writer, optimize, commit, close 2) open a writer, open a reader from the writer, optimize, commit, close 3) same as 2) except the reader is opene

Re: Books about lucene

2009-11-27 Thread Martijn v Groningen
http://nlp.stanford.edu/IR-book/information-retrieval-book.html gives a good introduction to what happens under the hood of a search engine, and you can download it for free. It does not explain Lucene directly, but a lot of IR algorithms that are used in Lucene (and any other search engine) are explai

Re: What does "out of order" mean?

2009-11-27 Thread Stefan Trcek
On Friday 27 November 2009 14:49:07 Michael McCandless wrote: > > So the "don't care" equivalent here is to use IndexSearcher's normal > search APIs (ie, we don't use Version to switch this on or off). Thanks for the hint. For an unknown reason I once fell into the "search(query, filter, collecto

Re: MergePolicy$MergeException CorruptIndexException in lucene2.4.1

2009-11-27 Thread jm
I'll check all these with my ops guy on Monday and report back. Thanks for the interest. On Fri, Nov 27, 2009 at 4:00 PM, Michael McCandless wrote: > Any Lucene-related exceptions hit in your env?  What OS (looks like > Windows, but which one?), filesystem are you on? > > And are you really cert

Re: MergePolicy$MergeException CorruptIndexException in lucene2.4.1

2009-11-27 Thread Michael McCandless
Any Lucene-related exceptions hit in your env? What OS (looks like Windows, but which one?), filesystem are you on? And are you really certain about the java version being used in your production env? Don't just trust which java your interactive shell finds on its PATH -- double check how your a

Re: MergePolicy$MergeException CorruptIndexException in lucene2.4.1

2009-11-27 Thread Michael McCandless
On Fri, Nov 27, 2009 at 6:23 AM, jm wrote: > Ok, I got the index from the production machine, but I am having some > problem to find the index..., our process deals with multiple indexes, > in the current exception I cannot see any indication about the index > having the issue.  I opened all my in

Re: What does "out of order" mean?

2009-11-27 Thread Michael McCandless
On Fri, Nov 27, 2009 at 8:13 AM, Stefan Trcek wrote: > On Friday 27 November 2009 12:07:07 Michael McCandless wrote: >> Re: What does "out of order" mean? >> >> It refers to the order in which the docIDs are delivered to your >> Collector. >> >> "Normally" they are always delivered in increasing o

Re: What does "out of order" mean?

2009-11-27 Thread Stefan Trcek
On Friday 27 November 2009 12:07:07 Michael McCandless wrote: > Re: What does "out of order" mean? > > It refers to the order in which the docIDs are delivered to your > Collector. > > "Normally" they are always delivered in increasing order. > > However, some queries (well, currently only certain

Re: MergePolicy$MergeException CorruptIndexException in lucene2.4.1

2009-11-27 Thread jm
I manually did CheckIndex in all indexes and found two with issues: first Segments file=segments_42w numSegments=21 version=FORMAT_HAS_PROX [Lucene 2.4] 1 of 21: name=_109 docCount=10410 compound=true hasProx=true numFiles=1 size (MB)=55,789 no deletions test: open reader

Re: transition 2.4 -> 3.0 (please help me to help myself)

2009-11-27 Thread Simon Willnauer
Additionally there is a whitepaper on http://www.lucidimagination.com/How-We-Can-Help/whitepaper What is new in Lucene 2.9 which gives you an overview over the new features - this is not on a API level though. simon On Fri, Nov 27, 2009 at 12:42 PM, Helmut Jarausch wrote: > Hi, > > could anybod

RE: transition 2.4 -> 3.0 (please help me to help myself)

2009-11-27 Thread Uwe Schindler
That's the way to go: public TopDocs search(Query query, int n) throws IOException Finds the top n hits for query. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: He

Re: transition 2.4 -> 3.0 (please help me to help myself)

2009-11-27 Thread Ian Lea
There is indeed no search(Query) method in 3.0. Your best bet is to compile your application against 2.9 and fix any deprecation warnings - see the javadocs for alternatives. If it compiles cleanly against 2.9 it should also compile against 3.0. -- Ian. On Fri, Nov 27, 2009 at 11:42 AM, Helm

transition 2.4 -> 3.0 (please help me to help myself)

2009-11-27 Thread Helmut Jarausch
Hi, could anybody please point me to some documentation with (more detailed) information about the API change. E.g. (in PyLucene) Q=lucene.TermQuery(lucene.Term('@URI',BookNr)) FSDir= lucene.SimpleFSDirectory(lucene.File('/home/jarausch/Bib_Dev/DIR/')) index_reader= lucene.IndexReader.open(FSDir)

Re: MergePolicy$MergeException CorruptIndexException in lucene2.4.1

2009-11-27 Thread jm
Ok, I got the index from the production machine, but I am having some problem to find the index..., our process deals with multiple indexes, in the current exception I cannot see any indication about the index having the issue. I opened all my indexes with Luke and all opened successfully, some had

RE: To exit the while loop if match is found

2009-11-27 Thread DHIVYA M
Anyways thanks for your suggestion sir --- On Fri, 27/11/09, Uwe Schindler wrote: From: Uwe Schindler Subject: RE: To exit the while loop if match is found To: java-user@lucene.apache.org Date: Friday, 27 November, 2009, 11:19 AM This question is out of the scope of Lucene. Try using some AJ

RE: To exit the while loop if match is found

2009-11-27 Thread Uwe Schindler
This question is out of the scope of Lucene. Try using some AJAX frameworks like YUI for the communication between your textbox and the server. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: DHIVYA M [ma

RE: To exit the while loop if match is found

2009-11-27 Thread DHIVYA M
Sir anyways i want this to happen in keypress event of the text box. Can u suggest me a way for this?   Thanks in advance, Dhivya.M --- On Fri, 27/11/09, Uwe Schindler wrote: From: Uwe Schindler Subject: RE: To exit the while loop if match is found To: java-user@lucene.apache.org Date: Friday,

Re: best practice on too many files vs IO overhead

2009-11-27 Thread Michael McCandless
Phew :) Thanks for bringing closure! Mike On Fri, Nov 27, 2009 at 6:02 AM, Michael McCandless wrote: > If in fact you are using CFS (it is the default), and your OS is > letting you use 10240 descriptors, and you haven't changed the > mergeFactor, then something is seriously wrong.  I would tri

Re: best practice on too many files vs IO overhead

2009-11-27 Thread Istvan Soos
You were right, my bad... I have an async reader closing on a scheduled basis (after the writer refreshes the index, to not interrupt the ongoing searches), but while I've set up the scheduling for my first two indexes, I forgot it for my third... oh dear... Thanks anyway for the info, it was usefu

Re: What does "out of order" mean?

2009-11-27 Thread Michael McCandless
It refers to the order in which the docIDs are delivered to your Collector. "Normally" they are always delivered in increasing order. However, some queries (well, currently only certain BooleanQuery cases) can achieve substantial search speedup if they are allowed to deliver docIDs to your collec
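To make the ordering point concrete, here is a minimal plain-Java sketch. It is not Lucene's Collector API: the class OrderAgnosticCollectorSketch and its methods are hypothetical stand-ins, and it only assumes that docIDs arrive as ints in whatever order the scorer produces them. Because the hits are recorded in a bit set, the result is the same whether delivery was in increasing order or not.

```java
import java.util.BitSet;

// Hypothetical illustration only; this does not extend or mirror
// org.apache.lucene.search.Collector.
final class OrderAgnosticCollectorSketch {
    private final BitSet hits = new BitSet();
    private int lastDoc = -1;
    private boolean sawOutOfOrder = false;

    // Record one matching docID; arrival order does not affect the result.
    void collect(int docId) {
        if (docId < lastDoc) {
            sawOutOfOrder = true; // a smaller docID arrived after a larger one
        }
        lastDoc = docId;
        hits.set(docId);
    }

    // True if any docID was delivered after a larger one.
    boolean sawOutOfOrder() {
        return sawOutOfOrder;
    }

    // All collected docIDs in increasing order, however they were delivered.
    int[] sortedHits() {
        return hits.stream().toArray();
    }
}
```

A collector that depends on increasing docIDs (for example, one that appends to a list and later assumes it is sorted) would not be safe under out-of-order delivery; one built like this sketch would be.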

Re: best practice on too many files vs IO overhead

2009-11-27 Thread Michael McCandless
If in fact you are using CFS (it is the default), and your OS is letting you use 10240 descriptors, and you haven't changed the mergeFactor, then something is seriously wrong. I would triple check that all readers are being closed. Or... if you list the index directory, how many files do you see?

RE: To exit the while loop if match is found

2009-11-27 Thread Uwe Schindler
Another, simpler approach is to use http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/PrefixTermEnum.html It is a wrapper term enumeration that lists all Terms with the supplied prefix. You do not need to filter anything manually, just use a while-loop: IndexReader reader
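The idea behind a prefix enumeration can be sketched in plain Java without Lucene. This is only an illustration under assumptions: PrefixScanSketch and termsWithPrefix are hypothetical names, and a SortedSet of strings in natural order stands in for the index's sorted term dictionary. The loop seeks to the first term at or after the prefix and exits as soon as a term no longer starts with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; it does not use Lucene's TermEnum classes.
final class PrefixScanSketch {
    // Assumes 'terms' uses natural String ordering, so every term sharing
    // the prefix forms one contiguous run starting at tailSet(prefix).
    static List<String> termsWithPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the run of matching terms, stop scanning
            }
            matches.add(term);
        }
        return matches;
    }

    // Small helper to build a sorted term set for experiments.
    static SortedSet<String> sortedTerms(String... terms) {
        SortedSet<String> set = new TreeSet<>();
        for (String t : terms) {
            set.add(t);
        }
        return set;
    }
}
```

Because the terms are sorted, the early exit means the scan touches only the matching terms plus one, rather than the whole dictionary.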

What does "out of order" mean?

2009-11-27 Thread Alexander Veit
Hi, The documentation of org.apache.lucene.search.Collector uses the obscure term "out of order". What does "order" mean? The natural order of document IDs, a scoring order, or some other order? -- Cheers, Alex

Re: best practice on too many files vs IO overhead

2009-11-27 Thread Istvan Soos
On Fri, Nov 27, 2009 at 11:37 AM, Michael McCandless wrote: > Are you sure you're closing all readers that you're opening? Absolutely. :) (okay, never say this, but I had bugz because of this previously so I'm pretty sure that one is ok). > It's surprising with normal usage of Lucene that you'd

Re: MergePolicy$MergeException CorruptIndexException in lucene2.4.1

2009-11-27 Thread Michael McCandless
Also, if you're able to reproduce this, can you call writer.setInfoStream and capture & post the resulting output leading up to the exception? Mike On Thu, Nov 26, 2009 at 7:12 AM, jm wrote: > The process is still running and ops dont want to stop it. As soon as > stops I'll try checkindex. > >

Re: best practice on too many files vs IO overhead

2009-11-27 Thread Michael McCandless
Are you sure you're closing all readers that you're opening? It's surprising with normal usage of Lucene that you'd run out of descriptors, with its default mergeFactor (have you increased the mergeFactor)? You can also enable compound file, which uses far fewer file descriptors, at some cost to

Re: IndexDivisor

2009-11-27 Thread Danil ŢORIN
Try to open with very large value (MAX_INT) it will load only first term, and look up the rest from disk. On Fri, Nov 27, 2009 at 12:24, Michael McCandless wrote: > If you are absolutely certain you won't be doing any lookups by term. > > The only use case I know of is internal, when Lucene's Seg

Re: IndexDivisor

2009-11-27 Thread Michael McCandless
If you are absolutely certain you won't be doing any lookups by term. The only use case I know of is internal, when Lucene's SegmentMerger is merging the segment with other segments. In this case, the merger does a linear iteration of all terms, and never a lookup by term, so we save CPU/RAM by n

Re: Is it a lucene bug?

2009-11-27 Thread Savvas-Andreas Moysidis
have you considered a custom sort strategy using a ScoreDocComparator ? Inside your implementation you have access to individual doc scores and you could create a parallel (to your docs) array of floats which stores your r1,r2,r3 etc values. Then use this array to implement your int compare(ScoreDo

Re: IndexDivisor

2009-11-27 Thread Ganesh
Thanks, May i know the purpose of using negative value? Regards Ganesh - Original Message - From: "Michael McCandless" To: Sent: Friday, November 27, 2009 3:17 PM Subject: Re: IndexDivisor > This is the expected behavior. > > If you intend to use the reader for searching, looking

Re: IndexDivisor

2009-11-27 Thread Michael McCandless
This is the expected behavior. If you intend to use the reader for searching, looking doc freq, deleting docs, etc, you must pass a non-negative value for indexDivisor. Mike On Fri, Nov 27, 2009 at 12:00 AM, Ganesh wrote: > Hello all, > > I am using Lucene v2.9.1, If I open my reader with posit

best practice on too many files vs IO overhead

2009-11-27 Thread Istvan Soos
Hi, I've a requirement that involves frequent, batched updates of my Lucene index. This is done by a memory queue and a process that periodically wakes and processes that queue into the Lucene index. If I do not optimize my index, I'll receive a "too many open files" exception (yeah, right, I can get th