Re: How to handle more than Integer.MAX_VALUE documents?

2010-11-02 Thread Simon Willnauer
On Wed, Nov 3, 2010 at 3:00 AM, Lance Norskog wrote: > You would have to control your MergePolicy so it doesn't collapse > everything back to one segment. maxMergeDocs is an int too though! simon > > On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer > wrote: >> On Tue, Nov 2, 2010 at 1:58 AM, Lan
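The MergePolicy control Lance describes, with Simon's int caveat, can be sketched roughly like this against the Lucene 3.0-era `IndexWriter` API (the directory path, analyzer choice, and the 10M cap are assumptions for illustration, not from the thread):

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class CappedMergeConfig {
    public static IndexWriter open(File indexDir) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(indexDir),
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Cap merged segments so the index never collapses into one
        // huge segment. Note Simon's caveat: the cap is itself an int,
        // so it shapes segment sizes but cannot lift the per-index
        // 2^31-1 doc ID limit.
        writer.setMaxMergeDocs(10000000);
        writer.setMergeFactor(10); // default fan-in, shown for completeness
        return writer;
    }
}
```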

RE: How to handle more than Integer.MAX_VALUE documents?

2010-11-02 Thread Zhang, Lisheng
Hi, Thanks very much for your help! Your point is well taken and it may cover most use cases, but it seems to me that in principle the limit is not just for one segment: suppose within one index we have 3 segments and each has docs close to 2^31-1, then if I need to loop through most docs in a
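Lisheng's point can be illustrated in plain Java (this is arithmetic only, not Lucene code): a reader over several segments rebases each segment's local IDs onto one global doc ID space, and because that space is `int`, summing per-segment sizes near `Integer.MAX_VALUE` wraps negative.

```java
// Illustration of why ~2.1B docs is an index-wide limit, not a
// per-segment one: global doc IDs are ints, and rebasing several
// near-full segments onto one int space overflows.
public class DocIdLimit {
    // Global doc ID = segment-local ID + sum of preceding segment sizes.
    static long globalDocId(int[] segmentSizes, int segment, int localId) {
        long base = 0;
        for (int i = 0; i < segment; i++) base += segmentSizes[i];
        return base + localId;
    }

    public static void main(String[] args) {
        int nearMax = Integer.MAX_VALUE - 10;          // ~2.1B docs
        int[] segments = {nearMax, nearMax, nearMax};  // 3 such segments

        // In long arithmetic the global ID is fine...
        System.out.println(globalDocId(segments, 2, 5)); // 4294967279

        // ...but an int doc ID space cannot hold it: the same sum
        // wraps negative in int arithmetic.
        int intBase = nearMax + nearMax;
        System.out.println(intBase < 0);                 // true
    }
}
```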

Re: How to handle more than Integer.MAX_VALUE documents?

2010-11-02 Thread Lance Norskog
You would have to control your MergePolicy so it doesn't collapse everything back to one segment. On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer wrote: > On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog wrote: >> 2 billion is a hard limit. Usually people split indexes into multiple >> indexes long b

Re: How to handle more than Integer.MAX_VALUE documents?

2010-11-02 Thread Simon Willnauer
On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog wrote: > 2 billion is a hard limit. Usually people split indexes into multiple > indexes long before this, and use the parallel multi reader (I think) to > read from all of the sub-indexes. > > On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng > wrote: >> >
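Lance is unsure of the exact class; one Lucene 3.x way to read split indexes together is a `MultiReader` over one reader per sub-index. A minimal sketch (index paths are assumptions):

```java
import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class SplitIndexSearch {
    // Open one reader per sub-index and search them as a single view.
    public static IndexSearcher open(File... indexDirs) throws Exception {
        IndexReader[] subReaders = new IndexReader[indexDirs.length];
        for (int i = 0; i < indexDirs.length; i++) {
            subReaders[i] = IndexReader.open(FSDirectory.open(indexDirs[i]));
        }
        return new IndexSearcher(new MultiReader(subReaders));
    }
}
```

Note that the combined view still exposes a single int doc ID space, so this is a convenience for composing sub-indexes rather than a way past the 2^31-1 ceiling; going truly beyond it means searching the sub-indexes separately and merging results.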

Re: IndexWriter.close() performance issue

2010-11-02 Thread Mark Kristensson
Wonderful information on what happens during indexWriter.close(), thank you very much! I've got some testing to do as a result. We are on Lucene 3.0.0 right now. One other detail that I neglected to mention is that the batch size does not seem to have any relation to the time it takes to close

Re: Simple search question

2010-11-02 Thread darren
Couldn't one write a custom filter that modified the inbound term semantics before doing the search? Then, wildcard behavior can be added to terms without doing query string splicing. > You might take a look at Ngrams. These can be used to find partial > matches without resorting to wildcards, alt
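One reading of darren's idea, sketched against the Lucene 3.0 query parser (this is an assumption about the intent, not code from the thread): override `getFieldQuery` so every plain term becomes a prefix query, giving wildcard-style behavior without splicing `*` into the query string.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class PrefixingQueryParser extends QueryParser {
    public PrefixingQueryParser(String field) {
        super(Version.LUCENE_30, field,
              new StandardAnalyzer(Version.LUCENE_30));
    }

    // Every plain term the user types is treated as a prefix, so the
    // user never has to write the * themselves.
    @Override
    protected Query getFieldQuery(String field, String queryText) {
        return new PrefixQuery(new Term(field, queryText.toLowerCase()));
    }
}
```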

Re: Simple search question

2010-11-02 Thread Erick Erickson
You might take a look at Ngrams. These can be used to find partial matches without resorting to wildcards, although they may add to your index size... Best Erick On Tue, Nov 2, 2010 at 10:39 AM, Dirk Reske wrote: > No, we don't want the user to write the * itself. > And separate fields for the f
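A toy illustration of Erick's suggestion in plain Java (not Lucene's own n-gram filter): if every character n-gram of a name is indexed as a term, a substring query of the same length becomes an exact term lookup, with no wildcard needed.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramDemo {
    // All character n-grams of the given length, in order.
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("johnny", 3);
        System.out.println(grams);                 // [joh, ohn, hnn, nny]
        // Searching for the fragment "ohn" is now a plain term match:
        System.out.println(grams.contains("ohn")); // true
    }
}
```

The index-size cost Erick mentions is visible here too: one 6-letter name yields four trigram terms instead of one token.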

Re: Simple search question

2010-11-02 Thread Ian Lea
Tokenizing and then passing through the query parser sounds reasonable to me. You could build the query yourself, but that will be a bit more work. You could also combine a non-wildcard search with a wildcard search, boosting the first one. So that "John Doe" would score higher than "Johnny Donc
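Ian's boosting idea can be sketched as query *text* (rather than Query objects): for each whitespace-separated term, OR a boosted exact term with a prefix-wildcard term, so "John Doe" outranks partial matches while both still hit. The field name `name` and the boost of 4 are assumptions for illustration.

```java
public class BoostedQueryBuilder {
    // Build query-parser syntax combining a boosted exact match with a
    // prefix match per input term.
    static String build(String input) {
        StringBuilder sb = new StringBuilder();
        for (String term : input.trim().toLowerCase().split("\\s+")) {
            if (sb.length() > 0) sb.append(" AND ");
            sb.append("(name:").append(term).append("^4")
              .append(" OR name:").append(term).append("*)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("John Doe"));
        // (name:john^4 OR name:john*) AND (name:doe^4 OR name:doe*)
    }
}
```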

Re: Simple search question

2010-11-02 Thread findbestopensource
In this case also, you may need to index the fields separately. This will give better control. Have a parser which splits the terms and appends * to the end. Search using the terms. Regards Aditya www.findbestopensource.com On Tue, Nov 2, 2010 at 8:09 PM, Dirk Reske wrote: > No, we don't want
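Aditya's split-and-star suggestion, sketched with Lucene 3.x Query objects instead of string splicing (the field name `name` is an assumption): require a prefix match for each whitespace-separated term.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;

public class PrefixPerTerm {
    // Each term the user types must match as a prefix of some token.
    static BooleanQuery build(String input) {
        BooleanQuery q = new BooleanQuery();
        for (String term : input.trim().toLowerCase().split("\\s+")) {
            q.add(new PrefixQuery(new Term("name", term)),
                  BooleanClause.Occur.MUST);
        }
        return q;
    }
}
```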

Re: Simple search question

2010-11-02 Thread Dirk Reske
No, we don't want the user to write the * itself. And separate fields for the first and the last name are also not acceptable. Imagine all the social networks, where you type a part of a name into the textbox and get all people whose names (first or last) contain one of your searched words. The use

Re: Simple search question

2010-11-02 Thread findbestopensource
Yes, correct. It would be good if the user inputs the search string with *. My idea is to index two fields separately: first name and last name. Provide two text boxes for first name and last name. Leave the rest to the user. Regards Aditya www.findbestopensource.com On Tue, Nov 2, 2010 at 7:44 P

Simple search question

2010-11-02 Thread Dirk Reske
Hello, we are quite new to Lucene. At first we want to create a simple user search for our web application. My first thought was to map the 'display name' (= firstname + lastname) to a single field (analysed but not stored) and to put the database id of the user into a stored, not analysed field
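Dirk's proposed mapping as a Lucene 3.0 sketch (the field names are assumptions): the display name analyzed but not stored, the database id stored verbatim so it can be read back from hits.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class UserDocFactory {
    public static Document userDoc(String firstName, String lastName,
                                   long dbId) {
        Document doc = new Document();
        // Searchable but not retrievable: tokenized into terms.
        doc.add(new Field("displayName", firstName + " " + lastName,
                          Field.Store.NO, Field.Index.ANALYZED));
        // Retrievable but kept as a single untokenized term.
        doc.add(new Field("userId", String.valueOf(dbId),
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}
```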

Re: filtering results per field?

2010-11-02 Thread findbestopensource
Hello, Doing a single search with multiple filters will give faster results. Doing a search per field (multiple searches) and combining the results is a bad idea. Regards Aditya www.findbestopensource.com On Mon, Nov 1, 2010 at 11:02 PM, Francisco Borges < francisco.bor...@gmail.com> wrote: > Hello,
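Aditya's advice, sketched with Lucene 3.x Query objects (field names and the term are assumptions): one BooleanQuery with a SHOULD clause per field runs as a single index pass with scores combined by Lucene, instead of one search per field merged by hand.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MultiFieldSingleSearch {
    // A match in any listed field qualifies; matches in more fields
    // score higher, all from one search call.
    public static BooleanQuery build(String term, String... fields) {
        BooleanQuery q = new BooleanQuery();
        for (String field : fields) {
            q.add(new TermQuery(new Term(field, term)),
                  BooleanClause.Occur.SHOULD);
        }
        return q;
    }
}
```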

Re: IndexWriter.close() performance issue

2010-11-02 Thread Shai Erera
When you close IndexWriter, it performs several operations that might have a connection to the problem you describe: * Commit all the pending updates -- if your update batch size is more or less the same (i.e., comparable # of docs and total # bytes indexed), then you should not see a performance
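Building on Shai's breakdown, one diagnostic sketch (the batching-by-explicit-commit idea is an assumption, not from the thread; `commit()` and `close()` are the 3.0 `IndexWriter` API): commit the pending batch explicitly before closing, and time the two calls separately, so the flush/fsync cost can be told apart from the close-time merge wait.

```java
import org.apache.lucene.index.IndexWriter;

public class TimedClose {
    public static void commitThenClose(IndexWriter writer)
            throws Exception {
        long t0 = System.currentTimeMillis();
        writer.commit();   // flush + fsync the pending updates
        long t1 = System.currentTimeMillis();
        writer.close();    // should now mostly wait on merges
        long t2 = System.currentTimeMillis();
        System.out.println("commit: " + (t1 - t0)
                + " ms, close: " + (t2 - t1) + " ms");
    }
}
```

If `close()` stays slow even when `commit()` is fast, that points at merge activity rather than the size of the final batch, which would fit Mark's observation that batch size and close time seem unrelated.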