On Wed, Nov 3, 2010 at 3:00 AM, Lance Norskog wrote:
> You would have to control your MergePolicy so it doesn't collapse
> everything back to one segment.
maxMergeDocs is an int too, though!
simon
>
> On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer wrote:
>> On Tue, Nov 2, 2010 at 1:58 AM, Lan
Hi,
Thanks very much for your help!
Your point is well taken and may cover most use cases, but it seems
to me that in principle the limit is not just per segment: suppose
one index has 3 segments, each holding close to 2^31-1 docs;
then if I need to loop through most docs in a
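The three-segment scenario can be checked with plain int arithmetic. The sketch below is a simplified model, not Lucene code: it assumes, as a composite reader does, that each segment's documents are numbered from an int offset (its docBase) equal to the total size of the segments before it. The class and method names are hypothetical.

```java
// Simplified model (not Lucene code): a composite reader numbers each segment's
// documents from an int docBase equal to the total size of the earlier segments.
final class DocBaseOverflow {
    // docBase of segment `index`, summed in int arithmetic as an int docID space implies.
    static int intDocBase(int[] segmentSizes, int index) {
        int base = 0;
        for (int i = 0; i < index; i++) {
            base += segmentSizes[i]; // wraps silently past Integer.MAX_VALUE
        }
        return base;
    }

    // The same offset summed in long arithmetic, i.e. the value we actually wanted.
    static long longDocBase(int[] segmentSizes, int index) {
        long base = 0;
        for (int i = 0; i < index; i++) {
            base += segmentSizes[i];
        }
        return base;
    }

    public static void main(String[] args) {
        int nearMax = Integer.MAX_VALUE - 10; // hypothetical: each segment holds 2^31-11 docs
        int[] segments = {nearMax, nearMax, nearMax};
        for (int s = 0; s < segments.length; s++) {
            System.out.printf("segment %d: int docBase=%d, long docBase=%d%n",
                    s, intDocBase(segments, s), longDocBase(segments, s));
        }
    }
}
```

With three segments of 2^31-11 docs each, the third segment's int docBase wraps to -22 while the true offset is 4294967274, so the limit applies to the index as a whole even though no single segment exceeds it.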
You would have to control your MergePolicy so it doesn't collapse
everything back to one segment.
On Tue, Nov 2, 2010 at 12:03 PM, Simon Willnauer wrote:
> On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog wrote:
>> 2 billion is a hard limit. Usually people split indexes into multiple
>> indexes long b
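Capping merges can be pictured with a toy merge selector. This is a hypothetical, simplified model, not Lucene's MergePolicy API: it groups adjacent segments into merges of at most mergeFactor segments and skips any merge whose combined doc count would exceed maxMergeDocs. Because maxMergeDocs is itself an int, the cap can never be raised above Integer.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a size-capped merge policy. It is NOT
// Lucene's LogMergePolicy; it only shows merges being refused once the merged
// segment would hold more than maxMergeDocs documents.
final class CappedMergeModel {
    // Groups adjacent segments (given as doc counts) into merges of at most
    // mergeFactor segments whose combined size stays <= maxMergeDocs.
    // Each merge is returned as a list of segment indexes; single-segment
    // groups are not merges and are omitted.
    static List<List<Integer>> selectMerges(int[] segmentDocs, int mergeFactor, int maxMergeDocs) {
        List<List<Integer>> merges = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentDocs = 0; // long so the size check itself cannot overflow
        for (int i = 0; i < segmentDocs.length; i++) {
            int docs = segmentDocs[i];
            boolean fits = currentDocs + docs <= maxMergeDocs && current.size() < mergeFactor;
            if (!fits) {
                if (current.size() > 1) merges.add(current);
                current = new ArrayList<>();
                currentDocs = 0;
            }
            if (docs > maxMergeDocs) continue; // too big to take part in any merge
            current.add(i);
            currentDocs += docs;
        }
        if (current.size() > 1) merges.add(current);
        return merges;
    }

    public static void main(String[] args) {
        System.out.println(selectMerges(new int[] {100, 200, 300, 5000, 50, 60}, 10, 1000));
    }
}
```

For segment sizes {100, 200, 300, 5000, 50, 60} with mergeFactor 10 and maxMergeDocs 1000, the model selects the merges [[0, 1, 2], [4, 5]] and leaves the 5000-doc segment alone. Two segments that are each close to 2^31-1 docs are never merged, even with the cap set to Integer.MAX_VALUE.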
On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog wrote:
> 2 billion is a hard limit. Usually people split indexes into multiple
> indexes long before this, and use the parallel multi reader (I think) to
> read from all of the sub-indexes.
>
> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng wrote:
>>
>
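One caveat: a reader or searcher that presents all sub-indexes as a single int docID space would run into the same limit. A sketch of the alternative, assuming each sub-index is searched on its own and returns its own top hits, is to merge those hits by score and address every hit by a (shard, local docID) pair packed into a long. The names here are hypothetical and this is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: merging top hits from independently searched sub-indexes.
// A hit is identified by (shard, local docID), so no combined docID has to fit in an int.
final class ShardedHits {
    // One hit from one sub-index: which shard, the docID inside that shard, and its score.
    static final class Hit {
        final int shard;
        final int localDoc;
        final float score;

        Hit(int shard, int localDoc, float score) {
            this.shard = shard;
            this.localDoc = localDoc;
            this.score = score;
        }

        float score() {
            return score;
        }

        // Shard number in the high 32 bits, local docID in the low 32 bits.
        long globalId() {
            return ((long) shard << 32) | (localDoc & 0xFFFFFFFFL);
        }

        @Override
        public String toString() {
            return "shard " + shard + " doc " + localDoc + " score " + score;
        }
    }

    // Returns the k best hits across all shards, highest score first.
    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        // Min-heap on score: the root is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> shardHits : perShardHits) {
            for (Hit h : shardHits) {
                heap.offer(h);
                if (heap.size() > k) heap.poll(); // drop the current weakest hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Hit::score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Hit> shard0 = List.of(new Hit(0, 7, 0.9f), new Hit(0, 3, 0.2f));
        List<Hit> shard1 = List.of(new Hit(1, 7, 0.8f), new Hit(1, 1, 0.5f));
        for (Hit h : mergeTopK(List.of(shard0, shard1), 3)) {
            System.out.println(h + " -> global id " + h.globalId());
        }
    }
}
```

Local docID 7 appears in both shards but yields two different global ids, so hits from different sub-indexes never collide.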
Wonderful information on what happens during indexWriter.close(), thank you
very much! I've got some testing to do as a result.
We are on Lucene 3.0.0 right now.
One other detail that I neglected to mention is that the batch size does not
seem to have any relation to the time it takes to close
Couldn't one write a custom filter that modified the inbound term
semantics before doing the search? Then, wildcard behavior can be added to
terms without doing query string splicing.
> You might take a look at Ngrams. These can be used to find partial
> matches without resorting to wildcards, alt
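Changing term semantics instead of splicing * into the query string can be modelled in memory. In this sketch (plain Java, not a Lucene TokenFilter; the tokenizer is a rough stand-in for an analyzer, and the names are hypothetical), every query token is treated as a prefix, so "jo" behaves like jo*, and every query token must prefix-match some token of the indexed name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// In-memory model of "every query term acts as a prefix term".
final class PrefixTermModel {
    // Rough stand-in for an analyzer: lowercase, split on anything that is not
    // a letter or digit, and drop empty pieces.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // True if every query token is a prefix of at least one indexed token.
    // An empty query matches nothing.
    static boolean matches(String indexedText, String query) {
        List<String> indexed = tokenize(indexedText);
        List<String> terms = tokenize(query);
        return !terms.isEmpty()
                && terms.stream().allMatch(q -> indexed.stream().anyMatch(t -> t.startsWith(q)));
    }

    public static void main(String[] args) {
        System.out.println(matches("Johnny Doncaster", "jo do"));
    }
}
```

For "any of the words" behaviour, as in the social-network example, allMatch would become anyMatch.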
You might take a look at Ngrams. These can be used to find partial
matches without resorting to wildcards, although they may add to
your index size...
Best
Erick
On Tue, Nov 2, 2010 at 10:39 AM, Dirk Reske wrote:
> No, we don't want the user to write the * itself.
> And separate fields for the f
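N-grams trade index size for wildcard-free partial matching. Lucene's contrib analyzers include n-gram token filters; the sketch below only shows the gram generation itself, in plain Java, to make the trade-off concrete. The class name and gram sizes are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java illustration of n-gram generation (not a Lucene TokenFilter).
final class NGrams {
    // All distinct substrings of `term` whose length is between minGram and
    // maxGram inclusive, shorter grams first, in order of position.
    static Set<String> ngrams(String term, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int len = minGram; len <= maxGram; len++) {
            for (int start = 0; start + len <= term.length(); start++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("johnny", 2, 3));
    }
}
```

With minGram 2 and maxGram 3, "johnny" is indexed as 9 distinct grams instead of one term, and a fragment such as "ohn" becomes an exact term lookup rather than a leading-wildcard query. Fragments longer than maxGram would have to be split into grams on the query side as well.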
Tokenizing and then passing through the query parser sounds reasonable
to me. You could build the query yourself, but that will be a bit
more work. You could also combine a non-wildcard search with a
wildcard search, boosting the first one. So that "John Doe" would
score higher than "Johnny Donc
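The combined query could be built as a string for the classic query parser. A minimal sketch, assuming the field's analyzer lowercases and that whitespace splitting is good enough for names: the exact terms are grouped and boosted, then OR-ed with the same terms as prefix wildcards. The escape list covers the characters the classic syntax treats specially; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that builds "(t1 t2)^boost (t1* t2*)" for the classic query syntax.
final class BoostedNameQuery {
    // Characters the classic query syntax treats specially (escaping extra ones is harmless).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Backslash-escapes every special character in a single user token.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    // Exact terms (boosted) OR-ed with prefix terms; returns "" for blank input.
    static String build(String userInput, float exactBoost) {
        String[] tokens = userInput.trim().toLowerCase(Locale.ROOT).split("\\s+");
        StringBuilder exact = new StringBuilder();
        StringBuilder prefix = new StringBuilder();
        for (String t : tokens) {
            if (t.isEmpty()) continue;
            String e = escape(t);
            if (exact.length() > 0) {
                exact.append(' ');
                prefix.append(' ');
            }
            exact.append(e);
            prefix.append(e).append('*');
        }
        if (exact.length() == 0) return "";
        return "(" + exact + ")^" + exactBoost + " (" + prefix + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("John Doe", 2f));
    }
}
```

For "John Doe" with a boost of 2 this produces (john doe)^2.0 (john* doe*), so a document matching the exact terms should score higher than one matching only the prefixes, while both are still returned.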
In this case too, you may need to index the fields separately. This will
give better control. Have a parser that splits the input into terms and appends *
to the end of each term. Search using those terms.
Regards
Aditya
www.findbestopensource.com
On Tue, Nov 2, 2010 at 8:09 PM, Dirk Reske wrote:
> No, we don't want
No, we don't want the user to write the * itself.
And separate fields for the first and the last name are also not
acceptable.
Imagine all the social networks, where you type part of a name into the
textbox and get all people whose names (first or last) contain one of
your search words. The use
Yes, correct. It would be good if the user entered the search string with *.
My idea is to index the first name and the last name as two separate fields. Provide
two text boxes, one for the first name and one for the last name. Leave the rest to the user.
Regards
Aditya
www.findbestopensource.com
On Tue, Nov 2, 2010 at 7:44 P
Hello,
we are quite new to lucene.
At first we want to create a simple user search for our web
application.
My first thought was to map the 'display name' (= firstname +
lastname) to a single field (analysed but not stored)
and to put the database id of the user to a stored, not analysed field
Hello
Doing a single search with multiple filters will give faster results.
Doing one search per field (multiple searches) and combining the results is a bad
idea.
Regards
Aditya
www.findbestopensource.com
On Mon, Nov 1, 2010 at 11:02 PM, Francisco Borges <francisco.bor...@gmail.com> wrote:
> Hello,
When you close IndexWriter, it performs several operations that might have a
connection to the problem you describe:
* Commit all the pending updates -- if your update batch size is more or
less the same (i.e., comparable # of docs and total # bytes indexed), then
you should not see a performance