Re: How to handle more than Integer.MAX_VALUE documents?

Simon Willnauer Tue, 02 Nov 2010 12:03:46 -0700

On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog <goks...@gmail.com> wrote:
> 2billion is a hard limit. Usually people split indexes into multiple
> index long before this, and use the parallel multi reader (I think) to
> read from all of the sub-indexes.
>
> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
> <lisheng.zh...@broadvision.com> wrote:
>>
>> Hi,
>>
>> Now lucene uses integer as document id, so it means we cannot have more
>> than 2^31-1 documents within one collection? Even if we use MultiSearcher
>> the document id is still integer so it seems this is still a problem?


This is really the limit of a segment, I think you can write you own
collector and collect documents which higher (absolute) doc ids than
INT_MAX. Yet, I think if you reach the limit of INT_MAX documents you
should really rethink the way your search works and apply some
sharding techniques. I really haven't been up to that many docs in a
single index but I think it should work to have multiple segments with
INT_MAX documents in it since we search on segment level provided if
you collector supports it.

simon
>>
>> We have been using lucene for some time and our document count is growing
>> rather rapidly, maybe this is a much-discussed issue already, but I did not
>> find the lead, any pointer would be really appreciated.
>>
>> Thanks very much for helps, Lisheng
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: How to handle more than Integer.MAX_VALUE documents?

Reply via email to