On Tue, Nov 2, 2010 at 1:58 AM, Lance Norskog <goks...@gmail.com> wrote:
> 2billion is a hard limit. Usually people split indexes into multiple
> index long before this, and use the parallel multi reader (I think) to
> read from all of the sub-indexes.
>
> On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng
> <lisheng.zh...@broadvision.com> wrote:
>>
>> Hi,
>>
>> Now lucene uses integer as document id, so it means we cannot have more
>> than 2^31-1 documents within one collection? Even if we use MultiSearcher
>> the document id is still integer so it seems this is still a problem?

This is really the limit of a segment. I think you can write your own
collector and collect documents with higher (absolute) doc ids than
INT_MAX (Integer.MAX_VALUE in Java). Yet, I think if you reach the limit
of INT_MAX documents you should really rethink the way your search works
and apply some sharding techniques. I really haven't been up to that many
docs in a single index, but I think it should work to have multiple
segments with INT_MAX documents in them, since we search on the segment
level, provided your collector supports it.

simon

>> We have been using lucene for some time and our document count is growing
>> rather rapidly, maybe this is a much-discussed issue already, but I did not
>> find the lead, any pointer would be really appreciated.
>>
>> Thanks very much for helps, Lisheng
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Lance Norskog
> goks...@gmail.com
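
To make the sharding suggestion concrete, here is a minimal sketch, assuming the application splits its documents across several sub-indexes and tracks their sizes itself. The class ShardedDocIds and its methods are hypothetical and not part of Lucene. Each sub-index keeps its ordinary int doc ids, and a per-shard offset held in a long turns a (shard, local id) pair into a 64-bit global id. This is similar in spirit to the doc base offsets a multi-reader keeps, except that the offsets here are long, so the combined id space may exceed 2^31-1.

```java
/**
 * Hypothetical helper (not a Lucene class): maps a (shard, local doc id)
 * pair to a 64-bit global doc id and back.
 */
final class ShardedDocIds {
    // docBases[i] = number of documents in shards 0..i-1;
    // the last entry holds the total across all shards.
    private final long[] docBases;

    ShardedDocIds(int[] shardSizes) {
        docBases = new long[shardSizes.length + 1];
        for (int i = 0; i < shardSizes.length; i++) {
            if (shardSizes[i] < 0) {
                throw new IllegalArgumentException("negative shard size");
            }
            // Accumulate in long so the sum may exceed Integer.MAX_VALUE.
            docBases[i + 1] = docBases[i] + shardSizes[i];
        }
    }

    long totalDocs() {
        return docBases[docBases.length - 1];
    }

    /** Combines a shard number and a shard-local int doc id. */
    long toGlobal(int shard, int localDoc) {
        long size = docBases[shard + 1] - docBases[shard];
        if (localDoc < 0 || localDoc >= size) {
            throw new IndexOutOfBoundsException("local doc id out of range");
        }
        return docBases[shard] + localDoc;
    }

    /** Finds the shard holding a global id: the last base that is <= the id. */
    int shardOf(long globalDoc) {
        if (globalDoc < 0 || globalDoc >= totalDocs()) {
            throw new IndexOutOfBoundsException("global doc id out of range");
        }
        int lo = 0;
        int hi = docBases.length - 2; // index of the last shard
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Recovers the shard-local int doc id from a global id. */
    int localOf(long globalDoc) {
        return (int) (globalDoc - docBases[shardOf(globalDoc)]);
    }
}
```

With two shards of Integer.MAX_VALUE documents each plus a third shard of 10, totalDocs() is larger than any int can hold, yet every id passed to an individual sub-index still fits in an int. A collector that wants to report global ids would call toGlobal with the shard it is currently collecting from.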