I am new to Lucene and Lucandra.

My use case is that I have a trillion URIs to index with Lucene. Each URI is 
either a resource or a literal in an RDF graph, and each URI is a document for 
Lucene.

If I were using Lucene, my understanding is that it would create a segment and 
stuff URIs into it until it hit either the document limit, around 2 billion, 
or the maximum size of the index. Let's say for the sake of argument that I 
store only 1 billion URIs in a segment; then I would need 1000 segments to 
index my URIs.
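
To make the arithmetic above concrete, here is a minimal sketch in Python. It 
assumes the per-index ceiling is exactly the largest Java int (2**31 - 1); the 
practical limit may be somewhat lower. The names LUCENE_MAX_DOCS and 
indexes_needed are mine, for illustration only, and are not part of Lucene or 
Lucandra.

```python
# Assumption: Lucene addresses documents with a signed 32-bit Java int,
# so one index is taken to hold at most 2**31 - 1 documents.
LUCENE_MAX_DOCS = 2**31 - 1


def indexes_needed(total_docs: int, docs_per_index: int) -> int:
    """Smallest number of indexes (or segments) that can hold total_docs."""
    if not 0 < docs_per_index <= LUCENE_MAX_DOCS:
        raise ValueError("docs_per_index must be between 1 and the int limit")
    # Integer ceiling division avoids float rounding at this scale.
    return -(-total_docs // docs_per_index)


TOTAL_URIS = 10**12  # one trillion URIs

print(indexes_needed(TOTAL_URIS, 10**9))            # 1 billion URIs per index
print(indexes_needed(TOTAL_URIS, LUCENE_MAX_DOCS))  # every index packed to the limit
```

Even with every index packed to the int limit, the count stays in the hundreds, 
so the question of how to search across many indexes does not go away.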

Lucandra does not support segments. How would I index a trillion URIs? Based on 
the comments below, I could have only around 2 billion URIs, or documents, per 
index. Would I have to create separate indexes to store all the URIs? In the 
case where I store only 1 billion URIs per index, would I have to create 1000 
indexes? Since these are indexes and not segments, which Lucene would have 
handled for me, do I have to run my search against each index? Lucene 
supports the ability to create multiple IndexSearchers and stick them in a 
MultiSearcher.
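
The pattern I have in mind, sketched in Python rather than against the real 
Lucene or Lucandra APIs: route each URI to one of N indexes with a stable hash, 
send every query to all N, and merge the per-index hits into one ranked list. 
Everything here (shard_for, make_shard, search_all, the toy scoring) is a 
hypothetical stand-in for the idea, not how MultiSearcher is implemented.

```python
import heapq
import zlib


def shard_for(uri: str, num_shards: int) -> int:
    """Pick an index for a URI using a stable hash (CRC32); hypothetical routing."""
    return zlib.crc32(uri.encode("utf-8")) % num_shards


def make_shard(uris):
    """Toy in-memory index: the score is how often the query occurs in the URI."""
    def search(query, top_k):
        scored = [(uri.count(query), uri) for uri in uris if query in uri]
        return heapq.nlargest(top_k, scored)
    return search


def search_all(shards, query, top_k):
    """Query every shard, then merge the per-shard hits into a global top_k."""
    hits = []
    for shard in shards:
        # Each shard returns at most top_k hits, so the merge stays small.
        hits.extend(shard(query, top_k))
    return heapq.nlargest(top_k, hits)


NUM_SHARDS = 4
buckets = [[] for _ in range(NUM_SHARDS)]
for uri in ["http://example.org/a", "http://example.org/b/b", "http://example.com/c"]:
    buckets[shard_for(uri, NUM_SHARDS)].append(uri)

shards = [make_shard(bucket) for bucket in buckets]
print(search_all(shards, "org", 2))
```

Merging like this only gives a sensible ranking if scores from different 
indexes are comparable, which is one of the things I am unsure about.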

Is this the right way to view the problem?
-------------
Sincerely,
David G. Boney
dbon...@semanticartifacts.com
http://www.semanticartifacts.com




On Jan 27, 2011, at 12:45 PM, Jake Luciani wrote:

> Yes, but that's also the lucene limit 
> http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations
> 
> "Lucene uses a Java int to refer to document numbers, and the index file 
> format uses an Int32"
> 
> 
> 
> On Thu, Jan 27, 2011 at 1:40 PM, David G. Boney 
> <dbon...@semanticartifacts.com> wrote:
> I was reviewing the Lucandra schema presented on the below page at Datastax:
> 
> http://www.datastax.com/docs/0.7/data_model/lucandra
> 
> In the TermInfo Super Column Family, docID is the key for a supercolumn. Does 
> this imply that the maximum number of documents that can be indexed for a term 
> with Lucandra is two billion, the maximum number of columns?
> 
> -------------
> Sincerely,
> David G. Boney
> dbon...@semanticartifacts.com
> http://www.semanticartifacts.com
> 
> 
> 
> 
> 
