I am new to Lucene and Lucandra. My use case is that I have a trillion URIs to index with Lucene. Each URI is either a resource or a literal in an RDF graph, and each URI is a document for Lucene.
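For what it's worth, here is roughly how I picture each URI becoming a Lucene document. This is only a minimal sketch; the field names ("uri", "kind") and the choice of untokenized fields are my own assumptions, not anything from the Lucandra docs:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // One Lucene Document per URI. NOT_ANALYZED keeps the URI as a
    // single indexed token instead of tokenizing it. uriString,
    // isLiteral, and writer (an open IndexWriter) are assumed to exist.
    Document doc = new Document();
    doc.add(new Field("uri", uriString,
                      Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("kind", isLiteral ? "literal" : "resource",
                      Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);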
If I were using Lucene directly, my understanding is that it would create a segment and add URIs to it until it hit either the document limit, around 2 billion, or the maximum size of the index. Let's say, for the sake of argument, that I store only 1 billion URIs per segment; then I would need 1,000 segments to index my URIs. Lucandra does not support segments. How would I index a trillion URIs? Based on the comments below, I could only have around 2 billion URIs, or documents, per index. Would I have to create separate indexes to store all the URIs? In the case where I store only 1 billion URIs per index, would I have to create 1,000 indexes? Since these are indexes and not segments, which Lucene would have handled for me, do I have to run my search against each index? Lucene supports creating multiple IndexSearchers and combining them in a MultiSearcher (see the sketch at the end of this message). Is this the right way to view the problem?

-------------
Sincerely,
David G. Boney
dbon...@semanticartifacts.com
http://www.semanticartifacts.com

On Jan 27, 2011, at 12:45 PM, Jake Luciani wrote:

> Yes, but that's also the Lucene limit:
> http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations
>
> "Lucene uses a Java int to refer to document numbers, and the index file
> format uses an Int32"
>
> On Thu, Jan 27, 2011 at 1:40 PM, David G. Boney
> <dbon...@semanticartifacts.com> wrote:
> I was reviewing the Lucandra schema presented on the page below at Datastax:
>
> http://www.datastax.com/docs/0.7/data_model/lucandra
>
> In the TermInfo Super Column Family, docID is the key for a supercolumn. Does
> this imply that the maximum number of documents that can be indexed for a term
> with Lucandra is two billion, the maximum number of columns?
>
> -------------
> Sincerely,
> David G. Boney
> dbon...@semanticartifacts.com
> http://www.semanticartifacts.com
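P.S. To make the MultiSearcher question concrete, here is the kind of thing I have in mind, assuming Lucene 3.0-era APIs; the indexDirs array of directory paths and the "uri" query field are made up for illustration:

    import java.io.File;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    // One read-only IndexSearcher per index directory.
    Searchable[] searchers = new Searchable[indexDirs.length];
    for (int i = 0; i < indexDirs.length; i++) {
        searchers[i] = new IndexSearcher(
            FSDirectory.open(new File(indexDirs[i])), true);
    }

    // MultiSearcher fans the query out to every underlying index
    // and merges the results into one ranked list.
    MultiSearcher searcher = new MultiSearcher(searchers);
    TopDocs hits = searcher.search(
        new TermQuery(new Term("uri", "http://example.org/foo")), 10);

My question is whether this per-index fan-out is also the intended pattern with Lucandra, or whether there is a better way.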