The column index within a row is a sorted, blocked index (similar to a B-tree), just like Bigtable's.
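To make "sorted, blocked index" a bit more concrete, here is a minimal Python sketch of the general idea. It is not Cassandra's actual code or on-disk format: the class name, the method names and the block_size parameter are all made up for illustration. Columns are kept sorted by name and grouped into fixed-size blocks, and a small list of each block's first column name is binary-searched to find the one block that could hold a given name.

```python
from bisect import bisect_right


class BlockedColumnIndex:
    """Hypothetical illustration of a sorted, blocked per-row column index."""

    def __init__(self, columns, block_size=4):
        # Sort the columns by name, then cut the sorted run into blocks.
        items = sorted(columns.items())
        self.blocks = [
            items[i:i + block_size] for i in range(0, len(items), block_size)
        ]
        # The index proper: the first column name of every block.
        self.first_names = [block[0][0] for block in self.blocks]

    def _block_for(self, name):
        # Last block whose first name is <= name, or -1 if there is none.
        return bisect_right(self.first_names, name) - 1

    def get(self, name):
        """Look up one column (roughly what a ColumnNames predicate needs)."""
        i = self._block_for(name)
        if i < 0:
            return None
        # Only this one block is scanned, not the whole row.
        for column_name, value in self.blocks[i]:
            if column_name == name:
                return value
        return None

    def slice(self, start, finish):
        """Return columns with start <= name <= finish (like a SliceRange)."""
        result = []
        first_block = max(self._block_for(start), 0)
        for block in self.blocks[first_block:]:
            for column_name, value in block:
                if column_name > finish:
                    return result
                if column_name >= start:
                    result.append((column_name, value))
        return result
```

Under these assumptions a lookup costs a binary search over the block boundaries plus a scan of a single block, and nothing in the structure depends on how many distinct column names exist across all rows.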

On Mon, Apr 26, 2010 at 2:43 AM, Stu Hood <stu.h...@rackspace.com> wrote:

> The indexes within rows are _not_ implemented with Lucene: there is a
> custom index structure that allows for random access within a row. But you
> should probably read http://wiki.apache.org/cassandra/CassandraLimitations to
> understand the current limitations of the file format, some of which are
> scheduled to be fixed soon.
>
> -----Original Message-----
> From: "TuX RaceR" <tuxrace...@gmail.com>
> Sent: Sunday, April 25, 2010 11:54am
> To: user@cassandra.apache.org
> Subject: newbie question on how columns names are indexed/lucene
> limitations?
>
> Hello Cassandra Users,
>
> When I use the RandomPartitioner and a simple ColumnFamily/Columns (i.e.
> no SuperColumns), my understanding is that one single row can store
> millions of columns.
>
> If I look at http://wiki.apache.org/cassandra/API, I understand that
> I can get a subset of the millions of columns defined above using:
> SlicePredicate->ColumnNames or SlicePredicate->SliceRange
>
> My question is about the implementation of this column 'selection'.
> I vaguely remember reading somewhere (but I cannot find the link again)
> that this was implemented using a Lucene index over the column names for
> each row.
> Is that true? Is there a small Lucene index per row?
>
> Also, we know that Lucene has some limitations
> (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations): you
> cannot index more than 2.1 billion documents, as a document ID is mapped
> to a 32-bit int.
>
> As I plan to store in column names the IDs of my Cassandra documents (the
> global number of documents can go well beyond 2.1 billion), will I be
> hit by the Lucene limitations? I.e., can I store Cassandra document IDs
> (i.e. keys) in column names, if in each individual row there are no more
> than a few million of those IDs? I guess the answer is "yes I can",
> because Lucandra uses a similar schema, but it is not clear to me why.
> Is that because the Lucene index is made on each row, and what really
> matters is the number of columns in one single row and not the number of
> distinct column names (globally over all the rows)?
>
> Thanks in advance
> TuX