newbie question on how columns names are indexed/lucene limitations?

TuX RaceR Sun, 25 Apr 2010 09:55:27 -0700

Hello Cassandra Users,

When using the RandomPartitioner and a simple ColumnFamily/Columns (i.e. no SuperColumns), my understanding is that one single row can store millions of columns.

If I look at http://wiki.apache.org/cassandra/API, I understand that I can get a subset of the millions of columns described above using:

SlicePredicate->ColumnNames or SlicePredicate->SliceRange
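
To make the two selection modes concrete, here is a toy in-memory model of one row. It only illustrates what each predicate is expected to return; it is not the Cassandra API and says nothing about how Cassandra actually implements the selection. The `Row` class and the `select_by_names` / `select_by_range` methods are hypothetical names, and the `count` limit is an assumption modelled on the SliceRange described on the wiki page.

```python
from bisect import bisect_left, bisect_right


class Row:
    """Toy model of a single row: column names kept in sorted order."""

    def __init__(self, columns):
        self.names = sorted(columns)   # sorted column names
        self.values = dict(columns)    # column name -> value

    def select_by_names(self, wanted):
        # Analogous to SlicePredicate->ColumnNames: an explicit list of names.
        # Names that do not exist in the row are simply skipped.
        return {n: self.values[n] for n in wanted if n in self.values}

    def select_by_range(self, start, finish, count=100):
        # Analogous to SlicePredicate->SliceRange: every column whose name
        # satisfies start <= name <= finish, capped at `count` results.
        lo = bisect_left(self.names, start)
        hi = bisect_right(self.names, finish)
        return {n: self.values[n] for n in self.names[lo:hi][:count]}


row = Row({"doc0001": "a", "doc0002": "b", "doc0007": "c", "doc0042": "d"})
by_names = row.select_by_names(["doc0002", "doc0042", "doc9999"])
by_range = row.select_by_range("doc0002", "doc0010")
print(by_names)
print(by_range)
```

In this sketch the range lookup relies only on the names being sorted within the row, which is one possible way to serve such a query without any separate index.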

My question is about the implementation of this column 'selection'.

I vaguely remember reading somewhere (but I cannot find the link again) that this was implemented using a Lucene index over the column names for each row.

Is that true? Is there a small Lucene index per row?

We also know that Lucene has some limitations (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations): you cannot index more than about 2.1 billion documents, because a document ID is mapped to a 32-bit int.
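
For reference, the 2.1 billion figure is just the largest value of a signed 32-bit integer. A minimal sketch of the arithmetic follows; `fits_in_int32` is a hypothetical helper introduced only for illustration.

```python
# Largest value representable by a signed 32-bit integer (a Java int).
INT32_MAX = 2**31 - 1


def fits_in_int32(n):
    """Return True if n could be used as a non-negative 32-bit document number."""
    return 0 <= n <= INT32_MAX


print(INT32_MAX)                     # 2147483647, i.e. about 2.1 billion
print(fits_in_int32(5_000_000))      # a few million IDs: True
print(fits_in_int32(3_000_000_000))  # beyond 2.1 billion: False
```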

As I plan to store the IDs of my Cassandra documents in column names (the global number of documents can go well beyond 2.1 billion), will I be hit by the Lucene limitation? I.e. can I store Cassandra document IDs (i.e. keys) in column names, if each individual row holds no more than a few million of those IDs? I guess the answer is "yes I can", because Lucandra uses a similar schema, but it is not clear to me why. Is it because the Lucene index is built per row, so that what really matters is the number of columns in a single row and not the number of distinct column names (globally, over all rows)?
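
The two quantities being compared can be shown on a tiny made-up data set. The row keys and document IDs below are hypothetical, and the sketch only shows the distinction between the two counts; it does not say which of them any limit would apply to.

```python
# Hypothetical toy data: each row's column names are document IDs.
rows = {
    "row_a": {"doc1", "doc2", "doc3"},
    "row_b": {"doc3", "doc4"},
    "row_c": {"doc5", "doc6", "doc7", "doc8"},
}

# Number of columns in the widest single row.
max_columns_in_one_row = max(len(cols) for cols in rows.values())

# Number of distinct column names across all rows.
distinct_names_globally = len(set().union(*rows.values()))

print(max_columns_in_one_row)   # 4
print(distinct_names_globally)  # 8
```

In the schema described above, the first number would stay in the millions while the second could exceed 2.1 billion.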



Thanks in advance
TuX
