Sorry for the extra post. This version has confusing parts removed and
better formatting.

It sounds like you are getting a handle on it, but maybe in a roundabout way.
Here are some ways of conceptualizing Cassandra that I like. Maybe they can help.

Either the grid analogy or the maps-of-maps analogy can apply, as they
both map conceptually to the way that we use a column family.

The maps-of-maps analogy:
Think of it in terms of a sorted map of sorted maps, where:
*) the outer map is the set of rows, whose (map) keys and (map)
values are (Cassandra) row keys and (Cassandra) rows
*) the inner map for each row key is the set of columns whose keys and
values are column names and column data.
*) column data is essentially a molecule of (column name, column
value, storage timestamp). It can be thought of as the "value", but it
is stored as a 3-tuple.
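To make the maps-of-maps picture concrete, here is a minimal Python sketch. Python is just a convenient choice; this is not Cassandra's API or storage format, and the names Column, insert and scan are hypothetical. It assumes rows sort by key, which in Cassandra holds only with an order-preserving partitioner (a hashing partitioner orders rows by token instead), and it simplifies timestamp reconciliation to "the newer write wins".

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class Column:
    """One stored column: name, value, and write timestamp (the 3-tuple)."""
    name: str
    value: bytes
    timestamp: int


# Outer map: row key -> row. Inner map (the row): column name -> Column.
ColumnFamily = Dict[str, Dict[str, Column]]


def insert(cf: ColumnFamily, row_key: str, name: str,
           value: bytes, timestamp: int) -> None:
    """Write a column, keeping whichever version has the newer timestamp."""
    row = cf.setdefault(row_key, {})
    existing = row.get(name)
    # Simplified reconciliation: an older write never replaces a newer one.
    if existing is None or timestamp > existing.timestamp:
        row[name] = Column(name, value, timestamp)


def scan(cf: ColumnFamily) -> Iterator[Tuple[str, Column]]:
    """Yield (row key, column) pairs: rows in key order, columns in name order."""
    # Python dicts keep insertion order, so sort explicitly to mimic sorted maps.
    for row_key in sorted(cf):
        row = cf[row_key]
        for name in sorted(row):
            yield row_key, row[name]


if __name__ == "__main__":
    cf: ColumnFamily = {}
    insert(cf, "user:2", "email", b"b@example.com", 10)
    insert(cf, "user:1", "name", b"Alice", 5)
    insert(cf, "user:1", "email", b"a@example.com", 7)
    insert(cf, "user:1", "name", b"Old Alice", 3)  # older timestamp: ignored
    for row_key, col in scan(cf):
        print(row_key, col.name, col.value, col.timestamp)
```

The "value" you read back is only col.value, but the timestamp travels with it, which is why the sketch stores the whole 3-tuple.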

The grid analogy: (This one is my favorite)
Think of the "column" as the intersection between a row key and a column name.
*) Rows may be undefined.
*) Rows that are defined may have columns that are undefined.
*) Cassandra doesn't have to store undefined values, except for deleted
columns, which are kept as markers (tombstones) until housekeeping
(compaction) takes them away.
*) Cassandra operates behind the scenes in row-major order. That means
that while you can think of it in terms of a Cartesian intersection, you
should know that rows will always be accessed first.
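The grid view can be sketched the same way. The Grid class below is hypothetical (not a Cassandra API): it stores only defined cells, looks up the row before the column, and models a delete as a marker that a later housekeeping pass removes, roughly like tombstones and compaction. It deliberately ignores timestamps and the grace period before tombstones can be purged.

```python
from typing import Dict, Optional

TOMBSTONE = object()  # marker for a deleted cell awaiting housekeeping


class Grid:
    """Sparse grid: only defined cells are stored, indexed row first."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, object]] = {}

    def put(self, row: str, col: str, value: object) -> None:
        self._rows.setdefault(row, {})[col] = value

    def delete(self, row: str, col: str) -> None:
        # A delete writes a marker instead of removing the cell outright.
        self._rows.setdefault(row, {})[col] = TOMBSTONE

    def get(self, row: str, col: str) -> Optional[object]:
        # Row-major: find the row first, then the column within that row.
        cells = self._rows.get(row)
        if cells is None:
            return None  # undefined row
        value = cells.get(col)
        # None covers both an undefined cell and a deleted (tombstoned) one.
        return None if value is TOMBSTONE else value

    def housekeeping(self) -> None:
        # Drop tombstones, and any rows left empty, mimicking later cleanup.
        for row in list(self._rows):
            cells = {c: v for c, v in self._rows[row].items()
                     if v is not TOMBSTONE}
            if cells:
                self._rows[row] = cells
            else:
                del self._rows[row]

    def stored_cells(self) -> int:
        # Count what is physically kept, tombstones included.
        return sum(len(cells) for cells in self._rows.values())


if __name__ == "__main__":
    g = Grid()
    g.put("r1", "a", 1)
    g.put("r1", "b", 2)
    g.delete("r1", "b")
    print(g.get("r1", "b"), g.stored_cells())  # deleted, but still stored
    g.housekeeping()
    print(g.get("r1", "b"), g.stored_cells())  # marker gone after cleanup
```

Note that undefined cells cost nothing, while a deleted cell costs space until housekeeping runs.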

--

Another layer outward is the column family, which is also a map.

Another layer inward is the sub-column, which is also a map.
Don't get confused by super columns or sub columns. Super/sub columns
are really API sugar that saves some of the work of storing your own
serialized aggregates within a normal column value. When starting out,
I find they are usually not worth the confusion. On the other hand, if
you were to implement your own aggregate types within a column value,
the purpose of super/sub columns would seem obvious. They are a little
overly complex only because of the supporting types in the API. Since
this support was basically bolted onto standard columns, the core
Cassandra machinery treats it like normal column data.
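To see the work super/sub columns save you, here is a rough sketch of the do-it-yourself alternative described above: the same subcolumns serialized into one ordinary column value. JSON and the helper names pack, unpack and set_subcolumn are arbitrary choices for illustration, not how Cassandra stores super columns.

```python
import json
from typing import Dict

# Super column style: row -> super column name -> subcolumn name -> value.
super_cf: Dict[str, Dict[str, Dict[str, str]]] = {
    "user:1": {"address": {"city": "Austin", "zip": "78701"}},
}


def pack(subcolumns: Dict[str, str]) -> bytes:
    # sort_keys keeps the serialized form stable, like a sorted map.
    return json.dumps(subcolumns, sort_keys=True).encode("utf-8")


def unpack(value: bytes) -> Dict[str, str]:
    return json.loads(value.decode("utf-8"))


# Do-it-yourself style: row -> column name -> serialized aggregate.
standard_cf: Dict[str, Dict[str, bytes]] = {
    row: {name: pack(subs) for name, subs in supers.items()}
    for row, supers in super_cf.items()
}


def set_subcolumn(cf: Dict[str, Dict[str, bytes]], row: str,
                  name: str, sub: str, value: str) -> None:
    """Update one 'subcolumn' by hand: read, modify, re-serialize, write back."""
    cells = cf.setdefault(row, {})
    current = unpack(cells[name]) if name in cells else {}
    current[sub] = value
    cells[name] = pack(current)


if __name__ == "__main__":
    set_subcolumn(standard_cf, "user:1", "address", "zip", "78702")
    print(standard_cf["user:1"]["address"])
```

The read-modify-write inside set_subcolumn is roughly the bookkeeping that super columns take off your hands.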

Neither the column family layer nor the subcolumn layer has been
given the same attention as the basic row->column layer with respect to
performance and scalability.
This may change in the future. For now, assume that Cassandra scales
best along row keys and column names.

Jonathan
