The problem with "being careful about how much you store in a collection" is that Cassandra is a blind-write system. Checking how much data is currently in the collection before you write is an anti-pattern: read before write.
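
A rough sketch of the difference in CQL3 may help. The `tags` column and the `user_events` table are hypothetical names made up for illustration; they are not part of the schema discussed in this thread. Appending to a collection is itself a blind write, but the only way to enforce a size limit on it is to read the collection first. Open-ended data can instead go into a table with a compound primary key, where every write is a plain insert.

```cql
-- Hypothetical column: a small, bounded set of tags on the existing table.
ALTER TABLE user ADD tags set<text>;

-- Blind write: adds one element without reading the set.
UPDATE user SET tags = tags + {'admin'} WHERE firstname = 'joe';

-- Anti-pattern: reading the set first just to decide whether to write.
-- SELECT tags FROM user WHERE firstname = 'joe';

-- Hypothetical table for open-ended data: one clustered row per event,
-- all stored under the same partition key.
CREATE TABLE user_events (
    firstname  text,
    event_time timestamp,
    event      text,
    PRIMARY KEY (firstname, event_time)
);

-- Still a blind write, and nothing has to be counted beforehand.
INSERT INTO user_events (firstname, event_time, event)
VALUES ('joe', '2013-06-06 10:51:00', 'login');
```

With the second layout each event is its own row under the `firstname` partition, so the write path never needs to know how many events already exist.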
Cassandra Rule 1: DON'T READ BEFORE WRITE
Cassandra Rule 2: ROWS CAN HAVE 2 BILLION COLUMNS
Collection Rule 1: DON'T STORE MORE THAN 100 THINGS IN A COLLECTION

Why is our user confused? It's simple.

On Thu, Jun 6, 2013 at 10:51 AM, Eric Stevens <migh...@gmail.com> wrote:

>> CQL3 does now support dynamic columns. For tags or metadata values you
>> could use a Collection:
>>
>
> This should probably be clarified. A collection is a super useful tool,
> but it is *not* the same thing as a dynamic column. It has many
> advantages, but there is one huge disadvantage in that you have to be
> careful how much data you store in a collection. When you read a single
> value out of a collection, the *entire* collection is always read, which
> of course is true for appending data to the collection as well.
>
> With a traditional dynamic column, you could have added things like event
> logs to a record in the form of keys named "event:someEvent:TS" (or
> juxtapose the order as your needs dictate). You could basically do this
> practically indefinitely with little degradation in performance. This was
> also a common way of representing cross-family relationships (one-to-many
> style).
>
> If you try to do the same thing with a collection, performance will
> degrade as your data grows. For small or relatively static data sets (eg
> tags) that's fine. For open-ended data sets (logs, events, one-to-many
> relationships that grow regularly), you should instead normalize such data
> into a separate column family.
>
> -Eric Stevens
> ProtectWise, Inc.
>
>
> On Thu, Jun 6, 2013 at 9:49 AM, Francisco Andrades Grassi <bigjoc...@gmail.com> wrote:
>
>> Hi,
>>
>> CQL3 does now support dynamic columns. For tags or metadata values you
>> could use a Collection:
>>
>> http://www.datastax.com/dev/blog/cql3_collections
>>
>> For wide rows there's the enhanced primary keys, which I personally
>> prefer over the composite columns of yore:
>>
>> http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
>> http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
>>
>> --
>> Francisco Andrades Grassi
>> www.bigjocker.com
>> @bigjocker
>>
>> On Jun 6, 2013, at 8:32 AM, Joe Greenawalt <joe.greenaw...@gmail.com>
>> wrote:
>>
>> Hi,
>> I'm having some problems figuring out how to append a dynamic column on a
>> column family using the datastax java driver 1.0 and CQL3 on Cassandra
>> 1.2.5. Below is what i'm trying:
>>
>> *cqlsh:simplex> create table user (firstname text primary key, lastname
>> text);
>> cqlsh:simplex> insert into user (firstname, lastname) values
>> ('joe','shmoe');
>> cqlsh:simplex> select * from user;
>>
>>  firstname | lastname
>> -----------+----------
>>        joe |    shmoe
>>
>> cqlsh:simplex> insert into user (firstname, lastname, middlename) values
>> ('joe','shmoe','lester');
>> Bad Request: Unknown identifier middlename
>> cqlsh:simplex> insert into user (firstname, lastname, middlename) values
>> ('john','shmoe','lester');
>> Bad Request: Unknown identifier middlename*
>>
>> I'm assuming you can do this based on previous based thrift based clients
>> like pycassa, and also by reading this:
>>
>> The Cassandra data model is a dynamic schema, column-oriented data model.
>> This means that, unlike a relational database, you do not need to model all
>> of the columns required by your application up front, as each row is not
>> required to have the same set of columns. Columns and their metadata can be
>> added by your application as they are needed without incurring downtime to
>> your application.
>> here: http://www.datastax.com/docs/1.2/ddl/index
>>
>> Is it a limitation of CQL3 and its connection vs. thrift?
>> Or more likely i'm just doing something wrong?
>>
>> Thanks,
>> Joe