Your data model should take into consideration the number of items you're storing in a collection. If you expect it to grow over time with no small upper bound, don't use a collection. You don't need to read before write to answer this question; it's a decision made at modeling time (before you ever write your very first record).
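
A minimal CQL3 sketch of that modeling-time choice. The table and column names (user_profile, user_event, tags, event_time, event_name) are hypothetical and not taken from this thread, and the timestamp clustering column assumes no two events for one user share the same timestamp:

```cql
-- Small, bounded, rarely changing values: a collection is a reasonable fit.
CREATE TABLE user_profile (
    username text PRIMARY KEY,
    tags set<text>
);

-- Blind write: adds one element without reading the set first.
UPDATE user_profile SET tags = tags + {'admin'} WHERE username = 'joe';

-- Open-ended data: normalize into its own table, one row per item,
-- clustered under the owning partition key.
CREATE TABLE user_event (
    username text,
    event_time timestamp,
    event_name text,
    PRIMARY KEY (username, event_time)
);

INSERT INTO user_event (username, event_time, event_name)
VALUES ('joe', '2013-06-06 10:59:00', 'login');

-- Reads can target a slice of the events instead of the whole set.
SELECT event_name FROM user_event
WHERE username = 'joe' AND event_time >= '2013-06-01'
LIMIT 100;
```

Neither write needs to know how many items already exist, so the no-read-before-write rule holds for both models.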
If the possible values are finite and small, use a collection. Otherwise, normalize. If you find over time that your collections are getting large, then either an assumption changed or you modeled poorly. Either way, it's time to refactor.

> DON'T STORE MORE THAN 100 THINGS IN A COLLECTION

Actually that's probably a bit too hard-edged. You could easily have a Set<Int> whose typical size is 1000. If the data doesn't change often, and you always need all of those values at the same time, there's no problem with this. The problems are constantly mutating values as the collection gets large, and cases where you need only a subset of the collection at a time.

-Eric Stevens
ProtectWise, Inc.

On Thu, Jun 6, 2013 at 10:59 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> The problem about "being careful about how much you store in a collection"
> is that Cassandra is a blind-write system. Knowing how much data is
> currently in the collection before you write is an anti-pattern, read
> before write.
>
> Cassandra Rule 1: DON'T READ BEFORE WRITE
> Cassandra Rule 2: ROWS CAN HAVE 2 BILLION COLUMNS
> Collection Rule 1: DON'T STORE MORE THAN 100 THINGS IN A COLLECTION
>
> Why is our user confused? It's simple.
>
> On Thu, Jun 6, 2013 at 10:51 AM, Eric Stevens <migh...@gmail.com> wrote:
>
>>> CQL3 does now support dynamic columns. For tags or metadata values you
>>> could use a Collection:
>>
>> This should probably be clarified. A collection is a super useful tool,
>> but it is *not* the same thing as a dynamic column. It has many
>> advantages, but there is one huge disadvantage in that you have to be
>> careful how much data you store in a collection. When you read a single
>> value out of a collection, the *entire* collection is always read, which
>> of course is true for appending data to the collection as well.
>>
>> With a traditional dynamic column, you could have added things like event
>> logs to a record in the form of keys named "event:someEvent:TS" (or
>> juxtapose the order as your needs dictate). You could basically do this
>> practically indefinitely with little degradation in performance. This was
>> also a common way of representing cross-family relationships (one-to-many
>> style).
>>
>> If you try to do the same thing with a collection, performance will
>> degrade as your data grows. For small or relatively static data sets (eg
>> tags) that's fine. For open-ended data sets (logs, events, one-to-many
>> relationships that grow regularly), you should instead normalize such data
>> into a separate column family.
>>
>> -Eric Stevens
>> ProtectWise, Inc.
>>
>> On Thu, Jun 6, 2013 at 9:49 AM, Francisco Andrades Grassi
>> <bigjoc...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> CQL3 does now support dynamic columns. For tags or metadata values you
>>> could use a Collection:
>>>
>>> http://www.datastax.com/dev/blog/cql3_collections
>>>
>>> For wide rows there's the enhanced primary keys, which I personally
>>> prefer over the composite columns of yore:
>>>
>>> http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
>>> http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
>>>
>>> --
>>> Francisco Andrades Grassi
>>> www.bigjocker.com
>>> @bigjocker
>>>
>>> On Jun 6, 2013, at 8:32 AM, Joe Greenawalt <joe.greenaw...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>> I'm having some problems figuring out how to append a dynamic column on
>>> a column family using the datastax java driver 1.0 and CQL3 on Cassandra
>>> 1.2.5.
>>> Below is what I'm trying:
>>>
>>> cqlsh:simplex> create table user (firstname text primary key, lastname text);
>>> cqlsh:simplex> insert into user (firstname, lastname) values ('joe','shmoe');
>>> cqlsh:simplex> select * from user;
>>>
>>>  firstname | lastname
>>> -----------+----------
>>>        joe |    shmoe
>>>
>>> cqlsh:simplex> insert into user (firstname, lastname, middlename) values ('joe','shmoe','lester');
>>> Bad Request: Unknown identifier middlename
>>> cqlsh:simplex> insert into user (firstname, lastname, middlename) values ('john','shmoe','lester');
>>> Bad Request: Unknown identifier middlename
>>>
>>> I'm assuming you can do this based on previous thrift-based clients
>>> like pycassa, and also by reading this:
>>>
>>> The Cassandra data model is a dynamic schema, column-oriented data
>>> model. This means that, unlike a relational database, you do not need to
>>> model all of the columns required by your application up front, as each row
>>> is not required to have the same set of columns. Columns and their metadata
>>> can be added by your application as they are needed without incurring
>>> downtime to your application.
>>>
>>> here: http://www.datastax.com/docs/1.2/ddl/index
>>>
>>> Is it a limitation of CQL3 and its connection vs. thrift?
>>> Or more likely I'm just doing something wrong?
>>>
>>> Thanks,
>>> Joe
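
For reference, the "Unknown identifier middlename" error in the quoted session arises because CQL3 only accepts columns that are declared in the table schema. A minimal sketch of one way past it, reusing the user table from that session; it assumes altering the schema is acceptable for this use case:

```cql
-- Declare the new column first; existing rows simply have no value for it.
ALTER TABLE user ADD middlename text;

-- The insert that previously failed is now accepted.
INSERT INTO user (firstname, lastname, middlename)
VALUES ('joe', 'shmoe', 'lester');
```

For a small, bounded set of ad hoc attributes, a collection column such as a map<text, text> is the alternative suggested earlier in the thread, with the size caveats discussed above.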