Wide rows and dynamic columns are still possible in CQL3.  There are some
links here: http://comments.gmane.org/gmane.comp.db.cassandra.user/30321
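
As a rough sketch of what that looks like (the table and column names below are
made up for illustration, not taken from any real schema): the first part of
the primary key picks the row, and a clustering column plays the role of the
dynamic column name, much like the cassandra-cli SET command in the question.

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE user_columns (
  user_id     int,   -- partition key: one wide row per user
  column_name text,  -- clustering column: acts as the dynamic column name
  value       text,
  PRIMARY KEY (user_id, column_name)
);

-- Each INSERT adds another "column" to the same wide row; no ALTER TABLE needed.
INSERT INTO user_columns (user_id, column_name, value)
VALUES (42, 'arbitrary_column', 'foo');

-- Read the whole wide row back, ordered by column_name.
SELECT column_name, value FROM user_columns WHERE user_id = 42;
```

For a timestamp+identifier style column name, the same idea should work with
two clustering columns instead of one.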

Also, there are other advantages to NoSQL beyond the schemaless aspect.  One is
that it can accept tons of writes and you can scale the writes (you can't
really do that with an RDBMS).  With an RDBMS you can typically scale the reads
with read replicas and the like, but there are limits there too.  NoSQL doesn't
hit those limits in the same way: double your nodes and you get roughly double
the read throughput.  This has nothing to do with how much you can store.  You
may be storing only 200G but with an amazing write/read throughput, including
TONS of deletes to keep it under 200G.

That brings us to the next advantage: storing huge amounts of data.  If you
have 1000 machines with 300G on each, you are storing 300T, or roughly 1/3 of
a petabyte.  Have fun doing that with an RDBMS.

So yes, schemaless is one advantage, throughput is another, and total storage
capacity is yet another.  HA is probably debatable, but in my opinion it has
been another advantage for us.  We have already had a hardware outage with no
downtime on Cassandra, whereas on a previous project Oracle RAC did not really
live up to its promises.  There may be other advantages I am missing as well.

Also, PlayOrm, a Java client, currently uses Thrift (Astyanax specifically),
and so do a ton of projects right now.  I know PlayOrm is about to upgrade to
CQL3 as well, so it will be able to do Thrift or CQL3 in the future.

Later,
Dean

From: Matthew Hillsborough <matthew.hillsboro...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Monday, May 27, 2013 8:28 AM
To: user@cassandra.apache.org
Subject: Using CQL to insert a column to a row dynamically

Hi all,

I posted a similar thread on Stack Overflow, so I hope it's not repetitive for
anyone here. I'm looking for better insight from the community on whether
Cassandra is the right tool for me or not.

I am trying to understand some fundamentals of Cassandra. I was under the
impression that one of the advantages a developer has in designing a data
model is being able to dynamically add columns to a row identified by a key.
That means I can model my data so that, if it makes sense, a key can be
something such as a user_id from a relational database, and I can, for
example, create an arbitrary number of columns that relate to that user.

What I'm not understanding is why there is so much emphasis on predefined
columns in CQL examples, particularly in the CREATE TABLE/COLUMNFAMILY examples:

CREATE TABLE emp (
  empID int,
  deptID int,
  first_name varchar,
  last_name varchar,
  PRIMARY KEY (empID, deptID)
);

Wouldn't it make more sense to just stuff this type of model into a relational
database? What if I don't know my column name until runtime and need to create
it dynamically? Do I have to use ALTER TABLE to add a new column to the row
using CQL? For the particular use case I have in mind, I would just need a key
identifier and arbitrary column names, where the column name might include a
timestamp+variable_identifier. The whole point is so that I can have extremely
wide rows with the wonderful performance that Cassandra has to offer. As of
right now, everything I'm reading has DataStax recommending CQL over Thrift (I
think what I'm describing is possible with Thrift, but correct me if I'm
wrong). That means I'd have to go AGAINST the recommendation and use a
protocol that's pretty much going to eventually not be supported.

Is Cassandra the right tool for that? Are the predefined columns in the
documentation nothing more than an example? How does one add a dynamic column
name to an existing column family/table? If I'm stuck with static columns,
how is this any different from using a relational database such as PostgreSQL
or MySQL? What I found really powerful about Cassandra is being able to do
something like the following in cassandra-cli, which uses Thrift:


SET mycf[id]['arbitrary_column'] = 'foo';

However, doing that in CQL isn't possible. That completely limits the way I
was going to model my data for an application and would leave no distinct
advantage over a relational database.


Please tell me I'm an idiot and/or am wrong and how I can make this work. It 
seems Thrift is the only solution, but I hate going against the recommended 
protocol.


Thanks.
