I wanted to clarify where that statement about wide rows comes from.

Realize that some people claim that if you don't have 1000's of columns in
"some" rows in Cassandra, you are doing something wrong.  This is not true, BUT
the claim comes from the fact that people are setting up indexes.  That is what
leads to the very-wide-row effect.  playOrm is one library that uses wide rows
like this, BUT wide rows are NOT necessary for all applications.
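
To make the indexing point concrete, here is a toy Python sketch of why an
index produces one very wide row while the data rows stay narrow.  It is not a
Cassandra client and not playOrm's API; ToyColumnFamily, save_user, users_aged
and "idx_age" are hypothetical names.  It assumes columns are kept sorted by
name within a row (as Cassandra does) and uses a composite (value, row key)
column name with an empty value as each index entry.

```python
import bisect
from collections import defaultdict


class ToyColumnFamily:
    """In-memory stand-in for a column family: each row key maps to column
    names kept in sorted order, the way Cassandra sorts columns within a row.
    Illustration only, not a Cassandra or playOrm client."""

    def __init__(self):
        self._cols = defaultdict(list)  # row key -> sorted list of column names
        self._vals = {}                 # (row key, column name) -> column value

    def insert(self, row_key, column, value=b""):
        if (row_key, column) not in self._vals:
            bisect.insort(self._cols[row_key], column)
        self._vals[(row_key, column)] = value

    def get(self, row_key, column):
        return self._vals.get((row_key, column))

    def column_count(self, row_key):
        return len(self._cols[row_key])

    def slice(self, row_key, start, end):
        """Column names c in row_key with start <= c < end, in sorted order."""
        cols = self._cols[row_key]
        return cols[bisect.bisect_left(cols, start):bisect.bisect_left(cols, end)]


users = ToyColumnFamily()  # narrow rows: one row per user, two columns each
index = ToyColumnFamily()  # one wide row per indexed field


def save_user(user_id, name, age):
    users.insert(user_id, "name", name)
    users.insert(user_id, "age", age)
    # Composite column name (age, user_id): sorted by age, then by user id.
    # The column value is empty; the name alone carries the index entry.
    index.insert("idx_age", (age, user_id))


def users_aged(lo, hi):
    """Ids of users with lo <= age <= hi (integer ages), read as one slice
    of the single index row."""
    # (lo,) sorts before every (lo, user_id); (hi + 1,) sorts after every (hi, user_id).
    return [uid for _, uid in index.slice("idx_age", (lo,), (hi + 1,))]


for uid, name, age in [("u1", "Ann", 31), ("u2", "Bob", 25), ("u3", "Cy", 40), ("u4", "Di", 31)]:
    save_user(uid, name, age)

print(users_aged(30, 35))  # ['u1', 'u4']
```

Each user row stays at two columns no matter how many users exist, while the
single "idx_age" row gains one column per user; that row is the "very wide"
one.  An application that never indexes like this never needs such a row.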

You can easily use map/reduce on a Cassandra cluster.  If you make a mistake and
don't get the model right the first time, you can also map/reduce your dataset
into a new model.  About 80% of the time, this wide-row effect comes from
indexing.  I draw on playOrm examples a lot, but as one example, a table may be
partitioned by time so that each month of data is in its own partition; you can
then have an index on each partition, allowing you to do quick queries into a
partition.
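
A toy Python sketch of the time-partitioned layout described above, under
stated assumptions: partitions are named by month (e.g. "2012-09"), only an
integer "amount" field is indexed, and each partition's index is a sorted list
standing in for one wide index row per partition.  All names (partition_for,
save, query, partition_index) are hypothetical; this is not playOrm's API.  It
shows that a query names one partition and then reads only that partition's
index.

```python
import bisect
from collections import defaultdict
from datetime import date

# Hypothetical layout: every record lands in a partition named after its month,
# and each (partition, field) pair has its own index: a sorted list of
# (field value, record id) standing in for one wide index row per partition.
INDEXED_FIELDS = ("amount",)         # assumption: amounts are integers (e.g. cents)
records = {}                         # record id -> (partition, fields)
partition_index = defaultdict(list)  # (partition, field) -> sorted [(value, record id)]


def partition_for(day):
    """Month partition name, e.g. date(2012, 9, 17) -> "2012-09"."""
    return f"{day.year:04d}-{day.month:02d}"


def save(record_id, day, **fields):
    # Toy only: re-saving an id adds a second index entry instead of replacing it.
    part = partition_for(day)
    records[record_id] = (part, dict(fields, day=day))
    for field in INDEXED_FIELDS:
        if field in fields:
            bisect.insort(partition_index[(part, field)], (fields[field], record_id))


def query(part, field, lo, hi):
    """Ids of records in one partition with lo <= field value <= hi.
    Only that partition's index is read; other months are never touched."""
    entries = partition_index.get((part, field), [])
    start = bisect.bisect_left(entries, (lo,))
    end = bisect.bisect_left(entries, (hi + 1,))
    return [record_id for _, record_id in entries[start:end]]


save("t1", date(2012, 8, 30), amount=500)
save("t2", date(2012, 9, 2), amount=1200)
save("t3", date(2012, 9, 17), amount=300)
save("t4", date(2012, 9, 20), amount=900)

print(query("2012-09", "amount", 500, 1500))  # ['t4', 't2']
```

In this sketch each index only grows with one month of data, so no single index
row grows without bound, and a query never scans months it did not ask for.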

Later,
Dean

From: Marcelo Elias Del Valle <mvall...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Monday, September 17, 2012 4:28 PM
To: user@cassandra.apache.org
Subject: Is Cassandra right for me?

Hello,

     I am new to Cassandra and I am unsure whether Cassandra is the right
technology to use in the architecture I am defining. Also, I saw a presentation
which said that if I don't have rows with more than a hundred columns in
Cassandra, either I am doing something wrong or I shouldn't be using
Cassandra. So it might be the case that I am doing something wrong. If you
could help me find the answers to these questions by giving any feedback, it
would be highly appreciated.
     Here are my needs and what I am thinking of using Cassandra for:

 *   I need to support a high volume of writes per second. I might have a
billion writes per hour.
 *   I need to write unstructured data that will later be processed by Hadoop
jobs to generate structured data from it. Later, I index the structured data
using SOLR or SOLANDRA, so the data can be queried by my end-user application.
Is Cassandra recommended for that, or should I be thinking of writing directly
to HDFS files, for instance? What's the main advantage I get from storing data
in a NoSQL store like Cassandra, compared to storing files in HDFS?
 *   Usually I will write JSON data associated with an ID, and my Hadoop
processes will process this data and write the results to a database. I have
two doubts here:
    *   If I don't need to perform complicated queries in Cassandra, should I
just store the JSON-like data as a column value? I am afraid of doing something
wrong here, as I would only need to store the JSON file plus 5 or 6 more fields
to query the files later.
    *   Does it make sense to you to use Hadoop to process data from Cassandra
and store the results in a database, like HBase? Once I have structured data,
is there any reason I should use Cassandra instead of HBase?

     I am sorry if the questions are too basic. I have been watching a lot of
videos and reading a lot of documentation about Cassandra, but honestly, the
more I read, the more questions I have.

Thanks in advance.

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
