Re: Is Cassandra right for me?

Hiller, Dean Tue, 18 Sep 2012 13:32:00 -0700

Oh, and yes, that is the correct link.

Dean

From: Marcelo Elias Del Valle <mvall...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, September 18, 2012 10:50 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Is Cassandra right for me?

You're talking about this project, right?  https://github.com/deanhiller/playorm
I will take a look. However, I don't think using Cassandra's model itself (with 
CFs / key-values) would be a problem; I just need to know where the advantage 
lies. From your answer, my guess is that it lies in better performance and more 
control.

I also saw that if I plan to use DataStax Enterprise to get real-time 
analytics, my data would need to be stored in Cassandra's usual format. It 
would be harder for me to use PlayOrm if I am planning to use advanced DataStax 
features, like Solr indexing data in Cassandra in real time without copying 
columns, wouldn't it? I don't know much about this Solr feature yet, but my 
understanding today is that it wouldn't be aware of the tables I create with 
playOrm, just of the column families this framework uses to store the data, 
right?

2012/9/18 Hiller, Dean <dean.hil...@nrel.gov>
Until Aaron replies, here are my thoughts on the relational piece…

           If everything in my model fits into a relational database, if my 
data is structured, would it still be a good idea to use Cassandra? Why?

The playOrm project explores exactly this issue... A query on 1,000,000 rows in 
a single partition only took 60ms AND you can do joins with its S-SQL language. 
The answer is a resounding YES, you can put relational data in Cassandra.  The 
writes are way faster than a DBMS, and joins and SQL can be just as fast and in 
many cases FASTER on NoSQL IF you partition your data properly.  An S-SQL 
statement looks like this in playOrm:

PARTITIONS t(:partitionId) SELECT t FROM Trades as t where t.numShares > 10

You can have as many partitions as you want, and a single partition can have 
millions of rows, though I would probably not exceed 10 million.
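
As a rough illustration of why partitioning keeps such a query cheap, here is 
a minimal Python sketch. It is not playOrm's API: the partitions dict and the 
add_trade and query_partition helpers are hypothetical stand-ins that only 
model the idea that a query names one partition and scans just the rows in it.

```python
from collections import defaultdict

# Hypothetical in-memory model: partition id -> list of trade rows.
partitions = defaultdict(list)

def add_trade(partition_id, trade_id, num_shares):
    """Store a trade row inside the partition it belongs to."""
    partitions[partition_id].append({"id": trade_id, "numShares": num_shares})

def query_partition(partition_id, min_shares):
    """Scan only the named partition, like
    PARTITIONS t(:partitionId) ... where t.numShares > :min_shares."""
    rows = partitions.get(partition_id, [])
    return [t for t in rows if t["numShares"] > min_shares]

add_trade("account-1", "t1", 5)
add_trade("account-1", "t2", 50)
add_trade("account-2", "t3", 500)

# Rows in "account-2" are never touched by this query.
result = query_partition("account-1", 10)
print([t["id"] for t in result])  # prints ['t2']
```

The cost of the query depends on the size of the one partition it names, not 
on the total number of rows across all partitions.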

Later,
Dean

2012/9/18 aaron morton <aa...@thelastpickle.com>
Also, I saw a presentation which said that if I don't have rows with more than 
a hundred rows in Cassandra, either I am doing something wrong or I shouldn't 
be using Cassandra.
I do not agree with that statement. (I read that as rows with more than a 
hundred _columns_)

 *   I need to support a high volume of writes per second. I might have a 
billion writes per hour

That's about 280K/sec. Netflix did a benchmark that shows 1.1M/sec:
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
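
The 280K/sec figure is just the hourly volume divided by the number of seconds 
in an hour. A quick Python check, assuming the writes are spread evenly over 
the hour (real traffic would have peaks above this average):

```python
# Convert the stated hourly write volume to an average per-second rate.
WRITES_PER_HOUR = 1_000_000_000
SECONDS_PER_HOUR = 60 * 60

writes_per_second = WRITES_PER_HOUR / SECONDS_PER_HOUR
print(f"{writes_per_second:,.0f} writes/sec")  # prints "277,778 writes/sec"
```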

 *   I need to write non-structured data that will be processed later by Hadoop 
processes to generate structured data from it. Later, I index the structured 
data using SOLR or SOLANDRA, so the data can be queried by my end-user 
application. Is Cassandra recommended for that, or should I be thinking of 
writing directly to HDFS files, for instance? What's the main advantage I get 
from storing data in a NoSQL service like Cassandra, when compared to storing 
files in HDFS?

You can query your data using Hadoop easily enough. You may want to take a look 
at DSE from http://datastax.com/; it makes using Hadoop and Solr with Cassandra 
easier.

 *   If I don't need to perform complicated queries in Cassandra, should I 
store the JSON-like data just as a column value? I am afraid of doing something 
wrong here, as I would just need to store the JSON file and 5 or 6 more 
fields to query the files later.

Store the data in the way that best supports the read queries you want to make. 
If you always read all the fields, or it's a canonical record of events, storing 
it as JSON may be best. If you often get only a few fields, and maybe they are 
updated, storing each field as a column value may be best.
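
To make the trade-off concrete, here is a small Python sketch that models the 
two layouts with plain dicts. It does not use a Cassandra client, and the 
event record and its field names are made up for illustration.

```python
import json

# Hypothetical event record; the field names are illustrative only.
event = {"id": "evt-1", "user": "alice", "status": "new", "payload": {"a": 1}}

# Layout A: the whole record stored as one JSON column value.
row_as_blob = {"json": json.dumps(event)}

# Layout B: one column per field, each value serialized on its own.
row_as_columns = {name: json.dumps(value) for name, value in event.items()}

def read_status_from_blob(row):
    # The full document must be fetched and parsed to get one field.
    return json.loads(row["json"])["status"]

def read_status_from_columns(row):
    # Only the one column that is needed is read.
    return json.loads(row["status"])

def update_status_in_blob(row, new_status):
    # Read-modify-write of the entire document.
    doc = json.loads(row["json"])
    doc["status"] = new_status
    row["json"] = json.dumps(doc)

def update_status_in_columns(row, new_status):
    # A single-column write; the other fields are untouched.
    row["status"] = json.dumps(new_status)

update_status_in_blob(row_as_blob, "processed")
update_status_in_columns(row_as_columns, "processed")
print(read_status_from_blob(row_as_blob))        # prints "processed"
print(read_status_from_columns(row_as_columns))  # prints "processed"
```

Layout A is simplest when every read wants the whole record. Layout B avoids 
the read-modify-write cycle when individual fields are read or updated often.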

 *   Does it make sense to you to use Hadoop to process data from Cassandra and 
store the results in a database, like HBase? Once I have structured data, is 
there any reason I should use Cassandra instead of HBase?

It depends on how many moving parts you are comfortable with. Same for the 
questions about HDFS etc. Start with the smallest amount of infrastructure.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 10:28 AM, Marcelo Elias Del Valle <mvall...@gmail.com> wrote:

Hello,

     I am new to Cassandra and I am not sure whether Cassandra is the right 
technology to use in the architecture I am defining. Also, I saw a presentation 
which said that if I don't have rows with more than a hundred rows in 
Cassandra, either I am doing something wrong or I shouldn't be using 
Cassandra. Therefore, it might be the case that I am doing something wrong. If 
you could help me find the answers to these questions by giving any 
feedback, it would be highly appreciated.
     Here is my need and what I am thinking of using Cassandra for:

 *   I need to support a high volume of writes per second. I might have a 
billion writes per hour
 *   I need to write non-structured data that will be processed later by Hadoop 
processes to generate structured data from it. Later, I index the structured 
data using SOLR or SOLANDRA, so the data can be queried by my end-user 
application. Is Cassandra recommended for that, or should I be thinking of 
writing directly to HDFS files, for instance? What's the main advantage I get 
from storing data in a NoSQL service like Cassandra, when compared to storing 
files in HDFS?
 *   Usually I will write JSON data associated with an ID, and my Hadoop 
processes will process this data to write data to a database. I have two 
questions here:
    *   If I don't need to perform complicated queries in Cassandra, should I 
store the JSON-like data just as a column value? I am afraid of doing something 
wrong here, as I would just need to store the JSON file and 5 or 6 more 
fields to query the files later.
    *   Does it make sense to you to use Hadoop to process data from Cassandra 
and store the results in a database, like HBase? Once I have structured data, 
is there any reason I should use Cassandra instead of HBase?

     I am sorry if the questions are too basic. I have been watching a lot of 
videos and reading a lot of documentation about Cassandra, but honestly, the 
more I read, the more questions I have.

Thanks in advance.

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr

