> Actually, if I use community edition for now, I wouldn't be able to use 
> hadoop against data stored in CFS? 
AFAIK DSC is a packaged deployment of Apache Cassandra. You should be able to 
use Hadoop against it, in the same way you can use Hadoop against Apache 
Cassandra. 

You "can do" anything with computers if you have enough time and patience. DSE 
reduces the amount of time and patience needed to run Hadoop over Cassandra. 
Specifically, it helps by providing an HDFS and Hive Meta Store that run on 
Cassandra. This reduces the number of moving parts you need to provision. 

> Would writes on HDFS be so quick as in Cassandra?
Yes and no. 
HDFS uses a big block size, so while it may absorb writes quickly you may not be 
able to read them immediately. 
Remember you may need an HDFS layer for intermediate results. 
 
> would I have advantages in using Cassandra instead of HBase?

Cassandra provides no single point of failure, great scalability, tuneable 
consistency, a flexible data model and very easy single package deployment. My 
HBase knowledge is limited, but I would check those points and go with whatever 
you feel comfortable with. 
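
As a rough illustration of what "tuneable consistency" means in practice, here 
is a minimal sketch of the usual rule of thumb: a read is guaranteed to see the 
latest write when the number of replicas read plus the number of replicas 
written is greater than the replication factor. The function names are made up 
for this example and are not part of any Cassandra API.

```python
def quorum(replication_factor):
    """Number of replicas that make up a majority (a QUORUM-style level)."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, read_replicas, write_replicas):
    """True when every set of read replicas must overlap every set of
    write replicas, i.e. R + W > N."""
    return read_replicas + write_replicas > replication_factor


rf = 3
# Quorum reads and quorum writes overlap on at least one replica.
strong = reads_see_latest_write(rf, quorum(rf), quorum(rf))
# Reading and writing a single replica is faster but may return stale data.
weak = reads_see_latest_write(rf, 1, 1)
```

The trade-off is per request: you can pay for the overlap on the reads and 
writes that need it and use a cheaper level everywhere else.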

> If everything in my model fits into a relational database, if my data is 
> structured, would it still be a good idea to use Cassandra? Why?
It's reasonable to use Cassandra for structured data. After a few iterations of 
development you may find that the current structure is not the best fit for a 
non-RDBMS. For example, it's often easier to work with larger entities that 
violate Normal Form requirements.
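
To make the "larger entities" point concrete, here is a small sketch in plain 
Python. The users / orders / items data is entirely hypothetical and no 
Cassandra client is involved; it only shows three normalized tables being 
folded into one entity that can be stored and read back with a single key 
lookup.

```python
import json

# Normalized, RDBMS-style data: three "tables" joined by keys at read time.
users = {42: {"name": "marcelo"}}
orders = {1001: {"user_id": 42, "total": 30.0}}
order_items = [
    {"order_id": 1001, "sku": "A-1", "qty": 2},
    {"order_id": 1001, "sku": "B-7", "qty": 1},
]


def denormalize_order(order_id):
    """Fold the order, its owner and its items into one larger entity.

    The user name is duplicated into the order, which violates normal
    form but means a read needs no joins.
    """
    order = orders[order_id]
    return {
        "order_id": order_id,
        "user_name": users[order["user_id"]]["name"],
        "total": order["total"],
        "items": [
            {"sku": item["sku"], "qty": item["qty"]}
            for item in order_items
            if item["order_id"] == order_id
        ],
    }


# The whole entity could be stored under the order id as a single value.
row_value = json.dumps(denormalize_order(1001), sort_keys=True)
```

The cost is that the duplicated fields have to be kept in sync by the 
application when they change.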

There are lots of advantages to using Cassandra, just as there are benefits to 
using an RDBMS rather than custom flat files. If you feel your project will 
benefit from those advantages, and/or you are technically curious, I would 
recommend trying Cassandra. 

Choose a small part of your product and create a Proof of Concept. It should 
only take a week or so. Make as many mistakes as you can as fast as you can and 
have fun.

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2012, at 1:51 AM, Marcelo Elias Del Valle <[email protected]> wrote:

> Aaron,
> 
>     Thank you very much for the answers! Helped me a lot!
>     I would like just a bit more clarification about the points below, if 
> you allow me:
> 
> You can query your data using Hadoop easily enough. You may want to take a look 
> at DSE from  http://datastax.com/ it makes using Hadoop and Solr with 
> cassandra easier.
> Actually, if I use community edition for now, I wouldn't be able to use 
> hadoop against data stored in CFS? We are considering the enterprise edition 
> here, but the best scenario would be using it just when really needed. Would 
> writes on HDFS be so quick as in Cassandra?
> 
> It depends on how many moving parts you are comfortable with. Same for the 
> questions about HDFS etc. Start with the smallest amount of infrastructure.
> Sorry, I didn't really understand this part. I am not sure what you wanted to 
> say, but the question was about using nosql instead of a relational database in 
> this case. If learning nosql is not a problem, would I have advantages in 
> using Cassandra instead of HBase? If everything in my model fits into a 
> relational database, if my data is structured, would it still be a good idea 
> to use Cassandra? Why?
> 
> 
> Thanks,
> Marcelo.
> 
> 2012/9/18 aaron morton <[email protected]>
>> Also, I saw a presentation which said that if I don't have rows with more 
>> than a hundred rows in Cassandra, either I am doing something wrong or I 
>> shouldn't be using Cassandra. 
> I do not agree with that statement. (I read that as rows with more than a 
> hundred _columns_)
> 
>> I need to support a high volume of writes per second. I might have a billion 
>> writes per hour
> That's about 280K/sec. Netflix did a benchmark that shows 1.1M/sec 
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
> 
>> I need to write non-structured data that will be processed later by hadoop 
>> processes to generate structured data from it. Later, I index the structured 
>> data using SOLR or SOLANDRA, so the data can be consulted by my end user 
>> application. Is Cassandra recommended for that, or should I be thinking of 
>> writing directly to HDFS files, for instance? What's the main advantage I 
>> get from storing data in a nosql service like Cassandra, when compared to 
>> storing files into HDFS?
> You can query your data using Hadoop easily enough. You may want to take a look 
> at DSE from  http://datastax.com/ it makes using Hadoop and Solr with 
> cassandra easier. 
> 
>> If I don't need to perform complicated queries in Cassandra, should I store 
>> the json-like data just as a column value? I am afraid of doing something 
>> wrong here, as I would need just to store the json file and some more 5 or 6 
>> fields to query the files later.
> Store the data in the way that best supports the read queries you want to 
> make. If you always read all the fields, or it's a canonical record of events 
> storing as JSON may be best. If you often get a few fields, and maybe they 
> are updated, storing each field as a column value may be best. 
> 
>> Does it make sense to you to use hadoop to process data from Cassandra and 
>> store the results in a database, like HBase? Once I have structured data, is 
>> there any reason I should use Cassandra instead of HBase?
> It depends on how many moving parts you are comfortable with. Same for the 
> questions about HDFS etc. Start with the smallest amount of infrastructure. 
> 
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/09/2012, at 10:28 AM, Marcelo Elias Del Valle <[email protected]> 
> wrote:
> 
>> Hello,
>> 
>>      I am new to Cassandra and I am in doubt if Cassandra is the right 
>> technology to use in the architecture I am defining. Also, I saw a 
>> presentation which said that if I don't have rows with more than a hundred 
>> rows in Cassandra, either I am doing something wrong or I shouldn't be 
>> using Cassandra. Therefore, it might be the case I am doing something wrong. 
>> If you could help me to find out the answer for these questions by giving 
>> any feedback, it would be highly appreciated. 
>>      Here is my need and what I am thinking of using Cassandra for:
>> I need to support a high volume of writes per second. I might have a billion 
>> writes per hour
>> I need to write non-structured data that will be processed later by hadoop 
>> processes to generate structured data from it. Later, I index the structured 
>> data using SOLR or SOLANDRA, so the data can be consulted by my end user 
>> application. Is Cassandra recommended for that, or should I be thinking of 
>> writing directly to HDFS files, for instance? What's the main advantage I 
>> get from storing data in a nosql service like Cassandra, when compared to 
>> storing files into HDFS?
>> Usually I will write json data associated to an ID and my hadoop processes 
>> will process this data to write data to a database. I have two doubts here:
>> If I don't need to perform complicated queries in Cassandra, should I store 
>> the json-like data just as a column value? I am afraid of doing something 
>> wrong here, as I would need just to store the json file and some more 5 or 6 
>> fields to query the files later.
>> Does it make sense to you to use hadoop to process data from Cassandra and 
>> store the results in a database, like HBase? Once I have structured data, is 
>> there any reason I should use Cassandra instead of HBase?
>>      I am sorry if the questions are too basic. I have been watching a lot 
>> of videos and reading a lot of documentation about Cassandra, but honestly, 
>> the more I read, the more questions I have. 
>> 
>> Thanks in advance.
>> 
>> Best regards,
>> -- 
>> Marcelo Elias Del Valle
>> http://mvalle.com - @mvallebr
> 
> 
> 
> 
> -- 
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
