Re: Hadoop vs Cassandra

Ali Akhtar Sun, 23 Oct 2016 02:38:07 -0700

"from a particular query" should be " from a particular country"


On Sun, Oct 23, 2016 at 2:36 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> They can be, but I would assume that if your Cassandra data model is
> inefficient for the kind of queries you want to do, Spark won't magically
> take that away.
>
> For example, say you have a users table. Each user has a country, which
> isn't a partitioning key or clustering key.
>
> If you wanted to calculate the number of all users from a particular
> query, there's no way to do that in the previous data model other than to
> do a full table scan and count the users from that country.
>
> Spark can do this full table scan for you and return the number of
> records. Maybe it can spread the work across multiple servers. But it
> can't reduce the amount of work that has to be done.
>
> Otoh, if you were okay with creating a new table in which the country is
> part of the primary key, and for each user that signed up, you created a
> record in this user_by_country table, then it would be a very fast query to
> look up the users in a particular country, as country is then the partition
> key.
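
To make the difference concrete, here is a minimal Python sketch. It is not Cassandra or Spark code: plain dicts stand in for the two tables, and the names users, count_by_full_scan, build_users_by_country and count_by_partition are made up for illustration. It only shows how many rows each approach has to read.

```python
from collections import defaultdict

# Hypothetical sample rows keyed by user id (stand-in for the users table,
# where country is neither a partition key nor a clustering key).
users = {
    1: {"name": "ann", "country": "PK"},
    2: {"name": "bob", "country": "ID"},
    3: {"name": "cho", "country": "PK"},
}


def count_by_full_scan(users, country):
    """Count users in a country by reading every row (the full table scan)."""
    rows_read = 0
    matches = 0
    for row in users.values():
        rows_read += 1
        if row["country"] == country:
            matches += 1
    return matches, rows_read


def build_users_by_country(users):
    """Build the duplicate table: one extra write per user, grouped by country."""
    by_country = defaultdict(set)
    for user_id, row in users.items():
        by_country[row["country"]].add(user_id)
    return by_country


def count_by_partition(users_by_country, country):
    """Count users in a country by reading only that country's group of rows."""
    partition = users_by_country.get(country, set())
    return len(partition), len(partition)


if __name__ == "__main__":
    users_by_country = build_users_by_country(users)
    print(count_by_full_scan(users, "PK"))
    print(count_by_partition(users_by_country, "PK"))
```

With the three sample users, the scan reads 3 rows to find the 2 in "PK", while the per-country lookup reads only those 2. The price is the extra write into the second table on every signup.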
>
>
>
> On Sun, Oct 23, 2016 at 2:18 PM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> I like multi data centre resilience in Cassandra.
>>
>> I think that's plus one for Cassandra.
>>
>> Ali, complex analytics can be done in Spark, right?
>>
>> On 23 Oct 2016 4:08 p.m., "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>
>> >
>> > I would say it depends on your use case.
>> >
>> > If you need a lot of queries that require joins, or complex analytics
>> > of the kind that Cassandra isn't suited for, then HDFS / HBase may be
>> > better.
>> >
>> > If you can work with the Cassandra way of doing things (creating new
>> > tables for each query you'll need to do, duplicating data, doing extra
>> > writes for faster reads), then Cassandra should work for you. It is easier
>> > to set up and do dev ops with, in my experience.
>> >
>> > On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com>
>> > wrote:
>> >>
>> >> I mean. HDFS and HBase.
>> >>
>> >> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com>
>> >> wrote:
>> >>>
>> >>> By Hadoop do you mean HDFS?
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi All,
>> >>>>
>> >>>> I read the following comparison between Hadoop and Cassandra. The
>> >>>> conclusion seems to be that we use Hadoop for a data lake (cold data)
>> >>>> and Cassandra for hot data (real-time data).
>> >>>>
>> >>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>> >>>>
>> >>>> My question is, can we just use Cassandra to rule them all?
>> >>>>
>> >>>> What we are trying to achieve is to minimize the moving parts in our
>> >>>> system.
>> >>>>
>> >>>> Any response would be really appreciated.
>> >>>>
>> >>>>
>> >>>> Cheers
>> >>>>
>> >>>> --
>> >>>> Welly Tambunan
>> >>>> Triplelands
>> >>>>
>> >>>> http://weltam.wordpress.com <http://weltam.wordpress.com>
>> >>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Welly Tambunan
>> >> Triplelands
>> >>
>> >> http://weltam.wordpress.com <http://weltam.wordpress.com>
>> >> http://www.triplelands.com <http://www.triplelands.com/blog/>
>> >
>> >
>>
>
>
