"from a particular query" should be " from a particular country"
On Sun, Oct 23, 2016 at 2:36 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> They can be, but I would assume that if your Cassandra data model is
> inefficient for the kind of queries you want to do, Spark won't magically
> take that way.
>
> For example, say you have a users table. Each user has a country, which
> isn't a partitioning key or clustering key.
>
> If you wanted to calculate the number of all users from a particular
> query, there's no way to do that in the previous data model other than to
> do a full table scan and count the users from that country.
>
> Spark can do this full table scan for you and return the number of
> records. May be it can spread the work across multiple servers. But it
> can't reduce the amount of work that has to be done.
>
> Otoh, if you were okay with creating a new table in which the country is
> part of the primary key, and for each user that signed up, you created a
> record in this user_by_country table, then it would be a very fast query to
> look up the users in a particular country, as country is then the primary
> key.
>
> On Sun, Oct 23, 2016 at 2:18 PM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> I like muti data centre resillience in cassandra.
>>
>> I think thats plus one for cassandra.
>>
>> Ali, complex analytics can be done in spark right?
>>
>> On 23 Oct 2016 4:08 p.m., "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>>
>> > I would say it depends on your use case.
>> >
>> > If you need a lot of queries that require joins, or complex analytics
>> > of the kind that Cassandra isn't suited for, then HDFS / HBase may be
>> > better.
>> >
>> > If you can work with the cassandra way of doing things (creating new
>> > tables for each query you'll need to do, duplicating data - doing extra
>> > writes for faster reads) , then Cassandra should work for you. It is easier
>> > to setup and do dev ops with, in my experience.
>> >
>> > On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com>
>> > wrote:
>> >>
>> >> I mean. HDFS and HBase.
>> >>
>> >> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com>
>> >> wrote:
>> >>>
>> >>> By Hadoop do you mean HDFS?
>> >>>
>> >>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi All,
>> >>>>
>> >>>> I read the following comparison between hadoop and cassandra. Seems
>> >>>> the conclusion that we use hadoop for data lake ( cold data ) and Cassandra
>> >>>> for hot data (real time data).
>> >>>>
>> >>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
>> >>>> <http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop>
>> >>>>
>> >>>> My question is, can we just use cassandra to rule them all ?
>> >>>>
>> >>>> What we are trying to achieve is to minimize the moving part on our
>> >>>> system.
>> >>>>
>> >>>> Any response would be really appreciated.
>> >>>>
>> >>>>
>> >>>> Cheers
>> >>>>
>> >>>> --
>> >>>> Welly Tambunan
>> >>>> Triplelands
>> >>>>
>> >>>> http://weltam.wordpress.com <http://weltam.wordpress.com>
>> >>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>> >>
>> >> --
>> >> Welly Tambunan
>> >> Triplelands
>> >>
>> >> http://weltam.wordpress.com <http://weltam.wordpress.com>
>> >> http://www.triplelands.com <http://www.triplelands.com/blog/>
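
To make the users vs. user_by_country point from the thread concrete, here is a rough sketch in plain Python. It is only an analogy: the dicts stand in for Cassandra tables, nothing here is the Cassandra or Spark API, and the sample rows and function names (count_by_full_scan, count_by_partition) are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; a dict keyed by user id stands in for the
# original users table, where country is NOT part of the key.
users = {
    "u1": {"name": "Ana", "country": "ID"},
    "u2": {"name": "Budi", "country": "ID"},
    "u3": {"name": "Carl", "country": "PK"},
}


def count_by_full_scan(users, country):
    """Original model: every row must be read to count one country."""
    rows_read = 0
    matches = 0
    for row in users.values():
        rows_read += 1
        if row["country"] == country:
            matches += 1
    return matches, rows_read


# Stand-in for a user_by_country table: one extra write per sign-up,
# keyed by country so a lookup touches only that country's entries.
user_by_country = defaultdict(set)
for user_id, row in users.items():
    user_by_country[row["country"]].add(user_id)


def count_by_partition(user_by_country, country):
    """Denormalized model: read only the entries stored under one key."""
    return len(user_by_country.get(country, ()))


if __name__ == "__main__":
    print(count_by_full_scan(users, "ID"))
    print(count_by_partition(user_by_country, "ID"))
```

The full scan reads all three rows to find the two matches, and spreading that loop over more machines (as Spark could) would not change how many rows get read. The second structure costs an extra write per user but answers the question from a single key.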