No, the question isn't closed. You don't get to decide that. I don't run a website making claims regarding cassandra and spark - your employer does.
Again, where are your benchmarks? I will publish mine, then we'll see what
you've got.

--
Colin Clark
+1 612 859 6129
Skype colin.p.clark

> On Feb 11, 2015, at 8:39 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> For your information, Colin: http://en.wikipedia.org/wiki/List_of_fallacies.
> Look at "Burden of proof".
>
> You stated "The very nature of cassandra's distributed nature vs partitioning
> data on hadoop makes spark on hdfs actually faster than on cassandra...."
>
> It's up to YOU to prove it right, not up to me to prove it wrong.
>
> All the other blah blah is trolling.
>
> Come back to me once you get some decent benchmarks supporting your
> statement; until then, the question is closed.
>
>> On Wed, Feb 11, 2015 at 3:17 PM, Colin <co...@clark.ws> wrote:
>> Did you want me to include specific examples from my employment at datastax
>> or start from the ground up?
>>
>> All spark on cassandra is, is better than the previous use of hive.
>>
>> The fact that datastax hasn't provided any benchmarks themselves other than
>> glossy marketing statements pretty much says it all. Where are your
>> benchmarks? Maybe you could combine it with the in-memory option to really
>> boogie...
>>
>> :)
>>
>> (If I find time, I might just write a blog post about exactly how to do
>> this. It involves the use of parquet and partitioning with clustering, it
>> doesn't cost anything to do, and it's in production at my company.)
>> --
>> Colin Clark
>> +1 612 859 6129
>> Skype colin.p.clark
>>
>>> On Feb 11, 2015, at 6:51 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>> "The very nature of cassandra's distributed nature vs partitioning data on
>>> hadoop makes spark on hdfs actually faster than on cassandra...."
>>>
>>> Prove it. Did you ever look into the source code of the
>>> Spark/Cassandra connector to see how data locality is achieved before
>>> throwing out such a statement?
>>>
>>>> On Wed, Feb 11, 2015 at 12:42 PM, Marcelo Valle (BLOOMBERG/ LONDON)
>>>> <mvallemil...@bloomberg.net> wrote:
>>>> > cassandra makes a very poor datawarehouse or long term time series store
>>>>
>>>> Really? This is not the impression I have... I think Cassandra is good for
>>>> storing large amounts of data and historical information; it's only not
>>>> good for storing temporary data.
>>>> Netflix has a large amount of data and it's all stored in Cassandra,
>>>> AFAIK.
>>>>
>>>> > The very nature of cassandra's distributed nature vs partitioning data
>>>> > on hadoop makes spark on hdfs actually faster than on cassandra.
>>>>
>>>> I am not sure about the current state of Spark support for Cassandra, but
>>>> I guess if you create a map reduce job, the intermediate map results will
>>>> still be stored in HDFS, as happens with hadoop, is this right? I think
>>>> the problem with Spark + Cassandra or with Hadoop + Cassandra is that the
>>>> hard part spark or hadoop does, the shuffling, could be done out of the
>>>> box with Cassandra, but no one takes advantage of that. What if a map /
>>>> reduce job used a temporary CF in Cassandra to store intermediate results?
>>>>
>>>> From: user@cassandra.apache.org
>>>> Subject: Re: How to speed up SELECT * query in Cassandra
>>>> I use spark with cassandra, and you don't need DSE.
>>>>
>>>> I see a lot of people ask this same question below (how do I get a lot of
>>>> data out of cassandra?), and my question is always, why aren't you updating
>>>> both places at once?
>>>>
>>>> For example, we use hadoop and cassandra in conjunction with each other,
>>>> we use a message bus to store every event in both, aggregate in both, but
>>>> only keep current data in cassandra (cassandra makes a very poor
>>>> datawarehouse or long term time series store) and then use services to
>>>> process queries that merge data from hadoop and cassandra.
>>>>
>>>> Also, spark on hdfs gives more flexibility in terms of large datasets and
>>>> performance. The very nature of cassandra's distributed nature vs
>>>> partitioning data on hadoop makes spark on hdfs actually faster than on
>>>> cassandra....
>>>>
>>>> --
>>>> Colin Clark
>>>> +1 612 859 6129
>>>> Skype colin.p.clark
>>>>
>>>>> On Feb 11, 2015, at 4:49 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>>>>
>>>>>> On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON)
>>>>>> <mvallemil...@bloomberg.net> wrote:
>>>>>> If you use Cassandra enterprise, you can use hive, AFAIK.
>>>>>
>>>>> Even better, you can use Spark/Shark with DSE.
>>>>>
>>>>> Cheers,
>>>>> Jens
>>>>>
>>>>> --
>>>>> Jens Rantil
>>>>> Backend engineer
>>>>> Tink AB
>>>>>
>>>>> Email: jens.ran...@tink.se
>>>>> Phone: +46 708 84 18 32
>>>>> Web: www.tink.se
>>>>>
>>>>> Facebook Linkedin Twitter
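
The dual-write setup described in the quoted message above (a message bus
storing every event in both systems, with only current data kept in cassandra
and a service merging the two at query time) can be sketched roughly as
follows. This is only an illustration under stated assumptions: the stores are
in-memory stand-ins rather than real cassandra or hadoop clients, and every
name here (Event, MessageBus, CurrentStore, ArchiveStore, query_total,
keep_days) is hypothetical and does not come from the thread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str    # hypothetical entity id, e.g. a sensor or account
    day: int    # day number, standing in for a real timestamp
    value: int


class ArchiveStore:
    """In-memory stand-in for the long-term store: keeps every aggregate."""

    def __init__(self):
        self.totals = defaultdict(int)  # (key, day) -> aggregated value

    def write(self, event):
        self.totals[(event.key, event.day)] += event.value


class CurrentStore(ArchiveStore):
    """In-memory stand-in for the serving store: keeps recent days only."""

    def __init__(self, keep_days):
        super().__init__()
        self.keep_days = keep_days

    def expire(self, today):
        # Drop aggregates that fall outside the retention window.
        cutoff = today - self.keep_days
        for stale in [k for k in self.totals if k[1] <= cutoff]:
            del self.totals[stale]


class MessageBus:
    """Fans every published event out to all subscribed stores."""

    def __init__(self, *stores):
        self.stores = stores

    def publish(self, event):
        for store in self.stores:
            store.write(event)


def query_total(key, first_day, last_day, current, archive):
    """Merge step: use the current store when it has the day, else the archive."""
    total = 0
    for day in range(first_day, last_day + 1):
        if (key, day) in current.totals:
            total += current.totals[(key, day)]
        else:
            total += archive.totals.get((key, day), 0)
    return total


archive, current = ArchiveStore(), CurrentStore(keep_days=2)
bus = MessageBus(archive, current)
for day, value in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    bus.publish(Event("sensor-a", day, value))
current.expire(today=4)  # the current store now holds days 3 and 4 only
print(query_total("sensor-a", 1, 4, current, archive))
```

In this toy version the query service never needs a bulk export from the
serving store: older days are answered from the archive and recent days from
the current store, which is the point the quoted message makes about updating
both places at once.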