Thanks Jeff.

On Thu, Apr 12, 2018, 21:37 Jeff Jirsa <jji...@gmail.com> wrote:
> Pretty sure Ben meant that DataStax produces DSE, not Cassandra, and since the question specifically mentions DSE in the subject (implying that the user is going to be running either Solr or Spark within DSE to talk to Cassandra), Ben’s recommendation seems quite reasonable to me.
>
> --
> Jeff Jirsa
>
>
> On Apr 12, 2018, at 6:23 PM, Niclas Hedhman <nic...@apache.org> wrote:
>
> Ben,
>
> 1. I don't see anything in this thread that is DSE-specific, so I think it belongs here.
>
> 2. Be careful when you say that DataStax produces Cassandra. Cassandra is a product of the Apache Software Foundation, and no one else. You, Ben, should be very well aware of this, to avoid further trademark issues between DataStax and the ASF.
>
> Cheers
> Niclas Hedhman
> Member of ASF
>
> On Thu, Apr 12, 2018 at 9:57 PM, Ben Bromhead <b...@instaclustr.com> wrote:
>
>> Folks, this is the user list for Apache Cassandra. I would suggest redirecting the question to DataStax, the commercial entity that produces the software.
>>
>> On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>
>>> Best practice is to use a dedicated DC for analytics, separate from the hot DC.
>>>
>>> On Thu, 12 Apr 2018 at 15:45, sha p <shatestt...@gmail.com> wrote:
>>>
>>>> Got it.
>>>> Thank you so much for your detailed explanation.
>>>>
>>>> Regards,
>>>> Shyam
>>>>
>>>> On Thu, 12 Apr 2018, 17:37 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>>
>>>>> Cassandra tends to be used in a lot of web applications. Its load is more natural and evenly distributed, like people logging on throughout the day. People operating it also tend to be latency sensitive.
>>>>>
>>>>> Spark, on the other hand, will try to complete its tasks as quickly as possible.
>>>>> This might mean bulk reading from Cassandra at 10 times the usual operations load, but only for, say, 5 minutes every half hour (however long it takes to read in the data for a job, and whenever that job is run). In that case, during those 5 minutes your normal operations work (customers) is going to experience a lot of latency.
>>>>>
>>>>> This even happens with streaming jobs: every time Spark goes to interact with Cassandra it does so very quickly, hammers it for reads, and then does its own stuff until it needs to write things out. This might translate into intermittent latency spikes.
>>>>>
>>>>> In theory, you can throttle your reads and writes, but I don’t know much about this and don’t see people actually doing it.
>>>>>
>>>>> Regards,
>>>>> Evelyn.
>>>>>
>>>>> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>>>>>
>>>>> Evelyn,
>>>>> Can you please elaborate on the following?
>>>>> Spark is notorious for causing latency spikes in Cassandra, which is not great if you are sensitive to that.
>>>>>
>>>>>
>>>>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>>>
>>>>>> Are you building a search engine -> Solr
>>>>>> Are you building an analytics function -> Spark
>>>>>>
>>>>>> I feel they are used in significantly different use cases. What are you trying to build?
>>>>>>
>>>>>> If it’s analytics functionality that’s separate from your operations functionality, I’d build it in its own DC. Spark is notorious for causing latency spikes in Cassandra, which is not great if you are sensitive to that.
>>>>>>
>>>>>> Regards,
>>>>>> Evelyn.
>>>>>>
>>>>>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We are exploring configuring Solr/Spark and wanted to get input on this.
>>>>>> 1) How do we decide which one to use?
>>>>>> 2) Do we run this on a DC where there is less workload?
>>>>>>
>>>>>> Any other suggestions or comments are appreciated.
>>>>>>
>>>>>> Thank you.
>>
>> --
>> Ben Bromhead
>> CTO | Instaclustr <https://www.instaclustr.com/>
>> +1 650 284 9692
>> Reliability at Scale
>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>
>
> --
> Niclas Hedhman, Software Developer
> http://zest.apache.org - New Energy for Java

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
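Evelyn notes that reads and writes can, in theory, be throttled so that a Spark job does not hammer the same nodes that serve customer traffic. One common client-side technique for this is a token bucket. The Python sketch below illustrates that idea only, under stated assumptions. It is not how Spark or the Spark Cassandra Connector throttle; the connector has its own throughput-related settings, described in its reference documentation. `TokenBucket`, `throttled_scan` and `read_fn` are hypothetical names, and `read_fn` stands in for whatever issues a single Cassandra read.

```python
import time
from typing import Callable, Iterable, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class TokenBucket:
    """Hypothetical client-side rate limiter: on average at most `rate`
    operations per second, with bursts of up to `capacity` operations."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # largest burst allowed
        self.tokens = capacity    # start with a full bucket
        self.clock = clock        # injectable so tests can use a fake clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if one is available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block until a token is available. Uses the real time.sleep,
        so use it with the default clock, not a fake one."""
        while not self.try_acquire():
            # Sleep roughly until the next whole token has accumulated.
            time.sleep((1 - self.tokens) / self.rate)


def throttled_scan(keys: Iterable[K], read_fn: Callable[[K], V],
                   limiter: TokenBucket) -> List[V]:
    """Issue one read per key, never faster than the limiter allows.

    `read_fn` is a hypothetical stand-in for a single Cassandra read.
    """
    results: List[V] = []
    for key in keys:
        limiter.acquire()
        results.append(read_fn(key))
    return results
```

The bucket lets short bursts of up to `capacity` requests through and then holds the long-run rate at `rate` per second. This trades a longer job runtime for a lower, flatter load on the cluster that customer-facing traffic shares. The dedicated analytics DC suggested in the thread avoids that trade-off by keeping the two workloads on different nodes.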