Folks, this is the user list for Apache Cassandra. I would suggest redirecting the question to DataStax, the commercial entity that produces the software.
On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <vincent.gromakow...@gmail.com> wrote:

> Best practice is to use a dedicated DC for analytics, separated from the
> hot DC.
>
> On Thu, 12 Apr 2018 at 15:45, sha p <shatestt...@gmail.com> wrote:
>
>> Got it.
>> Thank you so much for your detailed explanation.
>>
>> Regards,
>> Shyam
>>
>> On Thu, 12 Apr 2018, 17:37 Evelyn Smith, <u5015...@gmail.com> wrote:
>>
>>> Cassandra tends to be used in a lot of web applications. Its loads are
>>> more natural and evenly distributed, like people logging on throughout the
>>> day. And people operating it tend to be latency sensitive.
>>>
>>> Spark, on the other hand, will try to complete its tasks as quickly as
>>> possible. This might mean bulk reading from Cassandra at 10 times the
>>> usual operations load, but for only, say, 5 minutes every half hour (however
>>> long it takes to read in the data for a job, and whenever that job is run).
>>> In this case, during those 5 minutes your normal operations work (customers)
>>> is going to experience a lot of latency.
>>>
>>> This even happens with streaming jobs: every time Spark goes to interact
>>> with Cassandra it does so very quickly, hammers it for reads, and then does
>>> its own stuff until it needs to write things out. This might equate to
>>> intermittent latency spikes.
>>>
>>> In theory, you can throttle your reads and writes, but I don’t know much
>>> about this and don’t see people actually doing it.
>>>
>>> Regards,
>>> Evelyn.
>>>
>>> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>>>
>>> Evelyn,
>>> Can you please elaborate on the below?
>>> Spark is notorious for causing latency spikes in Cassandra, which is not
>>> great if you are sensitive to that.
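On the throttling point in Evelyn's message: below is a minimal sketch of what throttling a bulk reader means in practice. It is plain Python, not Spark or Cassandra driver code, and it is not how the Spark Cassandra Connector implements its own limits. The names PacedThrottle and wait are hypothetical, made up for illustration. The idea is only that the client spaces its requests out so a batch job cannot push more than a fixed number of operations per second at the cluster.

```python
import time


class PacedThrottle:
    """Hypothetical client-side throttle: spaces calls at least 1/rate seconds apart."""

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        if rate_per_sec <= 0:
            raise ValueError("rate_per_sec must be positive")
        self.interval = 1.0 / rate_per_sec  # minimum spacing between operations
        self.clock = clock                  # injectable so the pacing can be tested
        self.sleep = sleep
        self.next_slot = clock()            # earliest time the next operation may start

    def wait(self):
        """Sleep until the next slot is free; return the number of seconds slept."""
        now = self.clock()
        delay = max(0.0, self.next_slot - now)
        if delay > 0:
            self.sleep(delay)
        # Book the following slot. If the caller was idle for a while, start
        # from "now" so unused slots do not accumulate into a later burst.
        self.next_slot = max(now, self.next_slot) + self.interval
        return delay
```

A batch reader would call wait() before each page or query it sends, for example with PacedThrottle(rate_per_sec=200). The trade-off is the one described above: the job takes longer, but the operational workload sees a steady extra load instead of a short spike.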
>>>
>>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>
>>>> Are you building a search engine -> Solr
>>>> Are you building an analytics function -> Spark
>>>>
>>>> I feel they are used in significantly different use cases. What are you
>>>> trying to build?
>>>>
>>>> If it’s analytics functionality that’s separate from your operations
>>>> functionality, I’d build it in its own DC. Spark is notorious for causing
>>>> latency spikes in Cassandra, which is not great if you are sensitive to
>>>> that.
>>>>
>>>> Regards,
>>>> Evelyn.
>>>>
>>>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We are exploring configuring Solr/Spark and wanted to get input on
>>>> this.
>>>> 1) How do we decide which one to use?
>>>> 2) Do we run this on a DC where there is less workload?
>>>>
>>>> Any other suggestions or comments are appreciated.
>>>>
>>>> Thank you.
>>>>
>>>
--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer