Folks, this is the user list for Apache Cassandra. I would suggest
redirecting the question to DataStax, the commercial entity that produces
the software.

On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:

> Best practice is to use a dedicated DC for analytics, separated from the
> hot DC.
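>
> A minimal sketch of that setup, assuming a keyspace replicated to both DCs
> with NetworkTopologyStrategy, an analytics DC named "analytics", and a
> contact point 10.0.1.10 inside it (all assumed names; exact property names
> also vary a little between Spark Cassandra Connector versions):
>
>   import org.apache.spark.sql.SparkSession
>
>   val spark = SparkSession.builder()
>     .appName("analytics-job")
>     // Contact a node in the analytics DC...
>     .config("spark.cassandra.connection.host", "10.0.1.10")
>     // ...and keep the connector's requests local to that DC,
>     // so the hot (operational) DC is left alone.
>     .config("spark.cassandra.connection.localDC", "analytics")
>     // LOCAL_* consistency levels never leave the local DC.
>     .config("spark.cassandra.input.consistency.level", "LOCAL_ONE")
>     .getOrCreate()
>
>   val df = spark.read
>     .format("org.apache.spark.sql.cassandra")
>     .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
>     .load()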
>
> On Thu, 12 Apr 2018 at 15:45, sha p <shatestt...@gmail.com> wrote:
>
>> Got it.
>> Thank you so much for your detailed explanation.
>>
>> Regards,
>> Shyam
>>
>> On Thu, 12 Apr 2018, 17:37 Evelyn Smith, <u5015...@gmail.com> wrote:
>>
>>> Cassandra tends to be used in a lot of web applications. Its load is
>>> more natural and evenly distributed, like people logging on throughout the
>>> day, and the people operating it tend to be latency sensitive.
>>>
>>> Spark, on the other hand, will try to complete its tasks as quickly as
>>> possible. This might mean bulk reading from Cassandra at 10 times the
>>> usual operations load, but only for, say, 5 minutes every half hour (however
>>> long it takes to read in the data for a job, and whenever that job is run).
>>> During those 5 minutes, your normal operations traffic (customers) is
>>> going to experience a lot of latency.
>>>
>>> This even happens with streaming jobs: every time Spark goes to interact
>>> with Cassandra it does so very quickly, hammering it with reads and then
>>> doing its own work until it needs to write things out. This can show up as
>>> intermittent latency spikes.
>>>
>>> In theory, you can throttle your reads and writes, but I don’t know much
>>> about this and don’t see people actually doing it.
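>>>
>>> For reference, that throttling lives in the Spark Cassandra Connector as
>>> a handful of settings. The exact property names vary a little between
>>> connector versions and the numbers below are arbitrary examples, so treat
>>> this as a sketch to check against your version’s reference docs:
>>>
>>>   import org.apache.spark.SparkConf
>>>
>>>   val conf = new SparkConf()
>>>     // Cap the rate of read requests the connector issues.
>>>     .set("spark.cassandra.input.readsPerSec", "100")
>>>     // Cap write throughput (MB/s) so bulk writes don't swamp the cluster.
>>>     .set("spark.cassandra.output.throughputMBPerSec", "5")
>>>     // Fewer concurrent write batches also soften the load.
>>>     .set("spark.cassandra.output.concurrent.writes", "2")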
>>>
>>> Regards,
>>> Evelyn.
>>>
>>> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>>>
>>> Evelyn,
>>> Can you please elaborate on the below?
>>> "Spark is notorious for causing latency spikes in Cassandra, which is not
>>> great if you are sensitive to that."
>>>
>>>
>>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>
>>>> Are you building a search engine -> Solr
>>>> Are you building an analytics function -> Spark
>>>>
>>>> I feel they are used in significantly different use cases; what are you
>>>> trying to build?
>>>>
>>>> If it’s analytics functionality that’s separate from your operations
>>>> functionality, I’d build it in its own DC. Spark is notorious for causing
>>>> latency spikes in Cassandra, which is not great if you are sensitive to
>>>> that.
>>>>
>>>> Regards,
>>>> Evelyn.
>>>>
>>>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We are exploring configuring Solr/Spark and wanted to get input on
>>>> this.
>>>> 1) How do we decide which one to use?
>>>> 2) Do we run this on a DC where there is less workload?
>>>>
>>>> Any other suggestions or comments are appreciated.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>>
>>>
--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
