Best practice is to use a dedicated DC for analytics, separate from the hot
DC.
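
As a rough sketch of what that can look like on the Spark side, the snippet
below builds the connector properties that pin a job to the analytics DC.
The helper name, the contact point and the DC name "analytics" are made up
for illustration. The property names are the ones I know from the 2.x
Spark Cassandra Connector (newer releases rename some of them, for example
local_dc became localDC), so check them against the reference for your
connector version. It also assumes the keyspace is already replicated to
the analytics DC with NetworkTopologyStrategy.

```python
def analytics_spark_conf(contact_point, analytics_dc, write_mb_per_sec_per_core=None):
    """Build Spark properties that keep Cassandra traffic in one data center.

    Hypothetical helper: property names follow the 2.x Spark Cassandra
    Connector and may differ in other versions.
    """
    conf = {
        # A node in the analytics DC, used as the initial contact point.
        "spark.cassandra.connection.host": contact_point,
        # Only talk to nodes in this DC, so the hot DC is not read directly.
        "spark.cassandra.connection.local_dc": analytics_dc,
    }
    if write_mb_per_sec_per_core is not None:
        # Optional write throttle, expressed per Spark core.
        conf["spark.cassandra.output.throughput_mb_per_sec"] = str(
            write_mb_per_sec_per_core
        )
    return conf


# Example values only; replace with your own address and DC name.
conf = analytics_spark_conf("10.0.1.10", "analytics", write_mb_per_sec_per_core=5)
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The printed flags can be passed to spark-submit. Writes from the job still
replicate to the hot DC, so the optional write throttle is there to soften
that.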

On Thu, 12 Apr 2018 at 15:45, sha p <shatestt...@gmail.com> wrote:

> Got it.
> Thank you so much for your detailed explanation.
>
> Regards,
> Shyam
>
> On Thu, 12 Apr 2018, 17:37 Evelyn Smith, <u5015...@gmail.com> wrote:
>
>> Cassandra tends to be used in a lot of web applications. Its loads are
>> more natural and evenly distributed, like people logging on throughout the
>> day, and the people operating it tend to be latency sensitive.
>>
>> Spark, on the other hand, will try to complete its tasks as quickly as
>> possible. This might mean bulk reading from Cassandra at 10 times the
>> usual operations load, but only for, say, 5 minutes every half hour (however
>> long it takes to read in the data for a job, and whenever that job is run).
>> In that case, during those 5 minutes your normal operations workload
>> (customers) is going to experience a lot of latency.
>>
>> This even happens with streaming jobs. Every time Spark interacts with
>> Cassandra it does so very quickly: it hammers it for reads and then does
>> its own work until it needs to write things out. This can show up as
>> intermittent latency spikes.
>>
>> In theory, you can throttle your reads and writes but I don’t know much
>> about this and don’t see people actually doing it.
>>
>> Regards,
>> Evelyn.
>>
>> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>>
>> Evelyn,
>> Can you please elaborate on the statement below?
>> "Spark is notorious for causing latency spikes in Cassandra which is not
>> great if you are sensitive to that."
>>
>>
>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com> wrote:
>>
>>> Are you building a search engine -> Solr
>>> Are you building an analytics function -> Spark
>>>
>>> I feel they serve significantly different use cases. What are you
>>> trying to build?
>>>
>>> If it’s analytics functionality that’s separate from your operations
>>> functionality, I’d build it in its own DC. Spark is notorious for causing
>>> latency spikes in Cassandra, which is not great if you are sensitive to
>>> that.
>>>
>>> Regards,
>>> Evelyn.
>>>
>>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID>
>>> wrote:
>>>
>>> Hello,
>>>
>>> We are exploring configuring Solr/Spark and wanted to get input on this.
>>> 1) How do we decide which one to use?
>>> 2) Do we run this on a DC where there is less workload?
>>>
>>> Any other suggestions or comments are appreciated.
>>>
>>> Thank you.
>>>
>>>
>>>
>>
