Re: Sorl/DSE Spark

Ben Bromhead Fri, 13 Apr 2018 04:42:39 -0700

Thanks Jeff.

On Thu, Apr 12, 2018, 21:37 Jeff Jirsa <jji...@gmail.com> wrote:


> Pretty sure Ben meant that DataStax produces DSE, not Cassandra, and since
> the question specifically mentions DSE in the subject (implying that the
> user is going to be running either Solr or Spark within DSE to talk to
> Cassandra), Ben’s recommendation seems quite reasonable to me.
>
>
>
> --
> Jeff Jirsa
>
>
> On Apr 12, 2018, at 6:23 PM, Niclas Hedhman <nic...@apache.org> wrote:
>
> Ben,
>
> 1. I don't see anything in this thread that is DSE specific, so I think it
> belongs here.
>
> 2. Careful when you say that DataStax produces Cassandra. Cassandra is a
> product of the Apache Software Foundation, and no one else. You, Ben, should
> be very well aware of this, to avoid further trademark issues between
> DataStax and the ASF.
>
> Cheers
> Niclas Hedhman
> Member of ASF
>
> On Thu, Apr 12, 2018 at 9:57 PM, Ben Bromhead <b...@instaclustr.com> wrote:
>
>> Folks, this is the user list for Apache Cassandra. I would suggest
>> redirecting the question to DataStax, the commercial entity that produces
>> the software.
>>
>> On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> Best practice is to use a dedicated DC for analytics, separate from the
>>> hot DC.
>>>
>>> On Thu, 12 Apr 2018 at 15:45, sha p <shatestt...@gmail.com> wrote:
>>>
>>>> Got it.
>>>> Thank you so much for your detailed explanation.
>>>>
>>>> Regards,
>>>> Shyam
>>>>
>>>> On Thu, 12 Apr 2018, 17:37 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>>
>>>>> Cassandra tends to be used in a lot of web applications. Its loads
>>>>> are more natural and evenly distributed, like people logging on throughout
>>>>> the day. And the people operating it tend to be latency sensitive.
>>>>>
>>>>> Spark, on the other hand, will try to complete its tasks as quickly as
>>>>> possible. This might mean bulk reading from Cassandra at 10 times the
>>>>> usual operations load, but for only, say, 5 minutes every half hour (however
>>>>> long it takes to read in the data for a job, and whenever that job is run).
>>>>> In this case, during those 5 minutes your normal operations workload
>>>>> (customers) is going to experience a lot of latency.
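
To put rough numbers on the burst described above, here is a minimal sketch. The baseline of 1000 ops/s is a made-up figure for illustration only, and it assumes the Spark read load is added on top of the normal traffic rather than replacing it. The function name burst_profile is hypothetical.

```python
def burst_profile(baseline_ops, burst_multiplier, burst_minutes, period_minutes):
    """Return (peak_ops, average_ops) for a periodic bulk read on top of steady traffic."""
    # Extra load generated by the bulk read while it is running.
    burst_ops = baseline_ops * burst_multiplier
    # Assumption: the burst is added to the normal operational load.
    peak_ops = baseline_ops + burst_ops
    # Fraction of each period during which the burst is active.
    duty_cycle = burst_minutes / period_minutes
    average_ops = baseline_ops + burst_ops * duty_cycle
    return peak_ops, average_ops


# Hypothetical baseline of 1000 ops/s; 10x burst for 5 minutes every 30 minutes.
peak, average = burst_profile(
    baseline_ops=1000, burst_multiplier=10, burst_minutes=5, period_minutes=30
)
print(f"peak: {peak:.0f} ops/s, average: {average:.0f} ops/s")
```

Under these assumptions the average over the half hour looks modest compared with the peak, which is why a cluster sized for the average can still show latency spikes during the burst.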
>>>>>
>>>>> This even happens with streaming jobs. Every time Spark goes to
>>>>> interact with Cassandra it does so very quickly: it hammers it for reads
>>>>> and then does its own work until it needs to write things out. This might
>>>>> equate to intermittent latency spikes.
>>>>>
>>>>> In theory, you can throttle your reads and writes but I don’t know
>>>>> much about this and don’t see people actually doing it.
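
As a rough illustration of what throttling could look like on the client side, here is a minimal sketch of a generic token-bucket rate limiter. It is not the mechanism used by Spark or by any Cassandra driver; the names TokenBucket and throttled_send, and the send callback, are hypothetical and exist only for this example.

```python
import time


class TokenBucket:
    """Generic token-bucket limiter; not part of Spark or any Cassandra driver."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = float(rate_per_sec)   # tokens added per second
        self.capacity = float(capacity)   # maximum burst size
        self.clock = clock                # injectable clock, useful for testing
        self.tokens = float(capacity)     # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n=1):
        """Take n tokens if available; return True on success, False otherwise."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n=1):
        """Seconds until n tokens will be available (0.0 if available now)."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)


def throttled_send(requests, bucket, send):
    """Call the hypothetical send(request) for each request, pacing with the bucket."""
    for request in requests:
        while not bucket.try_acquire():
            time.sleep(bucket.wait_time())
        send(request)
```

The idea is that a bulk job pays one token per request, so its sustained rate cannot exceed rate_per_sec however fast it is able to issue reads or writes. The cost is that the job takes longer to finish.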
>>>>>
>>>>> Regards,
>>>>> Evelyn.
>>>>>
>>>>> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>>>>>
>>>>> Evelyn,
>>>>> Can you please elaborate on the statement below?
>>>>> Spark is notorious for causing latency spikes in Cassandra which is
>>>>> not great if you are sensitive to that.
>>>>>
>>>>>
>>>>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>>>
>>>>>> Are you building a search engine -> Solr
>>>>>> Are you building an analytics function -> Spark
>>>>>>
>>>>>> I feel they are used in significantly different use cases. What are
>>>>>> you trying to build?
>>>>>>
>>>>>> If it’s an analytics functionality that’s separate from your
>>>>>> operations functionality I’d build it in its own DC. Spark is notorious
>>>>>> for causing latency spikes in Cassandra which is not great if you are
>>>>>> sensitive to that.
>>>>>>
>>>>>> Regards,
>>>>>> Evelyn.
>>>>>>
>>>>>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID>
>>>>>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We are exploring configuring Solr/Spark and wanted to get input on
>>>>>> this.
>>>>>> 1) How do we decide which one to use?
>>>>>> 2) Do we run this on a DC where there is less workload?
>>>>>>
>>>>>> Any other suggestions or comments are appreciated.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>> Ben Bromhead
>> CTO | Instaclustr <https://www.instaclustr.com/>
>> +1 650 284 9692
>> Reliability at Scale
>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>>
>
>
>
> --
> Niclas Hedhman, Software Developer
> http://zest.apache.org - New Energy for Java
>
> --
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
