Best practice is to use a dedicated DC for analytics, separate from the hot DC.
On Thu, 12 Apr 2018 at 15:45, sha p <shatestt...@gmail.com> wrote:
> Got it.
> Thank you so much for your detailed explanation.
>
> Regards,
> Shyam
>
> On Thu, 12 Apr 2018, 17:37 Evelyn Smith, <u5015...@gmail.com> wrote:
>
>> Cassandra tends to be used in a lot of web applications. Its loads are
>> more natural and evenly distributed, like people logging on throughout the
>> day. And people operating it tend to be latency sensitive.
>>
>> Spark, on the other hand, will try to complete its tasks as quickly as
>> possible. This might mean bulk reading from Cassandra at 10 times the
>> usual operations load, but for only, say, 5 minutes every half hour (however
>> long it takes to read in the data for a job, and whenever that job is run).
>> In this case, during those 5 minutes your normal operations work (customers)
>> is going to experience a lot of latency.
>>
>> This even happens with streaming jobs: every time Spark goes to interact
>> with Cassandra it does so very quickly, hammers it for reads and then does
>> its own stuff until it needs to write things out. This might equate to
>> intermittent latency spikes.
>>
>> In theory, you can throttle your reads and writes, but I don’t know much
>> about this and don’t see people actually doing it.
>>
>> Regards,
>> Evelyn.
>>
>> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>>
>> Evelyn,
>> Can you please elaborate on the below?
>> Spark is notorious for causing latency spikes in Cassandra, which is not
>> great if you are sensitive to that.
>>
>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com> wrote:
>>
>>> Are you building a search engine -> Solr
>>> Are you building an analytics function -> Spark
>>>
>>> I feel they are used in significantly different use cases. What are you
>>> trying to build?
>>>
>>> If it’s an analytics functionality that’s separate from your operations
>>> functionality, I’d build it in its own DC. Spark is notorious for causing
>>> latency spikes in Cassandra, which is not great if you are sensitive to
>>> that.
>>>
>>> Regards,
>>> Evelyn.
>>>
>>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID>
>>> wrote:
>>>
>>> Hello,
>>>
>>> We are exploring configuring Solr/Spark and wanted to get input on this.
>>> 1) How do we decide which one to use?
>>> 2) Do we run this on a DC where there is less workload?
>>>
>>> Any other suggestions or comments are appreciated.
>>>
>>> Thank you.
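A minimal sketch of the two mitigations discussed in this thread (a dedicated analytics DC, and throttled reads/writes), assuming the DataStax Spark Cassandra Connector is used and the keyspace is already replicated to a DC named `analytics` via NetworkTopologyStrategy. The helper name `analytics_job_conf` and the DC name are hypothetical, and the limits are placeholders, not recommendations. The `spark.cassandra.*` keys follow the connector's 3.x configuration reference; older releases used different (snake_case) key names, so check the reference for the version in use.

```python
# Hypothetical helper: collects Spark Cassandra Connector settings for an
# analytics job that talks only to a dedicated DC and is throttled.
# Key names assume the 3.x connector reference; verify for your version.


def analytics_job_conf(local_dc: str,
                       reads_per_core_per_sec: int,
                       write_mb_per_core_per_sec: float) -> dict:
    """Return connector settings as string key/value pairs."""
    if reads_per_core_per_sec <= 0 or write_mb_per_core_per_sec <= 0:
        raise ValueError("throttle limits must be positive")
    return {
        # Connect to nodes in the analytics DC, not the hot DC.
        "spark.cassandra.connection.localDC": local_dc,
        # Cap read requests (or pages) per executor core per second.
        "spark.cassandra.input.readsPerSec": str(reads_per_core_per_sec),
        # Cap write throughput per executor core, in MB/s.
        "spark.cassandra.output.throughputMBPerSec": str(write_mb_per_core_per_sec),
    }


if __name__ == "__main__":
    conf = analytics_job_conf("analytics", reads_per_core_per_sec=100,
                              write_mb_per_core_per_sec=5.0)
    for key, value in sorted(conf.items()):
        print(f"{key}={value}")
    # With PySpark installed, each pair could be applied with
    # SparkSession.builder.config(key, value) before getOrCreate().
```

Both limits apply per executor core, so the total load the job puts on the cluster grows with the number of cores it runs on. Also note that if the keyspace spans both DCs, writes made through the analytics DC are still replicated to the hot DC, so throttling writes can matter even with a separate DC.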