Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

Benedict Tue, 28 Mar 2023 08:17:20 -0700

FWIW, I’m sceptical of the performance angle in the long term. You can do a lot 
more to control QoS when you understand what each query is doing and what your 
SLOs are. You can also apportion your resources more efficiently (rather than 
leaving some lying fallow to ensure they are free later).

But we’re a long way from that.

My personal view is that the sidecar should offer these sorts of facilities 
more rapidly than we might in Cassandra proper, and that we might eventually 
(when they are mature enough and Cassandra is ready for it) bring them 
in-process.

Certainly, managing consistency (repair, etc.) and serving bulk operations 
should *long term* live in Cassandra, IMO.

But that isn’t the state of the world today, so I support a separate process.

That said, I am nervous about the issues Jeremiah raises: we need to ensure we 
are not tightly coupling things and creating new problems. Reliably managing 
other processes, and having them promptly see sstable changes and memtable 
flushes, would not be pretty, and we should probably offer only weak 
guarantees about what is visible when. Ideally the sidecar would rely on file 
system watch notifications, or at most on some fsync-like functionality for 
flushing memtables.

> On 28 Mar 2023, at 16:09, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> 
>> 
>> If we want to bring cgroups/containers/etc. into the default deployment 
>> mechanisms of C*, great. I am all for dividing it up into microservices 
>> given we solve all the problems I listed in the complexity section.
>> 
>> I am actually all for dividing C* up into multiple microservices, but the 
>> project needs to buy in to containers as the default mechanism for running 
>> it for that to be viable, in my mind.
> 
> I was under the impression that with CEP-1 the project did buy into
> the direction of moving non-latency-sensitive workloads out of the
> main process? At the time of the discussion, folks mentioned repair,
> bulk workloads, backup, restore, compaction, etc. as possible things
> we would like to extract to the sidecar over time.
> 
> I don't think we want to go full-on microservices, with something like
> 12 processes each handling one thing, but 2 seems like a good step? One
> for latency-sensitive requests (reads/writes: the current process),
> and one for non-latency-sensitive requests (control plane, bulk work,
> etc.: the sidecar).
> 
> -Joey
