Sorry for the delay in responding here. Yes, we can add some diagrams to the CEP; I’ll try to get that done by end of week.

Thanks,
Doug

> On Mar 28, 2023, at 1:14 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>
> Maybe some data flow diagrams could be added to the cep showing some example
> operations for read/write?
>
>> On Mar 28, 2023, at 11:35 AM, Yifan Cai <yc25c...@gmail.com> wrote:
>>
>>
>> A lot of great discussions!
>>
>> On the sidecar front, especially what the role sidecar plays in terms of
>> this CEP, I feel there might be some confusion. Once the code is published,
>> we should have clarity.
>> Sidecar does not read sstables nor do any coordination for analytics
>> queries. It is local to the companion Cassandra instance. For bulk read, it
>> takes snapshots and streams sstables to spark workers to read. For bulk
>> write, it imports the sstables uploaded from spark workers. All commands are
>> existing jmx/nodetool functionalities from Cassandra. Sidecar adds the http
>> interface to them. It might be an over simplified description. The complex
>> computation is performed in spark clusters only.
>>
>> In the long run, Cassandra might evolve into a database that does both OLTP
>> and OLAP. (Not what this thread aims for)
>> At the current stage, Spark is very suited for analytic purposes.
>>
>> On Tue, Mar 28, 2023 at 9:06 AM Benedict <bened...@apache.org
>> <mailto:bened...@apache.org>> wrote:
>>> I disagree with the first claim, as the process has all the information it
>>> chooses to utilise about which resources it’s using and what it’s using
>>> those resources for.
>>>
>>> The inability to isolate GC domains is something we cannot address, but
>>> also probably not a problem if we were doing everything with memory
>>> management as well as we could be.
>>>
>>> But, not worth detailing this thread for. Today we do very little well on
>>> this front within the process, and a separate process is well justified
>>> given the state of play.
>>>
>>>> On 28 Mar 2023, at 16:38, Derek Chen-Becker <de...@chen-becker.org
>>>> <mailto:de...@chen-becker.org>> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Mar 28, 2023 at 9:03 AM Joseph Lynch <joe.e.ly...@gmail.com
>>>> <mailto:joe.e.ly...@gmail.com>> wrote:
>>>> ...
>>>>
>>>>> I think we might be underselling how valuable JVM isolation is,
>>>>> especially for analytics queries that are going to pass the entire
>>>>> dataset through heap somewhat constantly.
>>>>
>>>> Big +1 here. The JVM simply does not have significant granularity of
>>>> control for resource utilization, but this is explicitly a feature of
>>>> separate processes. Add in being able to separate GC domains and you can
>>>> avoid a lot of noisy neighbor in-VM behavior for the disparate workloads.
>>>>
>>>> Cheers,
>>>>
>>>> Derek
>>>>
>>>>
>>>> --
>>>> +---------------------------------------------------------------+
>>>> | Derek Chen-Becker                                             |
>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
>>>> +---------------------------------------------------------------+
>>>>