Thanks Dinesh,
That will be great.👍
On Thu, May 4, 2023 at 11:06 PM, Dinesh Joshi wrote:
> Hi Guo,
>
> I would expect that there would be release artifacts for the sidecar as
> well as the library once this functionality is available.
>
> Dinesh
>
> On May 4, 2023, at 12:03 AM, guo Maxwell wrote:
>
> This is a
Hi Guo,
I would expect that there would be release artifacts for the sidecar as well as
the library once this functionality is available.
Dinesh
> On May 4, 2023, at 12:03 AM, guo Maxwell wrote:
>
> This is very meaningful work, thanks, but I would like to ask a question
> that is not par
This is very meaningful work, thanks, but I would like to ask a question
that is not particularly related to the CEP project's code design itself
but to the project's engineering management: what is the future development and
release plan of this project?
As far as I know, project Cassandra Sidecar
If there aren't additional questions / comments I will start the VOTE thread on
this CEP tonight.
On 2023/05/01 19:50:12 Dinesh Joshi wrote:
> Does anybody have any questions that we could answer about this proposal?
We're reusing existing Cassandra code so the performance characteristics for
parsing should be the same as Cassandra. I will need to check if we have
benchmarks. If we do, we'll add them to the CEP wiki page.
On 2023/05/02 19:52:28 Sebastian Estevez wrote:
> Hey Dinesh,
>
> Yeah it makes sense th
Hey Dinesh,
Yeah it makes sense that the sstable streaming is network bound since it's
mostly just moving files.
Do you have any performance stats on the sstable parsing side inside spark?
--Seb
On Tue, May 2, 2023 at 3:31 PM Dinesh Joshi wrote:
> It is line rate / network bound. We have a pa
It is line rate / network bound. We have a patch out in vert.x that should use
the zero copy path for it. But it's not a strict prereq for it.
On 2023/05/02 15:39:02 Sebastian Estevez wrote:
> Hi folks,
>
> Great stuff thanks for sharing.
>
> The performance numbers I've seen so far are for the
Hi folks,
Great stuff, thanks for sharing.
The performance numbers I've seen so far are for the sidecar streaming
sstables (seems like this is just network bound?). What kind of perf are
you seeing at the Spark executors (at the per task level)?
--Seb
On Mon, May 1, 2023 at 3:50 PM Dinesh Joshi
Does anybody have any questions that we could answer about this proposal?
> On Apr 27, 2023, at 1:24 PM, Francisco Guerrero
> wrote:
>
> Hi folks,
>
> We have updated the confluence page with the source code for CEP-28.
> There are two repositories with contributions. One is the patch [1]
> fo
Hi folks,
We have updated the confluence page with the source code for CEP-28.
There are two repositories with contributions. One is the patch [1]
for Cassandra Sidecar with the bulk APIs that enable the Cassandra
Spark Analytics library. The second is a new repository [2] with
contributions
> From: Doug Rohrer <droh...@apple.com>
> Sent: Tuesday, April 11, 2023 0:37
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark
> Bulk A
/debezium/debezium-connector-cassandra
From: Doug Rohrer
Sent: Tuesday, April 11, 2023 0:37
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark
Bulk Analytics
Thanks for those. They are very helpful. I think the CEP needs to call out all of the classes/interfaces from the cassandra-all jar that the “Spark driver” is using. Given this CEP is exposing “sstables as an external API” I would think all the interfaces and code associated with using those would ne
I’ve updated the CEP with two overview diagrams of the interactions between
Sidecar, Cassandra, and the Bulk Analytics library. Hope this helps folks
better understand how things work, and thanks for the patience as it took a bit
longer than expected for me to find the time for this.
Doug
> O
Sorry for the delay in responding here - yes, we can add some diagrams to the
CEP - I’ll try to get that done by end-of-week.
Thanks,
Doug
> On Mar 28, 2023, at 1:14 PM, J. D. Jordan wrote:
>
> Maybe some data flow diagrams could be added to the cep showing some example
> operations for read
Maybe some data flow diagrams could be added to the cep showing some example operations for read/write?
On Mar 28, 2023, at 11:35 AM, Yifan Cai wrote:
A lot of great discussions! On the sidecar front, especially what the role sidecar plays in terms of this CEP, I feel there might be some confusion.
A lot of great discussions!
On the sidecar front, especially what the role sidecar plays in terms of
this CEP, I feel there might be some confusion. Once the code is published,
we should have clarity.
Sidecar does not read sstables nor do any coordination for analytics
queries. It is local to the
I disagree with the first claim, as the process has all the information it chooses to utilise about which resources it’s using and what it’s using those resources for. The inability to isolate GC domains is something we cannot address, but also probably not a problem if we were doing everything with
On Tue, Mar 28, 2023 at 9:03 AM Joseph Lynch wrote:
...
I think we might be underselling how valuable JVM isolation is,
> especially for analytics queries that are going to pass the entire
> dataset through heap somewhat constantly.
>
Big +1 here. The JVM simply does not have significant granula
> One of the explicit goals of making an official sidecar project was to
> try to make it something the project does not break compatibility with
> as one of the main issues the third-party sidecars (that handle
> distributed control, backup, repair, etc ...) have is they break
> constantly because
Fwiw I’m sceptical of the performance angle long term. You can do a lot more to
control QoS when you understand what each query is doing, and what your SLOs
are. You can also more efficiently apportion your resources (not leaving any
lying fallow to ensure it’s free later)
But, we’re a long way
> If we want to bring groups/containers/etc into the default deployment
> mechanisms of C*, great. I am all for dividing it up into micro services
> given we solve all the problems I listed in the complexity section.
>
> I am actually all for dividing C* up into multiple micro services, but the
One of the explicit goals of making an official sidecar project was to
try to make it something the project does not break compatibility with
as one of the main issues the third-party sidecars (that handle
distributed control, backup, repair, etc ...) have is they break
constantly because C* breaks
>> Given the sidecar is running on the same node as the main C* process, the
>> only real resource isolation you have is in heap/GC? CPU/Memory/IO are all
>> still shared between the main C* process and the side car, and coordinating
>> those across processes is harder than coordinating them
On Tue, Mar 28, 2023 at 7:30 AM Jeremiah D Jordan
wrote:
> - Resource isolation. Having the said service running within the same JVM
> may negatively impact Cassandra storage's performance. It could be more
> beneficial to have them in Sidecar, which offers strong resource isolation
> guarantees
> - Resource isolation. Having the said service running within the same JVM
> may negatively impact Cassandra storage's performance. It could be more
> beneficial to have them in Sidecar, which offers strong resource isolation
> guarantees.
How does having this in a side car change the impact
Complex predicates on non-partition keys naturally require pulling the entire
data set into the Spark DataFrame to perform the query. We have some
optimizations around column filtering and partition key predicates, utilizing
the Filter.db/Summary.db/Index.db files to only read the data it needs.
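As a rough illustration of the partition key case only, the sketch below shows how a Bloom filter check can rule out sstables before any data is read. It is an assumption-laden toy, not the library's code: `BloomFilter` and `sstables_to_read` are hypothetical names, the filter is a plain in-memory bit array rather than the Filter.db on-disk format, and the Summary.db/Index.db lookups that would follow are left out.

```python
import hashlib
from typing import Dict, Iterable, List


class BloomFilter:
    """Tiny Bloom filter; stands in for the role of Filter.db, not its format."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes) -> Iterable[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def sstables_to_read(filters: Dict[str, BloomFilter], partition_key: bytes) -> List[str]:
    """Keep only the sstables whose filter cannot rule out the partition key."""
    return [name for name, bf in filters.items() if bf.might_contain(partition_key)]
```

A filter never returns a false negative, so an sstable skipped this way cannot hold the partition; a false positive only costs an unnecessary read.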
Thank you for the write-up and the efforts on CASSANDRA-16222. It sounds like
you've been using this for some time. I understand from the rejected
alternatives that the Spark Cassandra Connector was slower because it goes
through the read and write path for C* rather than this backdoor mechani
On the Sidecar discussion, while Sidecar is the preferred mechanism for the
reasons described, the API is sufficiently generic to plug in a user
implementation (essentially provide a list of sstables for a token range, and
a mechanism to open an InputStream on any SSTable file component).
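To make the shape of that contract concrete, here is a minimal sketch in Python. Every name in it (`SSTableRef`, `SSTableSource`, `InMemorySource`, `list_sstables`, `open_component`) is hypothetical and not taken from the CEP or the library, and the token overlap test is deliberately simplified (inclusive bounds, no ring wrap-around).

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, List, Protocol


@dataclass(frozen=True)
class SSTableRef:
    """Hypothetical handle for one sstable and the token span it covers."""
    name: str
    first_token: int
    last_token: int


class SSTableSource(Protocol):
    """Hypothetical contract: list sstables for a range, open any component."""

    def list_sstables(self, start_token: int, end_token: int) -> List[SSTableRef]: ...

    def open_component(self, sstable: SSTableRef, component: str) -> BinaryIO: ...


class InMemorySource:
    """Toy user implementation backed by a dict instead of Sidecar."""

    def __init__(self, sstables, components):
        self._sstables = list(sstables)
        self._components = dict(components)

    def list_sstables(self, start_token, end_token):
        # Simplified overlap test: inclusive bounds, no ring wrap-around.
        return [s for s in self._sstables
                if s.first_token <= end_token and s.last_token >= start_token]

    def open_component(self, sstable, component):
        # Return a binary stream over the requested file component.
        return io.BytesIO(self._components[(sstable.name, component)])
```

Anything that can answer those two calls, whether a Sidecar client, a local directory, or an object store, could in principle sit behind the same interface.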
I want to second what Yifan's spoken to, specifically in terms of resource
isolation and availability.
While the sidecar hasn't seen a ton of traffic and contributions since the
acceptance into the project and clearance of CEP-1, my intuition is that that's
due to the entrenched maturity of alt
Oh, that's significantly different and great news, please do! Thanks
for the clarification, Doug!
Kind Regards,
Brandon
On Fri, Mar 24, 2023 at 4:42 PM Doug Rohrer wrote:
>
> I agree that the analytics library will need to support vnodes. To be clear,
> there’s nothing preventing the solution
Hi Jeremiah,
There are good reasons to not have these inside Cassandra. Consider the
following.
- Resource isolation. Having the said service running within the same JVM
may negatively impact Cassandra storage's performance. It could be more
beneficial to have them in Sidecar, which offers strong
I agree that the analytics library will need to support vnodes. To be clear,
there’s nothing preventing the solution from working with vnodes right now, and
no assumptions about a 1:1 topology between a token and a node. However, we
don’t, today, have the ability to test vnode support end-to-end
On Fri, Mar 24, 2023 at 10:39 AM Jeremiah D Jordan
wrote:
>
> I have concerns with the majority of this being in the sidecar and not in the
> database itself. I think it would make sense for the server side of this to
> be a new service exposed by the database, not in the sidecar. That way it
>>From: Doug Rohrer <droh...@apple.com>
>>Sent: Thursday, March 23, 2023 18:33
>>To: dev@cassandra.apache.org
.apache.org <mailto:dev@cassandra.apache.org>
> Cc: James Berragan
> Subject: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with
> Spark Bulk Analytics
>
From: Benjamin Lerer
Sent: Friday, March 24, 2023 10:35
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark
Bulk Analytics
it might be a logical
> replacement of that.
>
> Regards
>
>
> From: Doug Rohrer
> Sent: Thursday, March 23, 2023 18:33
> To: dev@cassandra.apache.org
> Cc: James Berragan
> Subject: [DISCUSS] CEP-28: Reading and Writing Ca
From: Doug Rohrer
Sent: Thursday, March 23, 2023 18:33
To: dev@cassandra.apache.org
Cc: James Berragan
Subject: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk
Analytics
Hi everyone,
Wiki:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
We’d like to propose this CEP for adoption by the community.
It is common for teams using Cassandra to find themselves looking for a way to
interact w