Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-21 Thread James Berragan
I think this is an idea worth exploring. My guess is that even if the scope is confined to just "copy if not exists", it would still largely be used as a cloud-agnostic backup/restore solution, and so will be shaped accordingly. Some thoughts: - I think it would be worth exploring more what the di

Re: [DISCUSS] Tooling to repair MV through a Spark job

2024-12-06 Thread James Berragan
I think this would be useful and - having never really used Materialized Views - I didn't know it was an issue for some users. I would say the Cassandra Analytics library (http://github.com/apache/cassandra-analytics/) could be utilized for much of this, with a specialized Spark job for this purpos

Re: [VOTE] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-21 Thread James Berragan
Yifan Cai wrote: >>> >>> +1 nb >>> >>> -- >>> >>> *From:* Brandon Williams >>> *Sent:* Thursday, October 17, 2024 11:47:13 AM >>> *To:* dev@cassandra.apache.org >>> *Subject:* Re: [VOTE] CE

[VOTE] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-17 Thread James Berragan
Hi everyone, I would like to start the voting for CEP-44 as all the feedback in the discussion thread seems to be addressed. Proposal: CEP-44: Kafka integration for Cassandra CDC using Sidecar

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-10-01 Thread James Berragan
e load would be > handled. > > This has been discussed many times before, but is it time to introduce the > concept of an elected leader for a token range for this type of operation? > It would eliminate a ton of problems that need to be managed when bridging c* > to a system l

Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-30 Thread James Berragan
Thanks for the discussions. I do anticipate that Accord will make things very much better; however, I think if consumers are ultimately going to replay the log into some other system (say Apache Iceberg), exactly-once delivery will always be tricky, but perhaps not entirely necessary given the linea

[DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-27 Thread James Berragan
available for review here: https://github.com/apache/cassandra-analytics/pull/87. The remaining Sidecar code will follow. As a reminder, please keep the discussion here on the dev list vs. in the wiki, as we’ve found it easier to manage via email. Sincerely, James Berragan Bernardo Botella Corbi Yifan Cai

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-04-12 Thread James Berragan
Hi Stefan, CDC is something we are also thinking about, and worthy of a separate discussion. We have tested Spark Streaming for CDC and I hope we can bolt it on in the future, but streaming technologies also come with more caveats and nuances (we have found it beneficial with CDC to store a small a

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-27 Thread James Berragan
ranch with code that will accompany this CEP to >> help readers understand it better. >> >> As a reminder, please keep the discussion here on the dev list vs. in the >> wiki, as we’ve found it easier to manage via email. >> >> Sincerely, >> >> Doug Rohrer & James Berragan >

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-27 Thread James Berragan
On the Sidecar discussion, while Sidecar is the preferred mechanism for the reasons described, the API is sufficiently generic to plug in a user implementation (essentially provide a list of sstables for a token range, and a mechanism to open an InputStream on any SSTable file component).
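
To make the shape of such a pluggable source concrete, here is a minimal sketch in Python (chosen for brevity; the actual library is Java). Every name in it (`TokenRange`, `SSTableSource`, `InMemorySource`, `list_sstables`, `open_component`) is hypothetical and not taken from the Cassandra Analytics API. It only illustrates the two responsibilities described above: listing sstables for a token range and opening a byte stream on a file component. Token ring wraparound is deliberately ignored.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, List, Protocol


@dataclass(frozen=True)
class TokenRange:
    """Hypothetical (start, end] token range; ring wraparound is ignored."""
    start: int
    end: int

    def overlaps(self, other: "TokenRange") -> bool:
        # Two (start, end] ranges overlap when each starts before the other ends.
        return self.start < other.end and other.start < self.end


class SSTableSource(Protocol):
    """Hypothetical contract a user-supplied implementation would satisfy."""

    def list_sstables(self, token_range: TokenRange) -> List[str]:
        """Return the names of sstables overlapping the token range."""
        ...

    def open_component(self, sstable: str, component: str) -> BinaryIO:
        """Open a readable byte stream on one sstable file component."""
        ...


class InMemorySource:
    """Toy implementation backed by dictionaries, for illustration only."""

    def __init__(self) -> None:
        self._ranges: Dict[str, TokenRange] = {}
        self._files: Dict[str, Dict[str, bytes]] = {}

    def add(self, sstable: str, token_range: TokenRange,
            components: Dict[str, bytes]) -> None:
        # Register an sstable, the range it covers, and its component bytes.
        self._ranges[sstable] = token_range
        self._files[sstable] = dict(components)

    def list_sstables(self, token_range: TokenRange) -> List[str]:
        return sorted(name for name, covered in self._ranges.items()
                      if covered.overlaps(token_range))

    def open_component(self, sstable: str, component: str) -> BinaryIO:
        # Raises KeyError if the sstable or component is unknown.
        return io.BytesIO(self._files[sstable][component])
```

A file-system, object-store, or Sidecar-backed source would satisfy the same two methods, which is what would let a reader stay independent of where the sstables actually live.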

Spark-Cassandra Bulk Reader: CASSANDRA-16222

2020-10-23 Thread James Berragan
Hi everyone, I want to highlight to the dev community CASSANDRA-16222, a Spark library we have been working on that can compact and read raw Cassandra SSTables into SparkSQL. By reading the sstables directly from a snapshot directory we are