Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

2024-09-28 Thread Jon Haddad
Yes! I’m really looking forward to trying this out. The CEP looks really
well thought out. I think this will make CDC a lot more useful for a lot of
teams.

Jon


On Fri, Sep 27, 2024 at 4:23 PM Josh McKenzie wrote:

> Really excited to see this hit the ML, James.
>
> As the author of the base CDC (get your stones ready for throwing :D) and
> someone moderately involved in the CEP here, I definitely welcome any
> questions. CDC is a *thorny problem* in a multi-replica distributed
> system like this.
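Part of what makes CDC thorny here: Cassandra's CDC reads each node's own commit log, and every replica writes a mutation to its own commit log, so with replication factor RF a single write can surface in up to RF change streams. The Java sketch below shows one naive way a consumer might collapse those copies. The `ChangeEvent` and `ReplicaDeduplicator` classes and the (partition key, write timestamp) identity are hypothetical illustrations, not the approach CEP-44 describes; a real design also has to handle replicas that miss writes, bounded memory, and distinct writes that share a timestamp.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical change event as read from one replica's commit log.
final class ChangeEvent {
    final String replica;       // node whose commit log produced this copy
    final String partitionKey;  // partition the mutation touched
    final long writeTimestamp;  // write timestamp in microseconds
    final String payload;       // serialized mutation, simplified to a string

    ChangeEvent(String replica, String partitionKey, long writeTimestamp, String payload) {
        this.replica = replica;
        this.partitionKey = partitionKey;
        this.writeTimestamp = writeTimestamp;
        this.payload = payload;
    }
}

// Naive de-duplication across replica streams (hypothetical, not CEP-44's design).
final class ReplicaDeduplicator {
    // Every identity ever published; unbounded, which a real system cannot afford.
    private final Set<String> seen = new HashSet<>();

    // True the first time a (partitionKey, writeTimestamp) pair is offered.
    boolean firstSighting(ChangeEvent e) {
        return seen.add(e.partitionKey + "@" + e.writeTimestamp);
    }

    // Keeps the first copy of each write from a merged multi-replica stream.
    List<ChangeEvent> dedupe(List<ChangeEvent> merged) {
        List<ChangeEvent> out = new ArrayList<>();
        for (ChangeEvent e : merged) {
            if (firstSighting(e)) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One write (pk1 @ 1000) seen on three replicas (RF = 3), plus one other write.
        List<ChangeEvent> merged = List.of(
                new ChangeEvent("node1", "pk1", 1000L, "v=1"),
                new ChangeEvent("node2", "pk1", 1000L, "v=1"),
                new ChangeEvent("node3", "pk1", 1000L, "v=1"),
                new ChangeEvent("node2", "pk2", 1001L, "v=7"));
        System.out.println(new ReplicaDeduplicator().dedupe(merged).size()); // prints 2
    }
}
```

Keeping the first copy is arbitrary; any replica's copy of the same write carries the same mutation.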
>
> On Fri, Sep 27, 2024, at 5:40 PM, James Berragan wrote:
>
> Hi everyone,
>
> Wiki:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-44%3A+Kafka+integration+for+Cassandra+CDC+using+Sidecar
>
> We would like to propose this CEP for adoption by the community.
>
> CDC is a common technique in databases but right now there is no
> out-of-the-box solution to do this easily and at scale with Cassandra. Our
> proposal is to build a fully-fledged solution into the Apache Cassandra
> Sidecar. This comes with a number of benefits:
> - Sidecar is an official part of the existing Cassandra eco-system.
> - Sidecar runs co-located with Cassandra instances and so scales with the
> cluster size.
> - Sidecar can access the underlying Cassandra database to store CDC
> configuration and the CDC state in a special table.
> - Running in the Sidecar requires no additional external resources.
>
> We anticipate that the core CDC module will be pluggable and reusable; it
> is available for review here:
> https://github.com/apache/cassandra-analytics/pull/87. The remaining
> Sidecar code will follow.
>
> As a reminder, please keep the discussion here on the dev list vs. in the
> wiki, as we’ve found it easier to manage via email.
>
> Sincerely,
> James Berragan
> Bernardo Botella Corbi
> Yifan Cai
> Jyothsna Konisa
>
>
>
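To make the "pluggable module plus state table" idea concrete, here is a minimal Java sketch under stated assumptions: `CdcSink`, `CdcStateStore`, `LogEntry`, `CdcPipeline`, and the in-memory implementations are hypothetical names, not the API in the linked pull request. A Kafka producer would be one possible `CdcSink`, and in Sidecar a Cassandra table would back `CdcStateStore`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable output; a Kafka producer could be one implementation.
interface CdcSink {
    void publish(String key, String value);
}

// Hypothetical state store; in Sidecar this would be backed by a Cassandra table.
interface CdcStateStore {
    long loadCheckpoint(String instanceId); // last published position, or -1 if none
    void saveCheckpoint(String instanceId, long position);
}

// Simplified commit log entry with a monotonically increasing position.
final class LogEntry {
    final long position;
    final String key;
    final String value;

    LogEntry(long position, String key, String value) {
        this.position = position;
        this.key = key;
        this.value = value;
    }
}

final class InMemorySink implements CdcSink {
    final List<String> published = new ArrayList<>();

    @Override
    public void publish(String key, String value) {
        published.add(key + "=" + value);
    }
}

final class InMemoryStateStore implements CdcStateStore {
    private final Map<String, Long> checkpoints = new HashMap<>();

    @Override
    public long loadCheckpoint(String instanceId) {
        return checkpoints.getOrDefault(instanceId, -1L);
    }

    @Override
    public void saveCheckpoint(String instanceId, long position) {
        checkpoints.put(instanceId, position);
    }
}

final class CdcPipeline {
    private final String instanceId;
    private final CdcSink sink;
    private final CdcStateStore state;

    CdcPipeline(String instanceId, CdcSink sink, CdcStateStore state) {
        this.instanceId = instanceId;
        this.sink = sink;
        this.state = state;
    }

    // Publishes entries past the stored checkpoint, then advances it; returns how many were sent.
    int processBatch(List<LogEntry> batch) {
        long checkpoint = state.loadCheckpoint(instanceId);
        long highest = checkpoint;
        int sent = 0;
        for (LogEntry e : batch) {
            if (e.position <= checkpoint) {
                continue; // already published before the last checkpoint
            }
            sink.publish(e.key, e.value);
            highest = Math.max(highest, e.position);
            sent++;
        }
        if (highest > checkpoint) {
            state.saveCheckpoint(instanceId, highest); // saved only after publishing
        }
        return sent;
    }
}
```

Saving the checkpoint only after publishing trades duplicates for safety: if the process dies between the two steps, the batch is re-sent on restart instead of being lost (at-least-once delivery). Swapping the sink or the store does not touch `CdcPipeline`, which is the point of keeping the core module pluggable.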


Re: CEP-15: Accord status

2024-09-28 Thread David Capwell
Today I learned… I had no clue we had markdown files in src/java…

$ find src/ -name '*.md'
src//java/org/apache/cassandra/io/sstable/SSTable_API.md
src//java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.md
src//java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md
src//java/org/apache/cassandra/tcm/TCM_implementation.md
src//java/org/apache/cassandra/tcm/TransactionalClusterMetadata.md
src//java/org/apache/cassandra/db/memtable/Memtable_API.md
src//java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
src//java/org/apache/cassandra/db/tries/InMemoryTrie.md
src//java/org/apache/cassandra/db/tries/Trie.md
src//java/org/apache/cassandra/index/sai/README.md
src//java/org/apache/cassandra/service/paxos/Paxos.md



We don’t have one at the moment, but it would be good to get that in. At a high 
level, there are a few key classes:

1) org.apache.cassandra.cql3.statements.TransactionStatement - this class 
handles BEGIN TRANSACTION in CQL
2) org.apache.cassandra.service.consensus.TransactionalMode - this backs the table 
property that dictates what is allowed for the table. If “off”, Accord 
transactions are not allowed; if “full”, normal reads/writes are migrated to 
Accord (and you can still use BEGIN TRANSACTION)
3) org.apache.cassandra.service.accord.AccordService - the global static 
instance through which Cassandra calls into Accord
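As a rough illustration of the two TransactionalMode settings described above, the Java sketch below models how a request on a table might be routed. `TxnModeSketch` and its `route` method are hypothetical and only capture the “off” and “full” behaviour from this message; the real class may define further modes and need not look like this.

```java
import java.util.Locale;

// Hypothetical model of the two modes described above; not the real TransactionalMode class.
enum TxnModeSketch {
    OFF(false, false),
    FULL(true, true);

    private final boolean allowsBeginTransaction; // may BEGIN TRANSACTION run on this table?
    private final boolean plainOpsUseAccord;      // are normal reads/writes migrated to Accord?

    TxnModeSketch(boolean allowsBeginTransaction, boolean plainOpsUseAccord) {
        this.allowsBeginTransaction = allowsBeginTransaction;
        this.plainOpsUseAccord = plainOpsUseAccord;
    }

    // Parses a table property value such as "off" or "full" (case-insensitive).
    static TxnModeSketch fromProperty(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Returns which path a request would take: "accord" or "non-accord".
    String route(boolean isBeginTransaction) {
        if (isBeginTransaction) {
            if (!allowsBeginTransaction) {
                throw new IllegalStateException("BEGIN TRANSACTION is not allowed when the mode is off");
            }
            return "accord";
        }
        return plainOpsUseAccord ? "accord" : "non-accord";
    }

    public static void main(String[] args) {
        System.out.println(fromProperty("full").route(false)); // prints accord
        System.out.println(fromProperty("off").route(false));  // prints non-accord
    }
}
```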

> On Sep 27, 2024, at 7:20 AM, Paulo Motta wrote:
> 
> Thanks all for the work on this epic!
> 
> Is there an implementation summary guide similar to guide_8099.md [1] that 
> can help reviewers not involved with the effort navigate the code? 
> It would be great to have one if this is not already available or 
> planned. There's a similar guide, though much smaller in scope, for the 
> memtable API in [2].
> 
> [1] - 
> https://github.com/apache/cassandra/blob/cassandra-3.0.0-rc2/guide_8099.md 
> [2] - 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/memtable/Memtable_API.md
> 
> On Fri, Sep 27, 2024 at 8:09 AM Benedict Elliott Smith wrote:
>> If you exclude test changes, there are < 50k lines added and ~2k removed. By 
>> lines modified, this works out to ~7% of the scale of 8099, if that is the 
>> benchmark for disruption.
>> 
>> Altogether, this is a very small patch from the perspective of the existing 
>> codebase. It probably doesn’t even come close to the top 10.
>> 
>> Conversely, among new standalone features, this is likely the most complex 
>> thing we have ever merged into the project. But it is off by default, so the 
>> risk to deployments is minimal.
>> 
>> Regarding how parties can engage, I think, if we’re honest, history shows that 
>> engagement will be minimal. There have, after all, been several touch points, 
>> and none have materialised into really significant engagement. This is just 
>> the reality of everyone having their own pressures: at the end of the day, 
>> changes happen and the community adapts. But we are here to answer any 
>> questions, as we have been throughout the development of the work in the 
>> open.
>> 
>> 
>> 
>>> On 20 Sep 2024, at 22:08, Josh McKenzie wrote:
>>> 
>>>> This presents an opportune moment for those interested to review the code.
>>>> ...
>>>> +88,341 −7,341
>>>> 1003 Files changed
>>> 
>>> O.o 
>>> This is... very large. If we use CASSANDRA-8099 as our "banana for scale":
>>>> 645 files changed, 49381 insertions(+), 42227 deletions(-)
>>> 
>>> To be clear: I don't think we collectively should be worried about 
>>> disruption from this patch, since:
>>> - Each commit (or the vast majority?) has already been reviewed by >= 1 
>>>   other committer
>>> - 7.3k deletions is a lot less than 42k
>>> - We now have fuzzing, property-based testing, and the simulator
>>> - Most of this code is additive
>>>
>>> How would you recommend interested parties engage with reviewing this 
>>> behemoth? Or, failing that, which subsections or key areas should they 
>>> study first to familiarize themselves with the structure?
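The size comparison can be made concrete with the figures quoted in this thread. These include tests, so they differ from Benedict's test-excluded estimate. The hypothetical `PatchScale` helper below only does the arithmetic: total churn is similar (about 95.7k vs 91.6k lines), but Accord deletes only about 17% as many lines as CASSANDRA-8099 did, which is the sense in which the code is mostly additive.

```java
import java.util.Locale;

// Hypothetical helper that only does the arithmetic on figures quoted in this thread.
final class PatchScale {
    // Total lines touched: insertions plus deletions.
    static long churn(long insertions, long deletions) {
        return insertions + deletions;
    }

    // Fraction a / b as a double.
    static double ratio(long a, long b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        long accordIns = 88_341, accordDel = 7_341;  // Accord branch (tests included)
        long c8099Ins = 49_381, c8099Del = 42_227;   // CASSANDRA-8099
        System.out.printf(Locale.ROOT, "accord churn: %d, 8099 churn: %d%n",
                churn(accordIns, accordDel), churn(c8099Ins, c8099Del));
        // prints "accord churn: 95682, 8099 churn: 91608"
        System.out.printf(Locale.ROOT, "deletions vs 8099: %.1f%%%n",
                100 * ratio(accordDel, c8099Del));
        // prints "deletions vs 8099: 17.4%"
    }
}
```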
>>> 
>>> On Fri, Sep 20, 2024, at 12:17 PM, David Capwell wrote:
>>>> Recently, we rebased against the trunk branch, ensuring that the accord 
>>>> branch is now in sync with the latest trunk version. This presents an 
>>>> opportune moment for those interested to review the code.
>>>>
>>>> We have a pending pull request 
>>>> (https://github.com/apache/cassandra/pull/3552) that we do not intend to 
>>>> merge.
>>>>
>>>> Our current focus is on fixing several bugs and ensuring the 
>>>> safety of topology changes (as evidenced by the number of issues filed 
>>>> against trunk). Once we wrap up bug fixes and safety features, we will 
>>>> likely discuss the merge to trunk, so now is a great time to start 
>>>> engaging.
>>>>
>>>> Thank you, everyone, for your patience!
>>