Alex - thanks so much for putting this together and sharing. Here are three additional data loss / corruption bugs identified by Arjun Ashok using this set of skills last week:
– https://issues.apache.org/jira/browse/CASSANDRA-21356 : CursorBasedCompaction: ReusableLivenessInfo.isExpiring() incorrectly returns true for tombstone cells, corrupting cursor-compacted SSTable format and cell reconciliation
– https://issues.apache.org/jira/browse/CASSANDRA-21357 : CursorBasedCompaction: prevUnfilteredSize always written as 0 in SSTableCursorWriter
– https://issues.apache.org/jira/browse/CASSANDRA-21358 : CursorBasedCompaction: Final index block width off by one byte in SSTableCursorWriter#appendBIGIndex()
Stepping back a bit: this set of skills, combined with the Opus model, has
enabled folks to find 14 data loss, corruption, and correctness bugs in the project in the past ~two weeks. These are bugs that likely would have gone undetected - and if encountered in the wild, would have required extensive manual fuzz testing to reproduce and identify. In the case of the issue that I'd found and reported: https://issues.apache.org/jira/browse/CASSANDRA-21340 : GROUP BY
queries silently return incomplete results due to premature SRP abort. I found this by invoking the skill with the prompt: "Review Cassandra's implementation of GROUP BY for correctness. Identify edge cases that might result in incorrect responses. After identifying candidate bugs, fan out subagents to write unit tests and fuzz tests attempting to reproduce them. Assess their veracity, and
present them in order of concern." In less than 30 minutes while sitting on the sofa, the model and skill identified CASSANDRA-21340. In another hour, I was able to establish its veracity, then leave the model and prompt behind to work through the issue and write up the Jira ticket by hand.
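(For the mechanics, a rough sketch; this assumes the skills are installed as Claude Code slash commands, as in Alex's PR linked below, and is illustrative rather than a verbatim transcript:)

    $ cd cassandra
    $ claude                # start an interactive Claude Code session in the checkout
    > /deep-review Review Cassandra's implementation of GROUP BY for
      correctness. [... the rest of the prompt quoted above ...]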
I'm *really* impressed by what this set of skills enables, and I think it may be transformative for quality in Apache Cassandra – especially when combined with the ability to write in-JVM dtests and Harry tests, and to use the Simulator. These skills also make each of those tools a lot easier to use. Here's how I'm thinking about this work so far:
– The ensemble review skills are a great first-pass review that anyone preparing a patch can use to identify potential issues.
– They're incredible for pointing at existing and/or new + experimental components in Cassandra to find serious correctness issues.
– I'm sure we'd find latent issues if we directed the skills at interactions between multiple components, like "range tombstones x short read protection x reverse reads x compact storage" (etc.).
– I think these skills could be generalized to support bug-finding and validation in other Apache projects.
– I also think there is a generalization of these skills that could be applied to CPU + allocation profiling and optimization.
For those who have access to a suitable model, I'd love to hear your experience attempting to find a latent bug in the database. I was shocked how easy it was, and am hopeful for what this might do for quality and data integrity in the project. –
Scott On May 8, 2026, at 5:22 PM, Alex Petrov <[email protected]> wrote: I would recommend Opus 4.6+ for /deep-review, but /shallow-review is probably fine with Sonnet. Maybe, time permitting, I can do evals for different models at some point.
> Review process is always a bottleneck and introducing such skills should help to make it faster and more reliable.
There is hope here, but this is also
just a start: we need to reduce false positives, and do more with specifications (P, TLA+) for critical parts of the code. On Fri, May 8, 2026, at 5:56 PM, Dmitry Konstantinov wrote: Hi Alex, thank you a lot for sharing it. I have been using Claude Code for review of my changes, but in a very basic, ad-hoc way; it works for simple issues. The skills look much, much more powerful. I am going to read and try them in the upcoming weeks. Review process is always a bottleneck and introducing such skills should help to make it faster and more reliable. A question: what model(s) do you use to run them? Is Sonnet 4.6 enough? Thanks, Dmitry On Fri, 8 May 2026 at 14:03, Alex Petrov < [email protected] > wrote: Hello folks, We have been working on some tooling [1] around Apache Cassandra correctness,
and wanted to share it with the Cassandra community. We have approached this by "indexing" ~3k Cassandra issues and extracting common patterns from them, generalizing them, then running evals, tweaking, and extending them until we had a strong signal that it performs better than the run-of-the-mill code review skill. We have benchmarked it against some popular OSS skills (by presenting bugs we knew existed from "indexing" Apache Kafka, inferring the commit that introduced each bug from its fix, and making sure the benchmarked skills actually find it). In addition, I did my best to codify some things I knew about correctness, researching code, and writing repros, and what I could find in research papers and public blog posts. So far we were able to find (at the very least) the following issues (in
reality the number is higher, but I have a backlog of potential leads to investigate and reproduce longer than the time I have available for these pursuits).
Deep review + fuzzer:
CASSANDRA-21307 : Lower bound [SSTABLE_UPPER_BOUND(row000063)] is bigger than first returned value
CASSANDRA-21292 : Row re-inserted at the exact start of a range tombstone disappears after major compaction
CASSANDRA-21255 : Differentiate between legitimate cases where the first entry is the same as the last entry and empty bounds in SSTableCursorWriter#addIndexBlock()
Shallow + deep review:
(latent) issue of unused keepFrom in linearSubtract: https://github.com/apache/cassandra-accord/pull/272
CASSANDRA-21336 : CursorBasedCompaction: trailing present columns are silently dropped in encodeLargeColumnsSubset()
CASSANDRA-21340 : GROUP BY queries silently return incomplete results due to premature SRP abort
CASSANDRA-21352 : TCM: AtomicLongBackedProcessor sort inversion
CASSANDRA-21353 : putShortVolatile is not volatile in InMemoryTrie
Via specifications:
CASSANDRA-21337 : Difference in behavior between Cursor-Based compaction and "Regular" compaction
CASSANDRA-21336 : CursorBasedCompaction: trailing present columns are silently dropped in encodeLargeColumnsSubset()
CASSANDRA-21339 : CursorBasedCompaction: expiring cells, same timestamp, same ldt, different ttl
CASSANDRA-21338 : value comparison direction reversed in CursorCompactor
A few folks were using this skill to test some of the subsystems, and might report more issues that I am not directly attributing here.
I have also used these skills for self-review and have caught a couple of issues before they made it into the codebase. Despite some early success, I still consider this a very raw set of prompts. But I think it has utility and, based on the success we have seen so far, it is (according to my measurement methodology) faring better than one-shot code review prompts that an LLM would generate on user request. Since I was focusing on finding issues, running evals, and trying several other methodologies that did not make it into this version/cut, I have not had a chance to sit and re-read the entire final result just yet, which is why I am not suggesting merging this into the Cassandra codebase until we vet it better. But with your help and feedback, maybe we can do this quicker.
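(If you want to try these against your own checkout before anything is merged, a minimal sketch of the setup; it assumes the skills ship as slash commands in the PR below [1], and that you have the gh CLI installed:)

    git clone https://github.com/apache/cassandra.git && cd cassandra
    gh pr checkout 4794     # pulls in the review skills from [1]
    claude                  # then run /shallow-review or /deep-review in the session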
Hope you find this useful; please share your opinions, experiences, and criticism. Happy bug hunting! --Alex [1] https://github.com/apache/cassandra/pull/4794 On Mon, Apr 13, 2026, at 1:12 PM, Štefan Miklošovič wrote: I noticed this PR just landed. Volunteers for reviewing / improving it are greatly appreciated! (1) https://github.com/apache/cassandra/pull/4734 On Thu, Feb 26, 2026 at 5:43 PM Jon Haddad <
[email protected] > wrote: I wanted to share a couple of other things I thought of. I wrote this:
> C*'s technical debt will make using an agent in the codebase much harder than using one in my own
I want to clarify my intent with this statement. I was trying to convey that I've had the luxury of refactoring my code several times, because I don't have to worry about messing with other
people's branches. I usually write something, use it briefly, find its faults, redo it, and iterate several times. I never consider anything done and am always looking to improve. This is very difficult with a project involving many people who have in-flight branches spanning several months. Changes I consider no-brainers might be a headache for C*. For example, I can just add a code formatter and 
rewrite every file in the codebase. I make major changes regularly without any consequences. Here, it impacts dozens of people. I proactively improve my code's architecture because there are few, if any, reasons not to. It's enabled me to pay off a ton of technical debt that accumulated over the eight years I handwrote everything. Another example: I've been working on an orchestration
tool around easy-db-lab to automate running my tests across several clusters in parallel. I recently refactored it to split the REST server code from the execution into Gradle submodules. Now I can create different agents specializing in each module's content, which slims down the context for each agent. Since I have a very clear boundary on each agent's responsibility, I avoid the overhead of 
having one agent manage one huge codebase. I can specifically tell that one agent is responsible for this directory and that its expertise is in Ktor. Another agent is a Gradle expert. Another is Kubernetes. When I work on tasks, they can be decomposed into task lists for each specialized agent. I've always thought this would be a great architectural improvement for the C* codebase regardless of LLMs.
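(As a sketch of the shape this decomposition takes; the module names are invented for illustration and are not the actual easy-db-lab layout:)

    easy-db-lab/
        settings.gradle.kts   # includes ":rest-server" and ":executor"
        rest-server/          # Ktor REST API; the "Ktor expert" agent is scoped here
        executor/             # cluster/test execution logic; a separate agent owns this
        .claude/agents/       # one agent definition per module boundary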
For example, putting the CQL parser in a standalone module would allow us to publish it so people could consume it in their own ecosystem without pulling in C*-all. Isolating a few of these subsystems could reduce cognitive overhead and simplify test design. I'm sure making the commit log reader standalone would make it much easier to use in the sidecar. Easily using the SSTable readers and 
writers without all the other dependencies would reduce workarounds in bulk analytics and make these types of projects more feasible, benefiting the wider ecosystem. Regardless of this approach, creating a devcontainer environment for the project and pushing the image to GHCR would also be beneficial. I am now using one with each of my tools. I don't trust Claude not to wipe my system, so I 
sandbox it in a container. It only has access to the local project and cannot push code or reach GitHub. Devcontainers are supported directly in IDEA, Zed, and VSCode. You can also launch them directly from GitHub or use the Claude mobile app.
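(A minimal sketch of driving such a sandbox from the command line; it assumes the devcontainers CLI, and Jon's linked config [1][2] is the real reference:)

    # build and start the sandboxed container for the current checkout
    devcontainer up --workspace-folder .
    # run Claude Code inside it, with no credentials or GitHub access mounted
    devcontainer exec --workspace-folder . claude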
I haven't spent much time on this yet, though; I still prefer two big 5k screens and a deafening mechanical keyboard. Jon [1] https://github.com/rustyrazorblade/easy-db-lab/blob/main/.devcontainer/devcontainer.json [2] https://github.com/rustyrazorblade/easy-db-lab/blob/main/.devcontainer/Dockerfile On Thu, Feb 26, 2026 at 12:58 AM Štefan Miklošovič < [email protected] > wrote: Thank you Jon for sharing, that was very helpful. All these insights are invaluable. On Wed, Feb 25, 2026 at 11:50 PM Jon Haddad <
[email protected] > wrote: Regarding ant, we'd probably want a wrapper shell script that is more LLM-friendly, hiding the excessive text and providing more actionable output. You can also delegate any task to a subagent so you don't waste your context on the `ant` output, and use Claude's new Agent Teams [1] feature to have a "builder" agent run in its own process.
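(A minimal sketch of such a wrapper, assuming ant's noise is the problem and only errors matter to the agent; the name echoes the "ai-build" David mentions further down, but this is an illustration, not anyone's actual tooling:)

    #!/usr/bin/env bash
    # ai-build: LLM-friendly ant wrapper; run the build, surface only what matters.
    set -o pipefail
    log=$(mktemp)
    trap 'rm -f "$log"' EXIT
    if ant jar > "$log" 2>&1; then
        echo "BUILD SUCCESSFUL"          # a few bytes instead of 10-20k tokens
    else
        # on failure, show only error lines with a little context
        grep -E -B1 -A3 'error|BUILD FAILED' "$log" | tail -n 80
        exit 1
    fi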
Docs help Claude find code, big time. You can give it your organizational structure and that institutional knowledge so it doesn't have to pull in many tokens from dozens of files. It *definitely* works. I've pushed over a quarter million LOC this month alone [2], and many of you may already know I'm obsessed with efficiency. I constantly test new ideas and approaches to refine my process; I've found good
documentation is *critical*. I've recently started working with both Spec-Kit (Microsoft, but it looks abandoned) and OpenSpec, as both are designed to maintain long-term memory for a project's product requirements and technical decisions. OpenSpec is supposed to work better for brownfield and iterative projects. I haven't tried BMAD yet. It seemed a bit more heavyweight, but it may be better for 
this project than my personal ones, where I don't collaborate with anyone. I have found that the best results come from loosely coupled systems. C*'s technical debt will make using an agent in the codebase much harder than using one in my own. I haven't tried to work on a patch in C* yet with an agent, but when I do I'll be sure to share what I've learned. Today I introduced OpenSpec to 
easy-db-lab; you can see what it looks like [3] if you're curious. A number of markdown commands were added to the repo, and Spec-Kit was removed. I haven't reviewed it yet. By the time you read this I will have likely made some changes in a review. If you want to see the before and after, the pre-review commit is c6a94e1. Jon [1] https://code.claude.com/docs/en/agent-teams [2] my 2 main projects,
not including client work:
git log --since="$(date +%Y-%m-01)" --numstat --pretty=tformat: | awk 'NF==3 {added+=$1; removed+=$2} END {print "Added:", added, "Removed:", removed}'
Added: 90339 Removed: 45222
git log --since="$(date +%Y-%m-01)" --numstat --pretty=tformat: | awk 'NF==3 {added+=$1; removed+=$2} END {print "Added:", added, "Removed:", removed}'
Added: 124863 Removed: 52923
[3] https://github.com/rustyrazorblade/easy-db-lab/pull/530/changes
On Wed, Feb 25, 2026 at 6:18 AM David Capwell < [email protected] > wrote: I'm not against memory / skills being added, but do want to request we think / test to make sure we can quantify the gains. Two relevant papers:
Evaluating AGENTS.md: Are
Repository-Level Context Files Helpful for Coding Agents? (arxiv.org)
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks (arxiv.org)
These papers actually match my lived experience with this project and others.
1) Using /init to create CLAUDE.md / AGENTS.md yields negative results. This is how I started, and I have moved away from it. What is the context you need 100% of the time? It's the things Claude can't discover easily, such as tribal knowledge (e.g. a link to our style guide).
2) Ant is horrible for agents; not for figuring out what to do (Claude is good at that), but for context bloat… do "ant jar" and you add something like 10-20k tokens… you MUST have tooling to fix this. (I ban Claude from touching the ant command; it's only allowed to run "ai-build" and "ai-ci-test", as these fix the context problems. rtk "might" work here; not tested, as I'm on leave. See the sketch after this list.)
3) Claude doesn't need docs to find code; that actually confuses it more. When it needs to modify code it's going to have to explore, and it will most likely find what it needs. I agree docs for humans would help, but let's keep them out of AI memory files.
4) I only really use Sonnet/Opus 4.5+; these claims might not be true for older models or the open-weight models.
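(Sketch of how the "ban ant" rule from point 2 can be expressed; this assumes Claude Code's permission rules in .claude/settings.json, and the wrapper script names are placeholders:)

    # deny raw ant; allow only the LLM-friendly wrappers
    cat > .claude/settings.json <<'EOF'
    {
      "permissions": {
        "deny":  ["Bash(ant:*)"],
        "allow": ["Bash(./ai-build:*)", "Bash(./ai-ci-test:*)"]
      }
    }
    EOF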
As for skills, the following make sense to me, but I really hope a human writes them, as AI doesn't do well at understanding the WHY and makes bad assumptions: property testing, stateful property testing, Harry, the Simulator. I left out CQLTester because I found Claude doesn't suck at it, so I'm not sure what a skill would add. The others I found it struggles with, producing bad-quality tests. Last comment: Stefan, your link about AI code in the project didn't take into account what happened in the PR. Our global static state world caused a single test to fail, which required a complete rewrite of the patch that I ended up doing by hand. So that patch ended up being 100% human. Sent from my iPhone On Feb 18, 2026, at 6:29
PM, Štefan Miklošovič < [email protected] > wrote: These are great points. I like how granular the approach of having multiple files is. That means we do not need to craft one "uber-claude.md"; we can do this iteratively and per specific domain, which is easier to handle. One consequence of having these "context files" is that a contributor does not even need to use any AI whatsoever in order to be more productive and organized. There is a lot of time lost when a new contributor wants to understand how the project "thinks", what the do's and don'ts are, etc. All stuff which only surfaces once a patch is submitted. If we explained to everybody, in plain English, how this all works on a detailed level, per domain, that would be tremendously helpful even
without AI. It will be interesting to watch how these files are written. To formalize and write it all down is quite a task on its own. On Wed, Feb 18, 2026 at 6:47 PM Patrick McFadin < [email protected] > wrote: Context size is the hardest thing to manage right now in agentic coding. I've stopped using MCP and switched to skills as a result. A couple of things worth noting. You can use multiple CLAUDE.md/AGENTS.md files in a large code base. I've started doing this and it is remarkable. For example, in the pylib directory a CLAUDE.md file would provide the Python-specific info if making changes. The standard layout for each should be:
- What is this
- Where do I get more information
- How do I run or test
- What are the non-negotiable rules
- What does done look like
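(A hedged sketch of what the pylib one might look like; the specifics below are invented for illustration, not proposed content:)

    cat > pylib/CLAUDE.md <<'EOF'
    # pylib (cqlsh and supporting Python libraries)
    ## What is this
    Python code for cqlsh and its supporting libraries.
    ## Where do I get more information
    See the project README and the cqlsh documentation.
    ## How do I run or test
    Run the tests under pylib/cqlshlib/test/.
    ## What are the non-negotiable rules
    Keep compatibility with the Python versions cqlsh supports.
    ## What does done look like
    Tests pass and new behavior is documented.
    EOF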
Imagine one in all sorts of places: fqltool, sstableloader, o.a.c.io.*, o.a.c.repair.*, etc. And they can evolve over time as people use them. The other thing to bring up is Brokk, built by Jonathan Ellis. He built it specifically for large code bases and tests it specifically on the Cassandra code base. (I'll let him jump in here.) Patrick On Feb 18, 2026, at 8:51 AM, Josh McKenzie < [email protected]
> wrote:
> I've had trouble using Claude effectively on C*'s large codebase without a lot of repeated "repo discovery" prompting.
Just to keep beating the drum: I've had trouble working in our codebase effectively without a lot of repeated "repo discovery" time. In fact, a huge portion of the time I spend working on the codebase consists of reading into adjacent coupled classes and modules, since things are a) not consistently or thoroughly documented, and b) generally not that decoupled. This is also / primarily a "human <-> information interfacing efficiency problem", and it just so happens that LLMs and agents being blocked from working on our codebase gives us an immediate short-term pain-proxy for something I strongly believe has been a long-term tax on
us. On Wed, Feb 18, 2026, at 10:04 AM, Isaac Reath wrote: I'm a +1 for the same reason that Josh lays out. Markdown files that detail the structure of the repo, how to build & run tests, how to get checkstyle to pass, etc. are all very valuable to new contributors even if LLMs went away today. On Tue, Feb 17, 2026 at 7:33 PM Jon Haddad < [email protected] > wrote: It's all part of 
the same topic, Yifan. You're making a distinction without a difference. We could just as easily be discussing supporting certain MCP servers like serena, or baking Claude into a devcontainer. It's all relevant. There's no need to police the discussion. On Tue, Feb 17, 2026 at 4:25 PM Yifan Cai < [email protected] > wrote: The original post was about adding AI tooling, prompt, command, or
skill. The thread has shifted to AI memory files. I do not have an objection to any of these, but I want to make sure that we are still on the original topic. IMO, AI tooling has a clear scope / definition and is easier to reach consensus on. Meanwhile, AI memory files are hard to define clearly. Different developers in different domains could have quite different preferences. - Yifan On Tue, Feb
17, 2026 at 3:37 PM Dmitry Konstantinov < [email protected] > wrote: I do not have my own, but here are a few examples from other Apache projects: https://github.com/apache/camel/blob/main/AGENTS.md https://github.com/apache/ignite-3/blob/main/CLAUDE.md https://github.com/apache/superset/blob/master/superset/mcp_service/CLAUDE.md On Tue, 17 Feb 2026 at 23:22, Jon Haddad <
[email protected] > wrote: I think a few folks are already using CLAUDE.md files in their repo and they're just not committing them. Anyone want to share what's already done? I'm happy to help share what I know about the agentic side of things, but since I don't do much in the way of patching C* it would be a lot of guessing. If I'm wrong and nobody shares one, I'll take a stab at it. On 
Tue, Feb 17, 2026 at 3:08 PM Štefan Miklošovič < [email protected] > wrote: Great feedback everybody! Really appreciate it! Reading what Jon posted ... Jon, I think you are the most experienced in this based on what you wrote. Would you mind doing a POC here for the Cassandra repo? The trunk is enough ... Something we might build further on. I think we need to build the foundations of that and put some structure into it, and all things considered I think you are the best fit for the job here. If the basics are there, we can play with it more before merging; this is not something which needs to be done "tomorrow", and we can collaborate on it together for some time, adding things as patches come. I think it takes some time to "tune" it. Everybody else, feel free to help! My experience in this space is limited; I think there are people who are using it more often than me, for sure. Regards On Wed, Feb 18, 2026 at 12:59 AM Joel Shepherd < [email protected] > wrote: There's been some momentum building for AGENTS.md files, both on the project and on the agent side: https://agents.md/ Same idea and benefits, but it might help to
align folks on a "standard" that will work well across agents. I also think that more and better code documentation can be very beneficial when using agents to help with working out implementation details. I spent a bunch of time in January writing an introduction to Apache Ratis (Raft as a library: https://github.com/apache/ratis/blob/master/ratis-docs/src/site/markdown/index.md ). The 
code itself is pretty well-documented, but it was hard for me to build a mental model of how to integrate with it. AI was very effective in taking the granular in-code documentation and synthesizing an overview from it. Going the other way, the in-code documentation has made it possible for me to deep-dive the Ratis code to root-cause bugs, etc. Agents can get a lot out of good class- and method-level
documentation. -- Joel. On 2/16/2026 8:03 PM, Bernardo Botella wrote: Thanks for bringing this up Stefan!! A really interesting topic indeed. I've also heard ideas around even having Claude.md type of files that help LLMs understand
the code base without having to do a full scan every time. So, all in all, putting together something that we as a community think describes good practices + repository information, not only for the main Cassandra repository but also for its subprojects, will definitely help contributors adhere to standards, and help us reviewers ensure that at least some steps will have been considered. Things like:
- Repository structure: what every folder is
- Test suites, and how they work and run
- Git commit standards
- Specific project lint rules (like braces on new lines!)
- Preferred wording style for patches/documentation
Committed to the projects, and accessible to LLMs, these sound like really useful context for those types of contributions (which are going to keep happening regardless). So curious to read what others think. Bernardo PS. Totally agree that this should change nothing of the quality bar for code reviews and merged code. On Feb 16, 2026, at 6:27 PM, Štefan Miklošovič < [email protected] > wrote: Hey, this happened recently in kernel space (1), (2). What that is doing, as I understand it, is that you can point an LLM at these resources and then it would be more capable when
reviewing patches or even writing them. It is kind of a guide / context provided to the AI prompt. I can imagine we would just compile something similar and merge it to the repo; then, if somebody is prompting against it, they would have an easier job, be less error prone, adhere to the code style, etc. This might look like a controversial topic, but I think we need to discuss this. The usage of AI is just getting more and more frequent. From Cassandra's perspective there is just this (3), but I do not think we reached any conclusions there (please correct me if I am wrong about where we are at with AI-generated patches). This is becoming an elephant in the room; I am noticing that some patches for Cassandra were prompted by AI completely. I think it would be way better if we made it easy for everybody contributing like that. This does not mean that we, as committers, would believe what AI generated blindly. Not at all. It would still need to go through the formal review like anything else. But acting like this is not happening, and that people are just not going to use AI when trying to contribute, is not right. We should embrace it in some form ... 1) https://github.com/masoncl/review-prompts 2)
https://lore.kernel.org/lkml/[email protected]/ 3) https://lists.apache.org/thread/j90jn83oz9gy88g08yzv3rgyy0vdqrv7 -- Dmitry Konstantinov
