+1. Making it available as a marketplace could be very helpful. That's what I did with my C* skill: https://github.com/rustyrazorblade/skills/
On Mon, May 11, 2026 at 5:16 AM Ekaterina Dimitrova <[email protected]> wrote: > I second others, this looks awesome. I can’t wait to try it myself. > > “ > – I think these skills could be generalized to support bug-finding and > validation in other Apache projects. > ” > > This fully resonates with what I was thinking about, can’t wait to see it > in action in the other repos in the future too. > > > Best regards, > Ekaterina > > On Mon, 11 May 2026 at 5:16, Štefan Miklošovič <[email protected]> > wrote: > >> We should definitely merge this, I am already enjoying in-jvm-dtest >> creation from his patch. I have not used other skills yet but I bet they >> are equally powerful (if not more). >> >> There is also (1) which integrates commands around Ant etc. Would be cool >> to merge it into what Alex has in some fashion, so we are covered on both >> build / test execution and bug analysis, and have one robust >> solution. >> >> (1) https://github.com/apache/cassandra/pull/4734 >> >> On Mon, May 11, 2026 at 11:31 AM C. Scott Andreas <[email protected]> >> wrote: >> >>> Alex - thanks so much for putting this together and sharing. 
>>> >>> Here are three additional data loss / corruption bugs identified by >>> Arjun Ashok using this set of skills last week: >>> >>> – https://issues.apache.org/jira/browse/CASSANDRA-21356: >>> CursorBasedCompaction: ReusableLivenessInfo.isExpiring() incorrectly >>> returns true for tombstone cells, corrupting cursor-compacted SSTable >>> format and cell reconciliation >>> – https://issues.apache.org/jira/browse/CASSANDRA-21357: >>> CursorBasedCompaction: prevUnfilteredSize always written as 0 in >>> SSTableCursorWriter >>> – https://issues.apache.org/jira/browse/CASSANDRA-21358: >>> CursorBasedCompaction: Final index block width off by one byte in >>> SSTableCursorWriter#appendBIGIndex() >>> >>> Stepping back a bit -- >>> >>> This set of skills combined with the Opus model has enabled folks to >>> find 14 data loss, corruption, and correctness bugs in the project in the >>> past ~two weeks. These are bugs that likely would have gone undetected - >>> and if encountered in the wild, would have required extensive manual fuzz >>> testing to reproduce and identify. >>> >>> In the case of the issue that I'd found and reported: >>> https://issues.apache.org/jira/browse/CASSANDRA-21340: GROUP BY queries >>> silently return incomplete results due to premature SRP abort >>> >>> I found this by invoking the skill with the prompt "Review Cassandra's >>> implementation of GROUP BY for correctness. Identify edge cases that might >>> result in incorrect responses. After identifying candidate bugs, fan out >>> subagents to write unit tests and fuzz tests attempting to reproduce them. >>> Assess their veracity, and present them in order of concern." >>> >>> In less than 30 minutes while sitting on the sofa, the model and skill >>> identified CASSANDRA-21340. In another hour, I was able to establish its >>> veracity, then leave the model and prompt behind to work through the issue >>> and write up the Jira ticket by hand. 
>>> >>> I'm *really* impressed by what this set of skills enables, and I think >>> they may be transformative for quality in Apache Cassandra – especially >>> when combined with the ability to write in-JVM dtests; Harry tests; and to >>> use the Simulator. These also make it a lot easier to use each of these >>> tools. >>> >>> Here's how I'm thinking about this work so far: >>> >>> – The ensemble review skills are a great first-pass review that can be >>> used by anyone preparing a patch to identify potential issues. >>> – They're incredible for pointing at existing and/or new + experimental >>> components in Cassandra to find serious correctness issues. >>> – I'm sure we'd find latent issues if we directed the skills at >>> interaction between multiple components, like "range tombstones x short >>> read protection x reverse reads x compact storage" (etc). >>> – I think these skills could be generalized to support bug-finding and >>> validation in other Apache projects. >>> – I also think there is a generalization of these skills that could be >>> applied to CPU + allocation profiling and optimization. >>> >>> For those who have access to a suitable model, I'd love to hear your >>> experience attempting to find a latent bug in the database. >>> >>> I was shocked how easy it was, and am hopeful for what this might do for >>> quality and data integrity in the project. >>> >>> – Scott >>> >>> On May 8, 2026, at 5:22 PM, Alex Petrov <[email protected]> wrote: >>> >>> >>> I would recommend Opus 4.6+ for /deep-review, but /shallow-review is >>> probably fine with sonnet. >>> >>> Maybe time permitting, I can do evals for different models at some point. >>> >>> There is hope here, but this is also just a start: we need to reduce >>> false-positives, and do more with specifications (P, TLA+) for critical >>> parts of code. 
>>> >>> On Fri, May 8, 2026, at 5:56 PM, Dmitry Konstantinov wrote: >>> >>> Hi, Alex, thank you so much for sharing it. I have been using Claude Code >>> for review of my changes but in a very basic ad-hoc way; it works for >>> simple issues. The skills look much, much more powerful. I am going to read >>> and try them in the upcoming weeks. >>> Review process is always a bottleneck and introducing such skills should >>> help to make it faster and more reliable. >>> >>> A question: what model(s) do you use to run them? Is Sonnet 4.6 enough? >>> >>> Thanks, >>> Dmitry >>> >>> On Fri, 8 May 2026 at 14:03, Alex Petrov <[email protected]> wrote: >>> >>> >>> Hello folks, >>> >>> We have been working on some tooling [1] around Apache Cassandra >>> correctness, and wanted to share it with the Cassandra community. >>> >>> We have approached this by "indexing" ~3k Cassandra issues and >>> extracting common patterns from them, generalizing them, then running >>> evals, tweaking, and extending them until we had a strong signal that >>> it performs better than the run-of-the-mill code review skill. We have >>> benchmarked it against some popular OSS skills (by presenting bugs we knew >>> existed from "indexing" Apache Kafka, inferring the bug's source commit from the >>> fix, and making sure the benchmarked skills actually find them). >>> >>> In addition, I did my best to codify some things I knew about >>> correctness, researching code, and writing repros, and what I could find in >>> research papers and public blog posts. >>> >>> So far we were able to find (at the very least) the following issues (in reality >>> the number is higher, but I have a backlog of potential leads to investigate >>> and reproduce longer than the time I have available for these pursuits). 
>>> >>> - deep review + fuzzer:
>>>   - CASSANDRA-21307 <https://issues.apache.org/jira/browse/CASSANDRA-21307>: Lower bound [SSTABLE_UPPER_BOUND(row000063)] is bigger than first returned value
>>>   - CASSANDRA-21292 <https://issues.apache.org/jira/browse/CASSANDRA-21292>: Row re-inserted at the exact start of a range tombstone disappears after major compaction
>>>   - CASSANDRA-21255 <https://issues.apache.org/jira/browse/CASSANDRA-21255>: Differentiate between legitimate cases where the first entry is the same as the last entry and empty bounds in SSTableCursorWriter#addIndexBlock()
>>> - shallow + deep review:
>>>   - (latent) issue of unused keepFrom in linearSubtract: https://github.com/apache/cassandra-accord/pull/272
>>>   - CASSANDRA-21336 <https://issues.apache.org/jira/browse/CASSANDRA-21336>: CursorBasedCompaction: trailing present columns are silently dropped in encodeLargeColumnsSubset()
>>>   - CASSANDRA-21340 <https://issues.apache.org/jira/browse/CASSANDRA-21340>: GROUP BY queries silently return incomplete results due to premature SRP abort
>>>   - CASSANDRA-21352 <https://issues.apache.org/jira/browse/CASSANDRA-21352>: TCM: AtomicLongBackedProcessor sort inversion
>>>   - CASSANDRA-21353 <https://issues.apache.org/jira/browse/CASSANDRA-21353>: putShortVolatile is not volatile in InMemoryTrie
>>> - Via specifications:
>>>   - CASSANDRA-21337 <https://issues.apache.org/jira/browse/CASSANDRA-21337>: Difference in behavior between Cursor-Based compaction and "Regular" compaction
>>>   - CASSANDRA-21336 <https://issues.apache.org/jira/browse/CASSANDRA-21336>: CursorBasedCompaction: trailing present columns are silently dropped in encodeLargeColumnsSubset()
>>>   - CASSANDRA-21339 <https://issues.apache.org/jira/browse/CASSANDRA-21339>: CursorBasedCompaction: expiring cells, same timestamp, same ldt, different ttl
>>>   - CASSANDRA-21338 <https://issues.apache.org/jira/browse/CASSANDRA-21338>: value comparison direction reversed in CursorCompactor
>>> >>> A few folks were using this skill to test some of the subsystems, and might >>> report more issues that I am not directly attributing here. I have also >>> used these skills for self-review and have caught a couple of issues before >>> they made it into the codebase. >>> >>> Despite some early success, I still consider this a very raw set of >>> prompts, but I think it has utility and, based on the success we have >>> seen so far, is (according to my measurement >>> methodology) faring better than one-shot code review prompts that an LLM >>> would generate by user request. >>> >>> Since I was focusing on finding issues, running evals, and trying >>> several other methodologies that did not make it into this version/cut, I did >>> not have a chance to sit and re-read the entire final result just yet, >>> which is why I am not suggesting merging this into the Cassandra codebase until >>> we better vet it, but with your help and feedback maybe we can do this >>> quicker. >>> >>> Hope you find this useful, please share your opinion, experience, and >>> criticism. >>> >>> Happy bug hunting! >>> --Alex >>> >>> [1] https://github.com/apache/cassandra/pull/4794 >>> >>> >>> On Mon, Apr 13, 2026, at 1:12 PM, Štefan Miklošovič wrote: >>> >>> I noticed this PR just landed. >>> >>> Volunteers reviewing / improving greatly appreciated! >>> >>> (1) https://github.com/apache/cassandra/pull/4734 >>> >>> On Thu, Feb 26, 2026 at 5:43 PM Jon Haddad <[email protected]> >>> wrote: >>> >>> I wanted to share a couple of other things I thought of. I wrote this: >>> >>> > C*'s technical debt will make using an agent in the codebase much >>> harder than using one in my own >>> >>> I want to clarify my intent with this statement. 
I was trying to convey >>> that I've had the luxury of refactoring my code several times, because I >>> don't have to worry about messing with other people's branches. I usually >>> write something, use it briefly, find its faults, redo it, and iterate >>> several times. I never consider anything done and am always looking to >>> improve. This is very difficult with a project involving many people who >>> have in-flight branches spanning several months. Changes I consider >>> no-brainers might be a headache for C*. For example, I can just add a code >>> formatter and rewrite every file in the codebase. I make major changes >>> regularly without any consequences. Here, it impacts dozens of people. I >>> proactively improve my code's architecture because there are few, if any, >>> reasons not to. It's enabled me to pay off a ton of technical >>> debt that accumulated over the eight years I handwrote everything. >>> >>> Another example: I've been working on an orchestration tool around >>> easy-db-lab to automate running my tests across several clusters in >>> parallel. I recently refactored it to split the REST server code from the >>> execution into Gradle submodules. Now I can create different agents >>> specializing in each module's content, which slims down the context for >>> each agent. Since I have a very clear boundary on each agent's >>> responsibility, I avoid the overhead of having one agent manage one huge >>> codebase. I can specifically tell that one agent is responsible for this >>> directory, and its expertise is in Ktor. Another agent is a Gradle >>> expert. Another is a Kubernetes expert. When I work on tasks, they can be >>> decomposed into task lists for each specialized agent. >>> >>> I've always thought this would be a great architectural improvement for >>> the C* codebase regardless of LLMs. 
For example, putting the CQL parser in >>> a standalone module would allow us to publish it so people could consume it >>> in their own ecosystem without pulling in C*-all. Isolating a few of these >>> subsystems could reduce cognitive overhead and simplify test design. I'm >>> sure making the commit log reader standalone would make it much easier to >>> use in the sidecar. Easily using the SSTable readers and writers without >>> all the other dependencies would reduce workarounds in bulk analytics and >>> make these types of projects more feasible, benefiting the wider ecosystem. >>> >>> Regardless of this approach, creating a devcontainer environment for the >>> project and pushing the image to GHCR would also be beneficial. I am now >>> using one with each of my tools. I don't trust Claude not to wipe my >>> system, so I sandbox it in a container. It only has access to the local >>> project and cannot push code or reach GitHub. Devcontainers are supported >>> directly in IDEA, Zed, and VSCode. You can also launch them directly from >>> GitHub or use the Claude mobile app. I haven't spent much time on this yet, >>> though; I still prefer two big 5k screens and a deafening mechanical >>> keyboard. >>> >>> Jon >>> >>> [1] >>> https://github.com/rustyrazorblade/easy-db-lab/blob/main/.devcontainer/devcontainer.json >>> [2] >>> https://github.com/rustyrazorblade/easy-db-lab/blob/main/.devcontainer/Dockerfile >>> >>> >>> >>> On Thu, Feb 26, 2026 at 12:58 AM Štefan Miklošovič < >>> [email protected]> wrote: >>> >>> Thank you Jon for sharing, that was very helpful. All these insights are >>> invaluable. >>> >>> On Wed, Feb 25, 2026 at 11:50 PM Jon Haddad <[email protected]> >>> wrote: >>> >>> Regarding ant, we'd probably want a wrapper shell script that is more >>> LLM-friendly, hiding the excessive text and providing more actionable >>> output. 
You can also delegate any task to a subagent so you don't waste >>> your context on the `ant` output, and use Claude's new Agent Teams [1] >>> feature to have a "builder" agent run in its own process. >>> Docs help Claude find code, big time. You can give it your >>> organizational structure and that institutional knowledge so it doesn't >>> have to pull in many tokens from dozens of files. It *definitely* works. >>> I've pushed over a quarter million LOC this month alone [2], and many of >>> you may already know I'm obsessed with efficiency. I constantly test new >>> ideas and approaches to refine my process; I've found good documentation is >>> *critical*. >>> >>> I've recently started working with both Spec-Kit (Microsoft, but it >>> looks abandoned) and OpenSpec, as both are designed to maintain long-term >>> memory for a project's product requirements and technical decisions. >>> OpenSpec is supposed to work better for brownfield and iterative projects. >>> I haven't tried BMAD yet. It seemed a bit more heavyweight, but it may be >>> better for this project than my personal ones, where I don't collaborate >>> with anyone. >>> >>> I have found that the best results come from loosely coupled systems. >>> C*'s technical debt will make using an agent in the codebase much harder >>> than using one in my own. I haven't tried to work on a patch in C* yet >>> with an agent, but when I do I'll be sure to share what I've learned. >>> >>> Today I introduced OpenSpec to easy-db-lab; you can see what it looks >>> like [3] if you're curious. A number of markdown commands were added to >>> the repo, and Spec-Kit was removed. I haven't reviewed it yet. By the >>> time you read this I will have likely made some changes in a review. If you >>> want to see the before and after, the pre-review commit is c6a94e1. 
>>> >>> Jon >>> >>> [1] https://code.claude.com/docs/en/agent-teams >>> [2] my 2 main projects, not including client work:
>>> git log --since="$(date +%Y-%m-01)" --numstat --pretty=tformat: | awk 'NF==3 {added+=$1; removed+=$2} END {print "Added:", added, "Removed:", removed}'
>>> Added: 90339 Removed: 45222
>>>
>>> git log --since="$(date +%Y-%m-01)" --numstat --pretty=tformat: | awk 'NF==3 {added+=$1; removed+=$2} END {print "Added:", added, "Removed:", removed}'
>>> Added: 124863 Removed: 52923
>>> >>> >>> [3] https://github.com/rustyrazorblade/easy-db-lab/pull/530/changes >>> >>> On Wed, Feb 25, 2026 at 6:18 AM David Capwell <[email protected]> >>> wrote: >>> >>> I’m not against memory / skills being added, but do want to request we >>> think / test to make sure we can quantify the gains. >>> >>> Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for >>> Coding Agents? <https://arxiv.org/abs/2602.11988> >>> >>> SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks >>> <https://arxiv.org/abs/2602.12670> >>> >>> These papers actually match my lived experience with this project and >>> others. >>> >>> 1) using /init to create CLAUDE.md / AGENTS.md yields negative results. >>> This is how I started and have moved away. What is the context you need >>> 100% of the time? It’s things that Claude can’t discover easily, such as >>> tribal knowledge (such as a link to our style guide). 
>>> 2) Ant is horrible for agents, not to figure out what to do (Claude is >>> good at that) but at context bloat… do “ant jar” and you add like 10-20k >>> tokens… you MUST have tooling to fix this (I ban Claude from touching the ant >>> command; it’s only allowed to run “ai-build” and “ai-ci-test”, as these fix >>> the context problems; rtk “might” work here, not tested as I’m on leave) >>> 3) Claude doesn’t need docs to find code; that actually confuses it >>> more. When it needs to modify code it’s going to have to explore and will >>> most likely find what it needs. I agree docs for humans would help, but >>> let’s keep it out of AI memory files. >>> 4) I only really use sonnet/opus 4.5+; these claims might not be true >>> for older models or the open weight models. >>> >>> As for skills, the following makes sense to me, but I really hope a human >>> writes them, as AI doesn’t do well at understanding the WHY and makes bad >>> assumptions: property testing, stateful property testing, harry, The >>> Simulator. I left out cqltester because I found Claude doesn’t suck at it, >>> so not sure what a skill would add. The others I found it struggles with >>> and produces bad quality tests. >>> >>> Last comment: Stefan, your link about AI code in the project didn’t take >>> into account what happened in the PR. Our global static state world caused >>> a single test to fail, which required a complete rewrite of the patch that I >>> ended up doing by hand. So that patch ended up being 100% human. >>> >>> Sent from my iPhone >>> >>> On Feb 18, 2026, at 6:29 PM, Štefan Miklošovič <[email protected]> >>> wrote: >>> >>> These are great points. I like how granular the approach of having >>> multiple files is. That means we do not need to craft one >>> "uber-claude.md" but we can do this iteratively and per specific >>> domain, which is easier to handle. 
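Jon's LLM-friendly ant wrapper and David's "ai-build"/"ai-ci-test" commands describe the same pattern: run the real build, keep the full ant output in a log file, and hand the agent only a short, actionable summary instead of 10-20k tokens of raw output. A minimal sketch of what such a wrapper might look like; the function name, log handling, and grep patterns here are illustrative assumptions, not the actual tooling referenced in this thread:

```shell
# Hypothetical "ai-build"-style wrapper around ant. Captures everything to a
# log, then prints either a one-line success message or only the error lines.
# Names and filter patterns are assumptions for illustration.
ai_build() {
    local log status
    log="$(mktemp)"

    # Run the requested ant target (default: jar), capturing all output.
    ant "${1:-jar}" >"$log" 2>&1
    status=$?

    if [ "$status" -eq 0 ]; then
        # Success: the agent only needs one line, not the full build log.
        echo "BUILD SUCCESSFUL (full output: $log)"
    else
        # Failure: surface only error lines and the final summary,
        # keeping the full log on disk for follow-up if needed.
        echo "BUILD FAILED (exit $status, full output: $log). Errors:"
        grep -E 'error:|BUILD FAILED' "$log" | head -n 40
    fi
    return "$status"
}
```

A real version would likely also cap output for test runs and normalize failure reports, which is presumably what the "ai-ci-test" counterpart does.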
>>> >>> One consequence of having these "context files" is that a contributor >>> does not even need to use any AI whatsoever in order to be more >>> productive and organized. There is a lot of time lost when a new >>> contributor wants to understand how the project "thinks", what the >>> do's and don'ts are, etc. All stuff which appears once a patch is >>> submitted. If we explained to everybody in plain English how this all >>> works on a detailed level, per domain, that would be tremendously >>> helpful even without AI. >>> >>> It will be interesting to watch how these files are written. To >>> formalize and write it down is quite a task on its own. >>> >>> >>> On Wed, Feb 18, 2026 at 6:47 PM Patrick McFadin <[email protected]> >>> wrote: >>> >>> >>> Context size is the hardest thing to manage right now in agentic coding. >>> I’ve stopped using MCP and switched to skills as a result. >>> >>> >>> A couple of things worth noting. You can use multiple >>> CLAUDE.md/AGENTS.md files in a large code base. I’ve started doing this and >>> it is remarkable. For example, in the pylib directory a CLAUDE.md file >>> would provide the Python-specific info if making changes. The standard >>> layout for each should be >>> >>> - What is this >>> >>> - Where do I get more information >>> >>> - How do I run or test >>> >>> - What are the non-negotiable rules >>> >>> - What does done look like >>> >>> >>> Imagine one in all sorts of places. fqltool, sstableloader, o.a.c.io.*, >>> o.a.c.repair.* etc etc. And they can evolve over time as people use them. >>> >>> >>> The other thing to bring up is Brokk, built by Jonathan Ellis. He >>> specifically built it for large code bases and specifically tests on the >>> Cassandra code base. 
(I’ll let him jump in here) >>> >>> >>> Patrick >>> >>> >>> On Feb 18, 2026, at 8:51 AM, Josh McKenzie <[email protected]> wrote: >>> >>> >>> I’ve had trouble using Claude effectively on C*’s large codebase without >>> a lot of repeated “repo discovery” prompting. >>> >>> >>> Just to keep beating the drum: I've had trouble working in our codebase >>> effectively without a lot of repeated "repo discovery" time. In fact, a >>> huge portion of the time I spend working on the codebase consists of >>> reading into adjacent coupled classes and modules since things are a) not >>> consistently or thoroughly documented, and b) generally not that decoupled. >>> >>> >>> This is also / primarily a "human <-> information interfacing efficiency >>> problem" and it just so happens LLMs and agents being blocked from working >>> on our codebase is giving us an immediate short-term pain-proxy for >>> something I strongly believe has been a long-term tax on us. >>> >>> >>> On Wed, Feb 18, 2026, at 10:04 AM, Isaac Reath wrote: >>> >>> >>> I'm a +1 for the same reason that Josh lays out. Markdown files that >>> detail the structure of the repo, how to build & run tests, how to get >>> checkstyle to pass, etc. are all very valuable to new contributors even if >>> LLMs went away today. >>> >>> >>> On Tue, Feb 17, 2026 at 7:33 PM Jon Haddad <[email protected]> >>> wrote: >>> >>> >>> It's all part of the same topic, Yifan. You're making a distinction >>> without a difference. We could just as easily be discussing supporting >>> certain MCP servers like serena, or baking Claude into a devcontainer. >>> It's all relevant. There's no need to police the discussion. >>> >>> >>> On Tue, Feb 17, 2026 at 4:25 PM Yifan Cai <[email protected]> wrote: >>> >>> >>> The original post was about adding AI tooling, prompt, command, or >>> skill. The thread has shifted to AI memory files. 
>>> >>> >>> I do not have an objection to any of these, but want to make sure that >>> we are still on the original topic. >>> >>> >>> IMO, AI tooling has a clear scope / definition and is easier to reach >>> consensus on. Meanwhile, AI memory files are hard to define clearly. >>> Different developers on different domains could have quite different >>> preferences. >>> >>> >>> - Yifan >>> >>> >>> On Tue, Feb 17, 2026 at 3:37 PM Dmitry Konstantinov <[email protected]> >>> wrote: >>> >>> >>> I do not have my own, but here are a few examples from other Apache >>> projects: >>> >>> https://github.com/apache/camel/blob/main/AGENTS.md >>> >>> https://github.com/apache/ignite-3/blob/main/CLAUDE.md >>> >>> >>> https://github.com/apache/superset/blob/master/superset/mcp_service/CLAUDE.md >>> >>> >>> On Tue, 17 Feb 2026 at 23:22, Jon Haddad <[email protected]> >>> wrote: >>> >>> >>> I think a few folks are already using CLAUDE.md files in their repo and >>> they're just not committing them. >>> >>> Anyone want to share what's already done? I'm happy to help share what >>> I know about the agentic side of things, but since I don't do much in the >>> way of patching C* it would be a lot of guessing. >>> >>> >>> If I'm wrong and nobody shares one, I'll take a stab at it. >>> >>> >>> >>> >>> On Tue, Feb 17, 2026 at 3:08 PM Štefan Miklošovič < >>> [email protected]> wrote: >>> >>> >>> Great feedback everybody! Really appreciate it! >>> >>> >>> Reading what Jon posted ... Jon, I think you are the most experienced >>> >>> in this based on what you wrote. Would you mind doing some POC here >>> >>> for the Cassandra repo? For the trunk it is enough ... Something we might >>> >>> build further on. I think we need to build the foundations of that and >>> >>> put some structure into it, and all things considered I think you are >>> >>> best for the job here. 
>>> >>> >>> If the basics are there we can play with it more before merging; this >>> >>> is not something which needs to be done "tomorrow". We can collaborate >>> >>> on something together for some time and add things into it as patches >>> >>> come. I think it takes some time to "tune" it. >>> >>> >>> Everybody else feel free to help! My experience in this space is >>> >>> limited; I think there are people who are using it more often than me >>> >>> for sure. >>> >>> >>> Regards >>> >>> >>> On Wed, Feb 18, 2026 at 12:59 AM Joel Shepherd <[email protected]> >>> wrote: >>> >>> >>> There's been some momentum building for AGENTS.md files, both on the >>> >>> project and on the agent side: >>> >>> >>> https://agents.md/ >>> >>> >>> Same idea and benefits, but it might help to align folks on a "standard" >>> >>> that will work well across agents. >>> >>> >>> I also think that more and better code documentation can be very >>> >>> beneficial when using agents to help with working out implementation >>> >>> details. I spent a bunch of time in January writing an introduction to >>> >>> Apache Ratis (Raft as a library: >>> >>> >>> https://github.com/apache/ratis/blob/master/ratis-docs/src/site/markdown/index.md >>> ). >>> >>> The code itself is pretty well-documented but it was hard for me to >>> >>> build a mental model of how to integrate with it. AI was very effective in >>> >>> taking the granular in-code documentation and synthesizing an overview >>> >>> from it. Going the other way, the in-code documentation has made it >>> >>> possible for me to deep dive the Ratis code to root cause bugs, etc. >>> >>> Agents can get a lot out of good class- and method-level documentation. >>> >>> >>> -- Joel. >>> >>> >>> On 2/16/2026 8:03 PM, Bernardo Botella wrote: 
>>> >>> >>> >>> >>> Thanks for bringing this up, Stefan! >>> >>> >>> A really interesting topic indeed. >>> >>> >>> >>> I’ve also heard ideas around even having Claude.md type of files that >>> help LLMs understand the code base without having to do a full scan every >>> time. >>> >>> >>> So, all in all, putting together something that we as a community think >>> describes good practices + repository information, not only for the main >>> Cassandra repository but also for its subprojects, will definitely help >>> contributors adhere to standards and us reviewers to ensure that some steps >>> at least will have been considered. >>> >>> >>> Things like: >>> >>> - Repository structure. What every folder is >>> >>> - Test suites and how they work and run >>> >>> - Git commit standards >>> >>> - Specific project lint rules (like braces in new lines!) >>> >>> - Preferred wording style for patches/documentation >>> >>> >>> Committed to the projects, and accessible to LLMs, this sounds like really >>> useful context for those types of contributions (that are going to keep >>> happening regardless). >>> >>> >>> So curious to read what others think. >>> >>> Bernardo >>> >>> >>> PS. Totally agree that this should change nothing of the quality bar for >>> code reviews and merged code >>> >>> >>> On Feb 16, 2026, at 6:27 PM, Štefan Miklošovič <[email protected]> >>> wrote: >>> >>> >>> Hey, >>> >>> >>> This happened recently in kernel space. (1), (2). >>> >>> >>> What that is doing, as I understand it, is that you can point an LLM to >>> these resources and then it would be more capable when reviewing >>> patches or even writing them. It is kind of a guide / context provided >>> to the AI prompt. >>> >>> >>> I can imagine we would just compile something similar, merge it into the >>> repo, then if somebody is prompting it they would have an easier >>> job etc etc, less error prone ... adhered to code style etc ... 
>>> >>> >>> This might look like a controversial topic but I think we need to >>> discuss this. The usage of AI is just more and more frequent. From >>> Cassandra's perspective there is just this (3) but I do not think we >>> reached any conclusions there (please correct me if I am wrong about where >>> we are at with AI generated patches). >>> >>> >>> This is becoming an elephant in the room; I am noticing that some >>> patches for Cassandra were prompted by AI completely. I think it would >>> be way better if we made it easy for everybody contributing like that. >>> >>> >>> This does not mean that we, as committers, would believe what AI >>> generated blindly. Not at all. It would still need to go through the >>> formal review as anything else. But acting like this is not happening >>> and people are just not going to use AI when trying to contribute is >>> not right. We should embrace it in some form ... >>> >>> >>> 1) https://github.com/masoncl/review-prompts >>> >>> 2) >>> https://lore.kernel.org/lkml/[email protected]/ >>> >>> 3) https://lists.apache.org/thread/j90jn83oz9gy88g08yzv3rgyy0vdqrv7 >>> >>> >>> >>> -- >>> >>> Dmitry Konstantinov
