Yeah, no surprise, I was thinking the discussion would go in this direction. I am not completely sure who we are developing this for, then. When I see statements like this, I am pretty disappointed:

"The project obviously aims to serve end users, but the developer community is the actual project and it is fine to serve that demographic first, or only."

What is the actual difference between working in a private fork and publishing code publicly that almost nobody understands? I get that people work for companies etc., but really, we should reflect quite hard on what we are doing here.

Let's take Accord, for example. I cannot see any justification for working on something for 3 years and then not documenting how to use it when it really comes to it. Is Accord documentation for users on the way or not? Is the documentation for CEP-45 going to be done or not? How are operators other than the authors of that change (whom one can count on one hand, and who all work at one company) supposed to know how to use it? What is "open source" about that, except that it is online and publicly accessible?

Over the last couple of years, Cassandra has been getting more and more complex, which might alienate even the developers working on it daily. People might lose touch with all the new features being rolled out, and if this trend continues without documentation along the way, I am very afraid that Cassandra will become an exclusive, self-serving club of elite programmers that an average or beginner user has no way to catch up with, that consultants will not know how to consult on, and so on. How is this good for anybody?

What I am asking for is really not rocket science and nothing "rigid". I am completely open to lowering the requirements. I am not asking for the whole CEP to be rephrased and presented to users. I am asking for a description of the most common usages and scenarios, their most important consequences, and all configuration parameters.

Let's take a look at CEP-37: https://cassandra.apache.org/doc/trunk/cassandra/managing/operating/auto_repair.html This is just wonderful, and it is an example of how it should be done. Why can this not be done for other CEPs too?
What was different about CEP-37, where the docs were written together with the code, that makes it impossible to do the same for other CEPs?

Regards

From: Benedict <bened...@apache.org>
Date: Thursday, 1 May 2025 at 14:37
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Cc: Rolo, Carlos <carlos.r...@netapp.com>, Miklosovic, Stefan <stefan.mikloso...@netapp.com>, dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] Requirement to document features before releasing them

I am opposed to this. The “rule” is too imprecise while simultaneously being much too rigid, and it will be improperly enforced (we already have lots of rule breaking around modifying public APIs, which should have discuss threads and do not, for instance). This kind of arbitrary rule that is unaligned with contributors will likely lead to bad and inconsistent documentation, which is worse than no documentation.

We could perhaps stipulate that for a feature to leave experimental status the community must vote, and that documentation should be a consideration. But this will only capture big changes.

We could perhaps try other ideas, like moratoriums on contributions that are not documentation, to encourage improvements there.

We could perhaps try having LLMs generate documentation that new contributors could take a first pass at editing for correctness, before a committer takes a final pass.

At the end of the day, though, we’re an OSS project, and we do have features (big and small) designed, implemented and likely only used by the sole contributor of the feature. We also have features used primarily by active community members who understand them well enough. I don’t think this is a bug in the system. The project obviously aims to serve end users, but the developer community is the actual project and it is fine to serve that demographic first, or only.
I agree we want to improve our documentation, but this is not the right way to go about it.

On 1 May 2025, at 13:19, Miklosovic, Stefan via dev <dev@cassandra.apache.org> wrote:

I am not completely sure LLMs are the way to go here. Sure, to have something to refine further ... why not. But to just generate something via an LLM and commit it would be a no-no from me. These things can hallucinate quite quickly, and then what? Who is going to proofread technical material like that? Fixing the hallucinations might take more time than just writing it from scratch.

Anyway, I would really appreciate it if we stayed on track and discussed the proposition in my first email: the end goal is to codify the need to provide documentation together with the feature. If it is not provided together with the feature, it might go in a separate ticket that would be a blocker for the next release. I might initiate a voting thread for that ...

Regards

From: Rolo, Carlos <carlos.r...@netapp.com>
Date: Thursday, 1 May 2025 at 12:30
To: David Capwell <dcapw...@apple.com>, dev@cassandra.apache.org <dev@cassandra.apache.org>
Cc: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
Subject: Re: [DISCUSS] Requirement to document features before releasing them

I am a bit out of the loop on how/if this would extend to the driver sub-projects, because this makes 100% sense in the driver space as well. Looking into the Java driver docs and making the others similar would be great.

Patrich, that LLM suggestion might be a lifesaver, let me try that!

________________________________
From: Miklosovic, Stefan via dev <dev@cassandra.apache.org>
Sent: 01 May 2025 08:07
To: David Capwell <dcapw...@apple.com>; dev@cassandra.apache.org <dev@cassandra.apache.org>
Cc: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
Subject: Re: [DISCUSS] Requirement to document features before releasing them

Denser is better.
In your oversimplified example of Accord, as a user who encounters it for the first time, I am definitely interested in what the limitations are. What might easily happen is that if the announcement is not dense and we just announce it sparsely, a user takes it all at face value, and if reality starts to diverge from your proclamation, they might feel like they were lied to, or they become disappointed. Do you get me? Users do not like surprises they discover themselves while trying something out (often painfully). They just want to know what they are buying into. Super-corner-case details might be omitted, as we have other channels of communication (Slack, mailing list ...), but in general I do not see how a lot of documentation would be bad.

It also depends on who you are writing that documentation for. As said, we are talking about user-facing docs here. Documentation for developers, where we are trying to bootstrap them and orient them in the code base, is going to be substantially different from user-facing documentation.

From: David Capwell <dcapw...@apple.com>
Date: Wednesday, 30 April 2025 at 23:35
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Cc: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
Subject: Re: [DISCUSS] Requirement to document features before releasing them

I wonder at what level we can enforce this. What I mean is, in model testing I have found some odd behaviors that people were not aware of (BATCH cell resolution, NULL handling (emptiness…), etc.)… so if documentation is dense, this can help force people to think through edge cases or how 2 features interact with each other…. If documentation is sparse, then you lose this benefit…

Simple example for Accord

# Sparse
Multiple key transaction support, bringing Apache Cassandra cluster to the RDBMS world!
# Dense
… Here are the current limitations,
… Here is where we alter Apache Cassandra’s behavior to be more in line with SQL,
...

On Apr 30, 2025, at 1:38 PM, Miklosovic, Stefan via dev <dev@cassandra.apache.org> wrote:

To extend the first e-mail to cover the practicalities:

1. Changes introduced to nodetool would not be part of this, because they are self-documented (the help docs are autogenerated).
2. Changes to cassandra.yaml are already covered, as that is autogenerated and published on the website too.
3. Applying common sense, if a mention in NEWS.txt is enough, that is also fine.
4. Metrics: I bet there are some which are not documented; we should find a way to autogenerate them into the website.

I am also to blame, and to show I am not a hypocrite: I have never delivered in-depth user documentation of CEP-24 with examples, use cases, and so on. I am trying to be more aware of documentation when delivering features, to raise awareness about it, etc. It is easy not to think about this much when developers are in a rush and so on. If there had been a hard requirement for documentation, I would have done it right away and would not need to deal with it now.

I understand that when delivering heavyweights like CEP-15 we cannot expect all the docs to be done upon delivery, but I want to stress that providing usable documentation should definitely be something to think about when releasing it. The same goes for all other non-trivial features.

From: Josh McKenzie <jmcken...@apache.org>
Date: Wednesday, 30 April 2025 at 22:11
To: dev <dev@cassandra.apache.org>
Cc: Miklosovic, Stefan <stefan.mikloso...@netapp.com>
Subject: Re: [DISCUSS] Requirement to document features before releasing them

This makes intuitive sense to me.
In our case we could tie documentation to the process of promoting a feature from “experimental” to production-ready, though I fear that might leave wiggle room for the primary authors of some features to leave them as experimental forever, not wanting to take on the burden of documenting something that’s already merged and usable by experts. Curious what others think.

On Wed, Apr 30, 2025, at 12:10 PM, Miklosovic, Stefan via dev wrote:

I am at OpenSearchCon and there was a discussion about the documentation of features. In a nutshell, their policy seems to be that there are some minimal documentation requirements in place for each feature introduced. That way, it is impossible (or at least much less likely) that a feature is released or a user-facing change introduced without any documentation on how to use it.

By "documentation", in our case, I mean the docs which would end up on cassandra.apache.org. In their case, the documentation is either part of the change, or a documentation issue (in GitHub terms) is created which blocks the release when not addressed.

When there is no documentation about a feature, improvement, knob to tweak etc., virtually nobody knows about it except the person who committed the code and the people who participated in the review. I think this is detrimental to the project. I do not see the point in releasing something undocumented when the only people who know what is going on are the ones who wrote it.

If somebody argues that we have these changes in CHANGES.txt and NEWS.txt: neither ends up on the website, and I do not think they are appropriate vehicles for user-facing documentation or for anything beyond a few sentences.
Could we introduce a policy requiring developers to provide at least minimal user-facing documentation (if applicable) before delivering or releasing a feature, and make it part of reviews? For now, although we do add documentation, I feel it is a "best-effort" approach; it is not part of the official policy for delivering features. As of now, I cannot see any mention of documentation among the "For Code Contributions" points: https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance

I would like to add a new point there:

Code must not be committed when user-facing functionality is not documented and visible without code inspection.

Regards