Re: [DISCUSS] Requirement to document features before releasing them

Jon Haddad Thu, 01 May 2025 06:27:38 -0700

Stefan, I agree with you, that's a pretty terrible way to go about writing
docs, and I'm not sure if you're being hyperbolic. I really don't think
it's what Patrick is suggesting.


LLMs are great at taking some notes and examples and turning them into docs.
You don't need to have it write everything, you just need to supply an
outline.  I do it all the time and it works great.  I *hate* the slow
process of writing, but I like making notes.

So, if someone like me has some rough notes on usage with examples, we can
use an LLM to turn that into a pretty good rough draft which then gets
refined.

Since it's better to show than tell - I went ahead and took 30 minutes to
update the UCS docs.  I had previously filed CASSANDRA-19389 to add
examples and use cases.  Patrick uploaded a google doc with some, but we
never got around to finishing it.

I fired up claude, and gave it these instructions:

==================

Looking at this file:
https://github.com/rustyrazorblade/cassandra/blob/rustyrazorblade/ucs-doc-examples/doc/modules/cassandra/pages/managing/operating/compaction/ucs.adoc

add a section above Read and write amplification called "Examples", to help
people migrate from other strategies and pick the right settings for their
workload.

Make a table with the following options:

Migrating From LCS:
ALTER TABLE mykeyspace.foo WITH COMPACTION = {'class': 'UnifiedCompactionStrategy', 'scaling_parameters': 'L10'};

Migration from SizeTieredCompactionStrategy:
ALTER TABLE mykeyspace.foo WITH COMPACTION = {'class': 'UnifiedCompactionStrategy', 'scaling_parameters': 'T4'};

Add another table for use cases. Use the above ALTER TABLE examples and
take the parameters I put below and reformat as ALTER TABLE statements, add
a third empty column where I can fill in an explanation

Read Heavy Key Value: scaling_parameters: 'L10', target_sstable_size: '256MiB', base_shard_count: 8
Write heavy: scaling_parameters: 'T4', target_sstable_size: '1GiB', base_shard_count: 4
Time Series: scaling_parameters: 'T8', target_sstable_size: '512MiB', base_shard_count: 8, expired_sstable_check_frequency_seconds: 300

==================
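
As an aside, for anyone who would rather script the reformatting step than
prompt for it: below is a minimal Python sketch, not part of the patch, that
turns the use-case parameters above into ALTER TABLE statements. The helper
name ucs_alter_table and the USE_CASES dict are made up for illustration. The
table name and option values are the ones from the prompt, and the sketch
assumes every compaction option value can be written as a quoted string.

```python
def ucs_alter_table(table, **options):
    """Build an ALTER TABLE statement that switches `table` to UCS.

    Hypothetical helper for illustration only; it does no validation of
    option names or values.
    """
    compaction = {"class": "UnifiedCompactionStrategy"}
    # Assumption: all option values are emitted as quoted strings.
    compaction.update({name: str(value) for name, value in options.items()})
    body = ", ".join(f"'{key}': '{value}'" for key, value in compaction.items())
    return f"ALTER TABLE {table} WITH COMPACTION = {{{body}}};"


# Use-case parameters copied from the prompt above.
USE_CASES = {
    "Read Heavy Key Value": dict(
        scaling_parameters="L10",
        target_sstable_size="256MiB",
        base_shard_count=8,
    ),
    "Write heavy": dict(
        scaling_parameters="T4",
        target_sstable_size="1GiB",
        base_shard_count=4,
    ),
    "Time Series": dict(
        scaling_parameters="T8",
        target_sstable_size="512MiB",
        base_shard_count=8,
        expired_sstable_check_frequency_seconds=300,
    ),
}

if __name__ == "__main__":
    for name, opts in USE_CASES.items():
        print(f"{name}:")
        print("  " + ucs_alter_table("mykeyspace.foo", **opts))
```

The output still needs the same human review as the LLM draft; the script only
saves the typing.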

I took the result, told it to generate asciidoc for it, told it "make the
ALTER TABLE statements show up as code", copied it, and pasted it into the ucs
doc.

That's my rough draft.

I made a couple tweaks, and pushed it up here:

https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:cassandra:rustyrazorblade/ucs-doc-examples

You can do this with a local LLM, ChatGPT, Claude, whatever you want.

Someone want to +1 the patch while we're at it?

Jon




On Thu, May 1, 2025 at 5:19 AM Miklosovic, Stefan via dev <
[email protected]> wrote:

> I am not completely sure LLMs are the way to go here. Sure, to have something
> to further refine ... why not. But to just generate something via LLM and
> commit that, that would be a no-no from me. These things can go hallucinate
> quite quickly, then what? Who is going to proof-read technical stuff like
> that? Fixing the hallucinations might take more time than just writing it
> from scratch.
>
>
>
> Anyway, I would really appreciate if we stayed on track and discussed the
> proposition mentioned in my first email - the end goal is to codify the
> need to provide documentation together with the feature. If not provided
> together, it might be in a separate ticket which will be a blocker for the
> next release.
>
>
>
> I might initiate the voting thread for that ...
>
>
>
> Regards
>
>
>
>
>
> *From: *Rolo, Carlos <[email protected]>
> *Date: *Thursday, 1 May 2025 at 12:30
> *To: *David Capwell <[email protected]>, [email protected] <
> [email protected]>
> *Cc: *Miklosovic, Stefan <[email protected]>
> *Subject: *Re: [DISCUSS] Requirement to document features before
> releasing them
>
> I am a bit out of the loop on how/if this would extend to driver
> sub-projects.
>
> Because this makes 100% sense, in the driver space as well. Looking
> into the Java driver docs and making the others similar would be great.
>
>
>
> Patrick, that LLM suggestion might be a life saver, let me try that!
> ------------------------------
>
> *From:* Miklosovic, Stefan via dev <[email protected]>
> *Sent:* 01 May 2025 08:07
> *To:* David Capwell <[email protected]>; [email protected] <
> [email protected]>
> *Cc:* Miklosovic, Stefan <[email protected]>
> *Subject:* Re: [DISCUSS] Requirement to document features before
> releasing them
>
>
>
>
>
>
> Denser is better. In your oversimplified example of Accord, as a user who
> encounters this for the first time, I am definitely interested in what the
> limitations are. What might happen quite easily is that if it is not dense
> and we just announce it sparsely, then a user takes it all at face value, and
> if it starts to diverge from your proclamation then they might feel like
> they were lied to or they start to be disappointed. You got me? Users do
> not like surprises that they discover themselves while trying it
> out (and a lot of the time painfully). They just want to know what they are
> getting themselves into.
>
>
>
> If there are super-corner-case details, those might be omitted, as we have
> other channels of communication (Slack, mailing list ...) but in
> general I do not see how a lot of documentation would be bad.
>
>
>
> It also depends on who you are writing that documentation for. As said, we
> talk about user-facing docs here. Documentation for developers, where we
> are trying to bootstrap them / to orient them in the code base, is
> going to be substantially different from user-facing documentation.
>
>
>
>
>
> *From: *David Capwell <[email protected]>
> *Date: *Wednesday, 30 April 2025 at 23:35
> *To: *[email protected] <[email protected]>
> *Cc: *Miklosovic, Stefan <[email protected]>
> *Subject: *Re: [DISCUSS] Requirement to document features before
> releasing them
>
>
>
>
> I wonder at what level can we enforce this.  What I mean, in modeling
> testing I have found some odd behaviors that people were not aware of
> (BATCH cell resolution, NULL handling (emptiness…..), etc.)… so if
> documentation is dense this can help force people to think through edge
> cases or how 2 features interact with each other…. If documentation is
> sparse, then you lose this benefit…
>
>
>
> Simple example for Accord
>
>
>
> # Sparse
>
>
>
> Multiple key transaction support, bringing Apache Cassandra cluster to the
> RDBMS world!
>
>
>
> # Dense
>
>
>
> …
>
>
>
> Here are the current limitations, …
>
>
>
> Here is where we alter Apache Cassandra’s behavior to be more in line with
> SQL, ...
>
>
>
> On Apr 30, 2025, at 1:38 PM, Miklosovic, Stefan via dev <
> [email protected]> wrote:
>
>
>
>
>
> To extend the first e-mail to cover the practicalities:
>
>
>
>    1. changes introduced to nodetool would not be part of this because
>    they are self-documented (the help docs are autogenerated)
>    2. changes to cassandra.yaml are already covered, as its documentation
>    is autogenerated and published on the website as well.
>    3. Applying common sense, if it is enough to just mention it in NEWS.txt,
>    that is also fine.
>    4. metrics - I bet there are some which are not documented; we should
>    find a way to autogenerate them into the website.
>
>
>
> I am also to blame, and to show I am not a hypocrite: I have never
> delivered in-depth user documentation of CEP-24 with examples, use cases,
> and so on. I am trying to be more aware of the documentation when
> delivering features, to raise awareness about it, etc. It is easy to not
> think about this too much when developers are in a rush. If
> there were a hard requirement for the documentation, I would have done it
> right away and I would not need to deal with this now.
>
>
>
> I understand that when delivering heavy-weights like CEP-15 we can not
> expect that all the docs will be done upon delivery but I want to stress
> the fact that providing usable documentation should be definitely something
> to think about when releasing it. Same goes for all other non-trivial
> features.
>
>
>
>
>
> *From: *Josh McKenzie <[email protected]>
> *Date: *Wednesday, 30 April 2025 at 22:11
> *To: *dev <[email protected]>
> *Cc: *Miklosovic, Stefan <[email protected]>
> *Subject: *Re: [DISCUSS] Requirement to document features before
> releasing them
>
>
>
>
> This makes intuitive sense to me.
>
>
>
> In our case we could tie documentation to the process of promoting a
> feature from “experimental” to production ready, though I fear that might
> leave wiggle room for primary authors of some features to leave them as
> experimental forever, not desiring to take on the burden of documenting
> something that’s already merged in and usable by experts.
>
>
>
> Curious what others think.
>
>
>
> On Wed, Apr 30, 2025, at 12:10 PM, Miklosovic, Stefan via dev wrote:
>
> I am at OpenSearchCon and there was a discussion about the documentation
> of features. In a nutshell, the policy they seem to have is that there are
> some minimal requirements for documentation in place for each feature
> introduced. That way, there is no way (or it is greatly minimised) that
> there would be a feature released or some user-facing change introduced
> without any documentation on how to use it.
>
>
>
> Under the "documentation", in our case, I mean the docs which would end up
> in cassandra.apache.org docs.
>
>
>
> In their case, the documentation is either part of the change or there is
> a documentation issue (in GitHub terms) created which basically blocks the
> release when not addressed.
>
>
>
> When there is no documentation about a feature or improvement, knob to
> tweak etc, there is virtually nobody who knows about that except the
> person who committed the code / people who participated in a review. I
> think this is detrimental to the project. I do not see the point in
> releasing something undocumented when the only people who know what is
> going on are the ones who wrote it.
>
>
>
> If somebody argued that we have them in CHANGES.txt and NEWS.txt, neither
> ends up on the website and I do not think they are appropriate vehicles for
> user-facing documentation or for anything beyond a few sentences.
>
>
>
> Could we introduce a policy which would require developers to introduce at
> least minimal user-facing documentation (if applicable) before delivering
> it / before releasing it and it would be part of the reviews?
>
>
>
> For now, while we also add documentation, I feel it is "the best-effort"
> approach, it is not part of the official policy when delivering it.
>
>
>
> As of now, I can not see any information about documentation among "For
> Code Contributions" points:
>
>
>
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance
>
>
>
> I would like to add a new point there:
>
>
>
> Code must not be committed when user-facing functionality is not
> documented and visible without code inspection.
>
>
>
> Regards
>
>
>
