MichaelChenGithub commented on issue #62500:
URL: https://github.com/apache/airflow/issues/62500#issuecomment-4129118761
Hi @jason810496 @potiuk — while building the PoC I ran into three design
questions I'd love your input on.
---
**Q1 — AGENTS.md vs contributing-docs: which takes precedence when they
conflict?**
While implementing contributing-docs as the single source of truth, I found
that `AGENTS.md` and the contributing-docs describe the same workflows
differently. Specifically, the `uv run --project <PROJECT> pytest` pattern
(running from repo root with the `--project` flag) only appears in `AGENTS.md`
— the contributing-docs (`contributing-docs/testing/unit_tests.rst`,
`contributing-docs/07_local_virtualenv.rst`) only show `uv run pytest` after
`cd`-ing into a subdirectory.
When there's a conflict like this, which source should the skill body
follow? And is the correct fix to first update contributing-docs to include the
`--project` pattern (so AGENTS.md and contributing-docs are consistent) before
embedding the SKILL block?
---
**Q2 — Single anchor vs multi-chunk: which design do you prefer for
long-term maintainability?**
For a skill like `airflow-run-pytest`, the relevant knowledge is spread
across multiple RST files. When designing the SKILL block embedding, there are
two general approaches:
**Option A — Single anchor file, synthesized body**
One RST file is chosen as the canonical anchor (e.g.
`contributing-docs/testing/unit_tests.rst`). A human synthesizes all relevant
knowledge from across the files into that one SKILL block body.
- Pro: Simple generator — one block = one skill
- Con: When someone updates one of the other source files, they have to know
to also update the SKILL block in a *different* file. Drift is almost
guaranteed over time.
**Option B — Multi-chunk, merged by generator**
Each RST file contributes a tagged chunk for a skill. The generator collects
all chunks for a skill ID and assembles them in section order:
```rst
.. SKILL-CHUNK skill=airflow-run-pytest section=1-how-to-run
   body: |
     ## How to run tests
     ...
.. SKILL-CHUNK-END
```
- Pro: Each RST file owns its contribution. When you touch that file, the
relevant chunk is right there, so prose and chunk are far more likely to be
updated together.
- Con: Generator gets more complex. Requires explicit section ordering
across files.
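
To make the extra generator complexity in Option B concrete, here is a rough sketch of the collect-and-merge step. It is not the PoC code: the function names (`collect_chunks`, `assemble`, `section_key`) and the regex are hypothetical, everything between the two marker lines is treated as raw body text, and I am assuming every `section=` value starts with a numeric prefix such as `1-` that defines the order.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax, modelled on the SKILL-CHUNK example in this comment.
CHUNK_RE = re.compile(
    r"^\.\. SKILL-CHUNK skill=(?P<skill>\S+) section=(?P<section>\S+)\n"
    r"(?P<body>.*?)"
    r"^\.\. SKILL-CHUNK-END$",
    re.DOTALL | re.MULTILINE,
)


def section_key(section: str) -> tuple[int, str]:
    """Order by the numeric prefix as an int, so '2-x' sorts before '10-y'."""
    prefix, _, name = section.partition("-")
    return (int(prefix), name)


def collect_chunks(sources: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each skill ID to its (section, body) chunks, sorted by section."""
    chunks: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for text in sources.values():  # sources maps file name -> RST text
        for match in CHUNK_RE.finditer(text):
            chunks[match["skill"]].append((match["section"], match["body"]))
    return {
        skill: sorted(items, key=lambda item: section_key(item[0]))
        for skill, items in chunks.items()
    }


def assemble(chunks: list[tuple[str, str]]) -> str:
    """Join the ordered chunk bodies into one skill body."""
    return "\n\n".join(body.strip() for _, body in chunks)
```

The ordering is the part that needs a convention: the generator can only merge chunks deterministically if every contributing file agrees on the section numbering.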
Which approach do you think is better for long-term maintainability across
the codebase?
---
**Q3 — Tag-based rendering vs standalone duplicate block?**
When embedding skill content in the RST, should the approach be:
**Option X — Tag-based (inline)**: The SKILL block tags an *existing*
section of the RST. The generator renders that tagged content into the SKILL.md
body.
- Pro: Single source of truth is literal — the prose is the skill content.
No risk of the two diverging.
- Con: The skill body inherits the verbosity of human-readable prose.
Agent-facing content may become too long and unfocused, reducing skill
effectiveness.
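
For comparison, a minimal sketch of what Option X could look like on the generator side. The `SKILL-BEGIN` / `SKILL-END` markers and the `render_tagged` function are hypothetical, since the tag syntax is not settled. The point is only that the tagged prose is copied verbatim, so whatever length and tone the human-readable section has is exactly what the agent receives.

```python
import re

# Hypothetical markers wrapping an existing section of the RST file.
TAG_RE = re.compile(
    r"^\.\. SKILL-BEGIN skill=(?P<skill>\S+)\n"
    r"(?P<body>.*?)"
    r"^\.\. SKILL-END$",
    re.DOTALL | re.MULTILINE,
)


def render_tagged(rst_text: str) -> dict[str, str]:
    """Return {skill_id: tagged prose}, copied unchanged from the RST text."""
    return {
        match["skill"]: match["body"].strip()
        for match in TAG_RE.finditer(rst_text)
    }
```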
**Option Y — Standalone duplicate block**: The SKILL block is a separate,
self-contained block that sits alongside the human-readable content (either in
the anchor file from Option A, or in each contributing file from Option B). The
block body is a concise summary/description written specifically for agent
consumption — intentionally different from the surrounding prose.
- Pro: The agent-facing content can be written more concisely and
imperatively without affecting the human-readable docs.
- Con: Explicit duplication. Maintainers must remember to update both the
prose and the SKILL block when the workflow changes.
Which approach aligns better with how you'd want contributors to maintain
these docs long-term?