MichaelChenGithub commented on issue #62500:
URL: https://github.com/apache/airflow/issues/62500#issuecomment-4129118761

   Hi @jason810496 @potiuk — while building the PoC I ran into three design 
questions I'd love your input on.
   
   ---
   
   **Q1 — AGENTS.md vs contributing-docs: which takes precedence when they 
conflict?**
   
   While implementing contributing-docs as the single source of truth, I found 
that `AGENTS.md` and the contributing-docs describe the same workflows 
differently. Specifically, the `uv run --project <PROJECT> pytest` pattern 
(running from repo root with the `--project` flag) only appears in `AGENTS.md` 
— the contributing-docs (`contributing-docs/testing/unit_tests.rst`, 
`contributing-docs/07_local_virtualenv.rst`) only show `uv run pytest` after 
`cd`-ing into a subdirectory.
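   For concreteness, the two styles side by side (`<PROJECT>` is a placeholder, as in the docs):
   ```
   # Pattern that appears only in AGENTS.md: run from the repo root
   uv run --project <PROJECT> pytest <PROJECT>/tests/...

   # Pattern shown in contributing-docs: cd into the subproject first
   cd <PROJECT>
   uv run pytest tests/...
   ```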
   
   When there's a conflict like this, which source should the skill body 
follow? And is the correct fix to first update contributing-docs to include the 
`--project` pattern (so AGENTS.md and contributing-docs are consistent) before 
embedding the SKILL block?
   
   ---
   
   **Q2 — Single anchor vs multi-chunk: which design do you prefer for 
long-term maintainability?**
   
   For a skill like `airflow-run-pytest`, the relevant knowledge is spread 
across multiple RST files. When designing the SKILL block embedding, there are 
two general approaches:
   
   **Option A — Single anchor file, synthesized body**
   One RST file is chosen as the canonical anchor (e.g. 
`contributing-docs/testing/unit_tests.rst`). A human synthesizes all relevant 
knowledge from across the files into that one SKILL block body.
   - Pro: Simple generator — one block = one skill
   - Con: When someone updates one of the other source files, they have to know 
to also update the SKILL block in a *different* file. Drift is almost 
guaranteed over time.
   
   **Option B — Multi-chunk, merged by generator**
   Each RST file contributes a tagged chunk for a skill. The generator collects 
all chunks for a skill ID and assembles them in section order:
   ```
   .. SKILL-CHUNK skill=airflow-run-pytest section=1-how-to-run
      body: |
        ## How to run tests
        ...
   .. SKILL-CHUNK-END
   ```
   - Pro: Each RST file owns its contribution. When you touch that file, the 
relevant chunk is right there. Drift resistance is real.
   - Con: Generator gets more complex. Requires explicit section ordering 
across files.
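   To make the generator complexity concrete, here is a minimal sketch of what the Option B merge step could look like. The `SKILL-CHUNK` marker syntax is the hypothetical one from the example above, not an existing directive, and the indentation rules assumed here (3 spaces for `body: |`, 5 for body lines) are illustrative:

   ```python
   """Sketch of a chunk-merging generator for Option B."""
   import re
   from collections import defaultdict

   # Matches the hypothetical block:
   # .. SKILL-CHUNK skill=<id> section=<n>-<slug>
   #    body: |
   #      <indented body lines>
   # .. SKILL-CHUNK-END
   CHUNK_RE = re.compile(
       r"^\.\. SKILL-CHUNK skill=(?P<skill>\S+) section=(?P<section>\S+)\n"
       r"   body: \|\n"
       r"(?P<body>(?:     .*\n?)*)"
       r"\.\. SKILL-CHUNK-END",
       re.MULTILINE,
   )

   def collect_chunks(rst_sources):
       """Gather (order, body) chunks per skill id from a list of RST texts."""
       chunks = defaultdict(list)
       for text in rst_sources:
           for m in CHUNK_RE.finditer(text):
               # Section keys like "1-how-to-run" sort by their numeric prefix,
               # giving an explicit ordering across files.
               order = int(m.group("section").split("-", 1)[0])
               body = "\n".join(
                   line[5:] for line in m.group("body").splitlines()
               )
               chunks[m.group("skill")].append((order, body))
       return chunks

   def assemble(chunks):
       """Merge each skill's chunks in section order into one SKILL.md body."""
       return {
           skill: "\n\n".join(body for _, body in sorted(parts))
           for skill, parts in chunks.items()
       }
   ```

   The generator stays a few dozen lines; the real cost is agreeing on the marker syntax and the cross-file section-numbering convention.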
   
   Which approach do you think is better for long-term maintainability across 
the codebase?
   
   ---
   
   **Q3 — Tag-based rendering vs standalone duplicate block?**
   
   When embedding skill content in the RST, should the approach be:
   
   **Option X — Tag-based (inline)**: The SKILL block tags an *existing* 
section of the RST. The generator renders that tagged content into the SKILL.md 
body.
   - Pro: Single source of truth is literal — the prose is the skill content. 
No risk of the two diverging.
   - Con: The skill body inherits the verbosity of human-readable prose. 
Agent-facing content may become too long and unfocused, reducing skill 
effectiveness.
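   One possible shape for Option X, where the markers wrap existing prose rather than duplicating it (marker names hypothetical):
   ```
   .. SKILL-TAG skill=airflow-run-pytest

   Running unit tests
   ------------------

   The existing human-readable prose lives here and is rendered
   verbatim into the SKILL.md body by the generator.

   .. SKILL-TAG-END
   ```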
   
   **Option Y — Standalone duplicate block**: The SKILL block is a separate, 
self-contained block that sits alongside the human-readable content (either in 
the anchor file from Option A, or in each contributing file from Option B). The 
block body is a concise summary/description written specifically for agent 
consumption — intentionally different from the surrounding prose.
   - Pro: The agent-facing content can be written more concisely and 
imperatively without affecting the human-readable docs.
   - Con: Explicit duplication. Maintainers must remember to update both the 
prose and the SKILL block when the workflow changes.
   
   Which approach aligns better with how you'd want contributors to maintain 
these docs long-term? 

