John Snow <js...@redhat.com> writes:

> On Mon, Feb 17, 2025 at 7:13 AM Markus Armbruster <arm...@redhat.com> wrote:
>
>> John Snow <js...@redhat.com> writes:
>>
>> > This clarifies sections that are mistaken by the parser as "intro"
>> > sections to be "details" sections instead.
>> >
>> > Signed-off-by: John Snow <js...@redhat.com>
>>
>> This is rather terse.
>>
>
> Mea culpa. I can write more at length if we agree on the general approach.
> For now, you got an RFC as this was the subject of a considerable amount of
> controversy between us in the past ... so I am doing baby steps.
>
> "Commit message needs to be hit with the unterseification beam" added to
> tasklist. :)
>
>
>>
>> Why does the boundary between "intro" (previously "body") and "details"
>> matter? As far as I understand, it matters for inlining.
>
>> What is inlining?
>
>> The old doc generator emits "The members of T" into the argument
>> description in the following cases:
>>
>> * When a command's arguments are given as a type T, the doc comment has
>>   no argument descriptions, and the generated argument description
>>   becomes "The members of T".
>>
>> * When an object type has a base type T, "The members of T" is appended
>>   to the doc comment's (possibly empty) argument descriptions.
>>
>> * For union types, "The members of T when TAG is VALUE" is appended to
>>   the doc comment's argument descriptions for every tag VALUE and
>>   associated type T.
>>
>> We want a description of the members of T right there instead. To get
>> it right there, we need to inline from T's documentation.
>>
>> What exactly do we need to inline? Turns out we don't want "intro", we
>> do want the argument descriptions and other stuff we can ignore here.
>>
>> "intro" ends before the argument descriptions, features, or a tagged
>> section, whatever comes first. Most of the time, this works fine. But
>> there are a few troublesome cases.
>> Here's one:
>>
>>     ##
>>     # @MemoryBackendShmProperties:
>>     #
>>     # Properties for memory-backend-shm objects.
>>     #
>>     # This memory backend supports only shared memory, which is the
>>     # default.
>>     #
>>     # Since: 9.1
>>     ##
>>     { 'struct': 'MemoryBackendShmProperties',
>>       'base': 'MemoryBackendProperties',
>>       'data': { },
>>       'if': 'CONFIG_POSIX' }
>>
>> Everything up to "Since:" is "intro". Consequently, the old doc
>> generator emits "The members of MemoryBackendProperties" right there:
>>
>>     "MemoryBackendShmProperties" (Object)
>>     -------------------------------------
>>
>>     Properties for memory-backend-shm objects.
>>
>>     This memory backend supports only shared memory, which is the default.
>>
>>
>>     Members
>>     ~~~~~~~
>>
>>     The members of "MemoryBackendProperties"
>>
>>     Since
>>     ~~~~~
>>
>>     9.1
>>
>>
>>     If
>>     ~~
>>
>>     "CONFIG_POSIX"
>>
>> That's also where the new one inlines. Okay so far.
>>
>> This gets in turn inlined into ObjectOptions for branch
>> memory-backend-shm. Since we don't inline "intro", we don't inline
>> "This memory backend supports only shared memory, which is the default."
>> That's a problem.
>>
>
> Yes, this is all correct so far.
>
>
>>
>> This patch moves the boundary between "intro" and the remainder above
>> that paragraph, so we don't lose that line. It accomplishes that by
>> giving us syntax to manually mark the end of "intro".
>>
>> However, your solution is manual: it gives us the means[*] to mark the
>> boundary with "Details:" to avoid loss of text. What if we don't
>> notice? Should we tweak the syntax to force us to be explicit? How
>> many doc comments would that affect?
>>
>
> I'm leaving that question to you. The calculus I made was that there were
> fewer SLOC changes to explicitly denote the "Details:" sections only in the
> handful of cases where it was (potentially) relevant than to mandate its
> use unconditionally.
How did you determine where it is (potentially) relevant? Oh, wait ...

> If you have an idea that is enforceable at runtime and
> has fewer SLOC changes, suggest away!
>
> Unseen in this patch is a warning I added to the /inliner/ that identified
> potentially "ambiguous" delineation spots and issued a warning (error); the
> exact code that did this is possibly a little hokey but it was what I used
> to identify the spots addressed by this patch.

... that's how.

> Point being: it's possible to enforce, but I enforced it in qapidoc.py in
> the inliner instead of directly in the parser. We could discuss moving the
> check to the parser if you'd like. The check itself is somewhat "dumb":
>
> - If a doc block has only one *paragraph* (knowingly/intentionally not
>   using the term section here) of text, it's assumed to be the intro.

You mean if the "body" has just one paragraph, right?

The "body" is the first section, always untagged, possibly empty. It
contains the text between the line naming the definition and the first
tagged section. The tagged sections are member / argument descriptions,
feature descriptions, 'Returns', 'Errors', 'Since', and 'TODO'.

> - If a doc block has any number of tagged sections, all text above (if any)
>   is assumed to be the "intro" and all text below (if any) is assumed to be
>   "details".

Uh, this can't be quite right. Consider:

    ##
    # @query-memory-size-summary:
    #
    # Return the amount of initially allocated and present hotpluggable
    # (if enabled) memory in bytes.
    #
    # .. qmp-example::
    #
    #     -> { "execute": "query-memory-size-summary" }
    #     <- { "return": { "base-memory": 4294967296, "plugged-memory": 0 } }
    #
    # Since: 2.11
    ##

There is a tagged section. According to your explanation, the text
above, i.e. everything between @query-memory-size-summary: and Since:,
is assumed to be "intro". According to your patch, which adds "Details:"
in the middle, we do not assume this. Contradiction.
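To make the contradiction concrete, here's a rough Python sketch of your two rules taken literally, with the remaining case (several paragraphs, no tagged section) reported as ambiguous. It is not the qapidoc.py check, which isn't part of this patch; `strip_comment()`, `classify()` and the `TAGGED` pattern are made up for illustration, and tag recognition is simplified to "line starts with @name:, Returns:, Errors:, Since:, TODO: or Features:".

```python
from __future__ import annotations

import re

# Simplified assumption: a tagged section starts with "@name:" (member,
# argument or feature description) or one of these keywords.
TAGGED = re.compile(r"(@[\w-]+:|Returns:|Errors:|Since:|TODO:|Features:)")


def strip_comment(doc: str) -> list[str]:
    """Return a doc comment's lines with the leading '#' removed."""
    lines = []
    for line in doc.strip().splitlines():
        line = line.strip()
        if line == "##":
            continue
        # Drop the leading '#' and at most one following space.
        lines.append(line[2:] if line.startswith("# ") else line[1:])
    return lines


def classify(doc: str) -> tuple[str, str]:
    """Apply the two rules literally; return (verdict, assumed intro)."""
    text = strip_comment(doc)[1:]  # skip the line naming the definition
    first_tag = next((i for i, line in enumerate(text)
                      if TAGGED.match(line)), None)
    if first_tag is not None:
        # Rule 2: all text above the first tagged section is "intro".
        return "tagged", "\n".join(text[:first_tag]).strip()
    body = "\n".join(text).strip()
    if "\n\n" not in body:
        # Rule 1: a single paragraph is assumed to be the intro.
        return "single-para", body
    # Several paragraphs and no tagged section: boundary unknown.
    return "ambiguous", body


QUERY_MEMORY_SIZE_SUMMARY = """
##
# @query-memory-size-summary:
#
# Return the amount of initially allocated and present hotpluggable
# (if enabled) memory in bytes.
#
# .. qmp-example::
#
#     -> { "execute": "query-memory-size-summary" }
#     <- { "return": { "base-memory": 4294967296, "plugged-memory": 0 } }
#
# Since: 2.11
##
"""

verdict, intro = classify(QUERY_MEMORY_SIZE_SUMMARY)
print(verdict)                      # tagged
print(".. qmp-example::" in intro)  # True: the example counts as "intro"
```

Taken literally, rule 2 puts the qmp-example into "intro" for @query-memory-size-summary, which is exactly what the "Details:" your patch adds there is meant to prevent.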
> It's only in this case that it whines:
>
> - A doc block has *multiple paragraphs* of text at the start of the block,
>   but has no other sections and so if there is semantically a "details"
>   section or not is unclear to the parser and inliner.

Let's take a step back.

docs/devel/qapi-code-gen.rst:

    Definition documentation starts with a line naming the definition,
    followed by an optional overview, a description of each argument (for
    commands and events), member (for structs and unions), branch (for
    alternates), or value (for enums), a description of each feature (if
    any), and finally optional tagged sections.

Bug: should be "finally optional tagged or untagged sections".

Your generator wants all but 'Since' and 'TODO' together, so it can
render them in a single two-column table. This description table
separates "intro" (above) and "details" (below). Fair?

Fine and dandy separation unless the description table is *empty*. Then
the "body" (first section, always untagged) extends to the first
'Since', 'TODO', or the end of the doc comment.

Heuristic: when this first untagged section is a single paragraph, we
quietly assume it's "intro". If it's more than one, we ask the
programmer to mark the end of "intro" explicitly.

Let's see how this works out in practice. I stick

        if self.symbol and not (self.args or self.features
                                or self.returns or self.errors):
            if self.body.text.find('\n\n') == -1:
                print(f"{self.info}: single para")
            else:
                print(f"{self.info}: ambiguous")

into QAPIDoc.check(). The outer conditional is true for definition
documentation (doc.symbol) where the table is empty (not ...). The
inner conditional is a crude check for paragraphs.

This reports 47 "single para" and 8 "ambiguous" in the main QAPI schema
in master.

Your patch hits 5 of 8 ambiguous ones, and throws in a 6th that doesn't
seem to need it:

    ##
    # @query-yank:
    #
    # Query yank instances. See @YankInstance for more information.
    #
    # Returns: list of @YankInstance
    #
    # .. qmp-example::
    #
    #     -> { "execute": "query-yank" }
    #     <- { "return": [
    #              { "type": "block-node",
    #                "node-name": "nbd0" }
    #          ] }
    #
    # Since: 6.0
    ##

It misses this one in run-state.json:

    ##
    # @SUSPEND_DISK:
    #
    # Emitted when guest enters a hardware suspension state with data
    # saved on disk, for example, S4 state, which is sometimes called
    # hibernate state
    #
    # .. note:: QEMU shuts down (similar to event @SHUTDOWN) when entering
    #     this state.
    #
    # Since: 1.2
    #
    # .. qmp-example::
    #
    #     <- { "event": "SUSPEND_DISK",
    #          "timestamp": { "seconds": 1344456160, "microseconds": 309119 } }
    ##

this one in migration.json:

    ##
    # @migrate_cancel:
    #
    # Cancel the current executing migration process.
    #
    # .. note:: This command succeeds even if there is no migration
    #     process running.
    #
    # Since: 0.14
    #
    # .. qmp-example::
    #
    #     -> { "execute": "migrate_cancel" }
    #     <- { "return": {} }
    ##

and this one in machine.json:

    ##
    # @HV_BALLOON_STATUS_REPORT:
    #
    # Emitted when the hv-balloon driver receives a "STATUS" message from
    # the guest.
    #
    # .. note:: This event is rate-limited.
    #
    # Since: 8.2
    #
    # .. qmp-example::
    #
    #     <- { "event": "HV_BALLOON_STATUS_REPORT",
    #          "data": { "committed": 816640000, "available": 3333054464 },
    #          "timestamp": { "seconds": 1600295492, "microseconds": 661044 } }
    ##

> The check as I wrote it is unintelligent in that it does not bother to
> check if the doc block it is checking is ever one that *could* be inlined;
> i.e. it will complain about being unable to delineate for commands -- even
> though it wouldn't really matter in that case. It's a potential improvement
> to the algorithm to ignore cases where that "ambiguity" is not actually
> important.

The ambiguity affects both doc blocks the inliner inlines from and doc
blocks the inliner inlines into.

When inlining from, the inliner omits "intro", and therefore needs to
know where "intro" ends.

When inlining into, the inliner needs to know where to insert the
inlined material. When the answer is "right after intro", it needs to
know where "intro" ends.
Getting the former wrong loses information. Getting the latter wrong
may merely look funny, which is a lot less serious, but still worth
avoiding.

> But, it's possible to mechanically enforce and nudge documentation writers
> to add the delineation marker where the parser is uncertain.
>
>> [*] Actually, we have means even before this patch, they're just ugly.
>> See the TODO comment added in commit 14b48aaab92 (qapi: convert
>> "Example" sections without titles)
>
>
> That's right. This is merely a formalization of that hack: I add a
> "section" that is intentionally empty and serves only as a marker to the
> parser to begin recording a new section.

Yes.

Let's take a step back again.

Recall that the problem is caused by an empty description table. Can we
enforce non-empty?

Here's the table's syntactic structure:

    member / argument descriptions *
    (   "Features:" line
        feature descriptions ("features") +
    ) ?
    "Returns" section ?
    "Errors" section ?

This is slightly stricter than what we actually accept now, but that's
a detail.

Consider:

    "Members:" / "Arguments:" line
    member / argument descriptions *
    (   "Features:" line
        feature descriptions ("features") +
    ) ?
    "Returns" section ?
    "Errors" section ?

With this, the table always starts with a "Members:" / "Arguments:"
line, and thus cannot be empty.

Drawback: we'd have to add this line to every single definition comment.
The main QAPI schema has almost 1000. Tolerable?

We could require it only when there are no member / argument
descriptions. 55 instances.

We could require it only when there are none, and our "one paragraph"
heuristic for finding the end of "intro" fails. 8 instances.

You might ask how this differs from your "Details:" proposal. There
are two differences:

1. The keyword(s). Matter of taste, best discussed last.

2. As coded, your patch accepts "Details:" almost[*] anywhere.
   "Members:" / "Arguments:" would be accepted only where member /
   argument descriptions can go, i.e. not after feature descriptions
   etc.
Consider:

    ##
    # @Enum:
    #
    # @one: The _one_ {and only}, description on the same line
    #
    # Features:
    # @enum-feat: Also _one_ {and only}
    # @enum-member-feat: a member feature
    #
    # Details:
    #
    # @two is undocumented
    ##

This is accepted, and the "Details:" line gets swallowed.

I figure tightening the position makes accidents slightly less likely.

Here's another way to force non-empty:

    (   "Members: none" / "Arguments: none" line
      | member / argument descriptions *
    )
    (   "Features:" line
        feature descriptions ("features") +
    ) ?
    "Returns" section ?
    "Errors" section ?

This is similar to "require it only when there are no member / argument
descriptions" above, except we also accept it only then. 55 instances.

Syntax ideas better than "Members: none" are welcome.

Thoughts?


[*] Not after untagged sections following tagged ones.
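For what it's worth, the "Members: none" variant looks easy to check mechanically. A rough Python sketch, assuming the parser can hand over a definition's description-table sections as a list of kinds in source order; the kind names and `check_table()` are hypothetical, not scripts/qapi/parser.py API, and "Members: none" / "Arguments: none" are folded into a single "none" kind:

```python
from __future__ import annotations

# Hypothetical section kinds, ranked by where the proposed grammar
# allows them:
#
#   ( "Members: none" line | member / argument descriptions * )
#   ( "Features:" line  feature descriptions + ) ?
#   "Returns" section ?
#   "Errors" section ?
#
# "none" stands for a "Members: none" / "Arguments: none" line.
RANK = {"none": 0, "member": 0, "features": 1, "feature": 2,
        "returns": 3, "errors": 4}


def check_table(kinds: list[str]) -> None:
    """Raise ValueError unless KINDS follows the proposed grammar.

    KINDS lists one definition's description-table sections in
    source order, e.g. ["member", "member", "features", "feature"].
    """
    ranks = [RANK[kind] for kind in kinds]
    if ranks != sorted(ranks):
        raise ValueError(f"sections out of order: {kinds}")
    for kind in ("none", "features", "returns", "errors"):
        if kinds.count(kind) > 1:
            raise ValueError(f"more than one {kind!r} section")
    if ("features" in kinds) != ("feature" in kinds):
        raise ValueError('"Features:" line and feature descriptions '
                         'go together')
    has_members = "member" in kinds
    has_none = "none" in kinds
    if has_members and has_none:
        # Accepted only when there are no member descriptions ...
        raise ValueError('"Members: none" contradicts member descriptions')
    if not has_members and not has_none:
        # ... and required then, so the table can't be empty.
        raise ValueError('no member descriptions: "Members: none" required')


check_table(["member", "member", "features", "feature"])  # accepted
check_table(["none", "returns"])  # command without arguments: accepted
```

Ranking "none" together with member descriptions, ahead of "Features:", is what keeps the marker out of the spot where "Details:" got swallowed in the @Enum example.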