Markus, Eric: Some commentary and additional information below.

I did not polish this script as I believe it's hacky enough that covering
all of the edge cases, testing and documentation is more effort than it's
worth, but I still signed off on it in case someone wanted to "adopt it".
My intent here is really just to advertise "Here's how I wrote that series"
and give you opportunities to spot problems with the programmatic
conversion before I send out my v2 so I can keep the email bombs to a
minimum.

My as-of-now-unsent v2 includes any additional instances located by this
version of the script, as well as one or two manual instances of the
ignored tokens that looked appropriate to convert.

Eric: Thank you for diving into the series, I appreciate it.

On Mon, Jun 16, 2025 at 5:16 PM John Snow <js...@redhat.com> wrote:

> This isn't really meant for inclusion as it's a bit of a hackjob, but I
> figured it would be best to share it in some form or another to serve as
> a basis for a kind of meta-review of the crossreferenceification series.
>
> This script is designed to convert 'name', "name", name, and @name
> instances in qapi/*.json files to `name` for the purposes of
> cross-referencing commands, events, and data types in the generated HTML
> documentation. It is specifically tuned for our QAPI files and is not
> suitable for running on generic rST source files. It can likely be made
> to operate on QEMU guest agent or other qapi JSON files with some edits
> to which files it opens.
>
> Navigate to your qemu/qapi/ directory and run this script with "python
> insert_crossrefs.py" and it'll handle the rest. Definitely don't run it
> in a non-git-controlled folder: it edits your source files in place.
>

Specifically, "python3 ../contrib/autoxref/insert_crossrefs.py"


>
> (Yes, in polishing this script, I found a few instances of
> cross-references I missed in my v1 series. I figure I'll let us discuss
> the conversion a bit before I send out a v2 patchbomb.)
>
> Signed-off-by: John Snow <js...@redhat.com>
>
> ---
>  contrib/autoxref/insert_crossrefs.py | 69 ++++++++++++++++++++++++++++
>  1 file changed, 69 insertions(+)
>  create mode 100644 contrib/autoxref/insert_crossrefs.py
>
> diff --git a/contrib/autoxref/insert_crossrefs.py b/contrib/autoxref/insert_crossrefs.py
> new file mode 100644
> index 00000000000..399dd7524c2
> --- /dev/null
> +++ b/contrib/autoxref/insert_crossrefs.py
> @@ -0,0 +1,69 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import os
> +import re
> +import sys
> +
> +if not os.path.exists("qapi-schema.json"):
> +    raise Exception(
> +        "This script was meant to be run from the qemu.git/qapi directory."
> +    )
> +sys.path.append("../scripts/")
> +
> +from qapi.schema import QAPISchema, QAPISchemaDefinition
> +
> +# Adjust this global to exclude certain tokens from being xreffed.
> +SKIP_TOKENS = ('String', 'stop', 'transaction', 'eject', 'migrate', 'quit')
>

At least *some* of these are still valid conversions, but the majority are
not. You can always comment out this line and review the diff in your
working tree to see what I mean.


> +
> +print("Compiling schema to build list of reference-able entities ...", end='')
> +tokens = []
> +schema = QAPISchema("qapi-schema.json")
> +for ent in schema._entity_list:
> +    if isinstance(ent, QAPISchemaDefinition) and not ent.is_implicit():
> +        if ent.name not in SKIP_TOKENS:
> +            tokens.append(ent.name)
> +print("OK")
> +
> +patt_names = r'(' + '|'.join(tokens) + r')'
> +
> +# catch 'token', "token", and ``token`` specifically
> +patt = re.compile(r'([\'"]|``)' + patt_names + r'\1')
> +# catch naked instances of token, excluding those where prefixed or
> +# suffixed by a quote, dash, or word character. Exclude "@" references
> +# specifically to handle them elsewhere. Exclude <name> matches, as
> +# these are explicit cross-reference targets.
> +patt2 = r"(?<![-@`'\"\w<])" + patt_names + r"(?![-`'\"\w>])"
>

I'm quite aware this pattern doesn't exclude <token> specifically: the
prefix and suffix checks are not contextually linked, so it also skips a
bare "<token" or "token>". Hacky. Got the job done. Probably doesn't miss
anything...


> +# catch @references. prohibit when followed by ":" to exclude members
> +# whose names happen to match xreffable entities.
> +patt3 = r"@" + patt_names + r"(?![-\w:])"
>

Excluding "@foo:" is also kludgy, but in manual review it didn't miss
anything.
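
A small sketch of what the "@foo:" exclusion does and does not catch,
assuming a hypothetical two-entry token list in place of the real schema
names. xref_at() is a helper name made up for this example; the expression
is the patt3 from the patch.

```python
import re

# Hypothetical stand-ins for the entity names collected from the schema.
tokens = ["query-block", "BlockdevOptions"]
patt_names = r"(" + "|".join(tokens) + r")"

# Same expression as patt3 in the script.
patt3 = r"@" + patt_names + r"(?![-\w:])"


def xref_at(line):
    """Convert @name references to `name`, as the patt3 pass does."""
    return re.sub(patt3, r"`\1`", line)


samples = [
    "# Use @query-block instead.",         # plain reference: converted
    "# @BlockdevOptions: member-like",     # followed by ':': left alone
    "# Either @query-block: or nothing.",  # prose use before ':': also skipped
]
for sample in samples:
    print(xref_at(sample))
```

The third sample is the kludgy part: a genuine reference that happens to be
followed by a colon would be skipped along with the member definitions.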

I'm sure there's some big-brained way to not need three separate patterns,
but I refuse to learn regex any better than I already have so I have some
brain space left to admire flowers and birds.
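
For the curious, one possible way to fold the three patterns into a single
pass is an alternation with named groups and a replacement function. This
is only a sketch: the token list, three_pass() and one_pass() are all
hypothetical names for this example, and the equivalence has only been
checked on the sample lines below, not against the real schema files.

```python
import re

# Hypothetical stand-ins for the entity names collected from the schema.
tokens = ["query-block", "BlockdevOptions"]
alt = "|".join(tokens)

# The three expressions from the script, applied in sequence.
patt = r"(['\"]|``)(" + alt + r")\1"
patt2 = r"(?<![-@`'\"\w<])(" + alt + r")(?![-`'\"\w>])"
patt3 = r"@(" + alt + r")(?![-\w:])"


def three_pass(line):
    line = re.sub(patt, r"`\2`", line)
    line = re.sub(patt2, r"`\1`", line)
    return re.sub(patt3, r"`\1`", line)


# One alternation; each branch captures the name under its own group.
combined = re.compile("|".join([
    r"(?P<q>['\"]|``)(?P<quoted>" + alt + r")(?P=q)",
    r"@(?P<at>" + alt + r")(?![-\w:])",
    r"(?<![-@`'\"\w<])(?P<naked>" + alt + r")(?![-`'\"\w>])",
]))


def one_pass(line):
    def repl(match):
        # Exactly one of the three named groups is set for any match.
        name = match.group("quoted") or match.group("at") or match.group("naked")
        return "`" + name + "`"
    return combined.sub(repl, line)


samples = [
    "# Like 'query-block', but see @query-block and query-block.",
    "# @query-block: member, <query-block> target, ``query-block`` literal",
]
for sample in samples:
    print(one_pass(sample))
```

Whether this is actually nicer than three short passes is debatable; the
sequential version is arguably easier to review.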


> +
> +
> +
> +
> +for file in os.scandir():
> +    outlines = []
> +    if not file.name.endswith(".json"):
> +        continue
> +    print(f"Scanning {file.name} ...")
> +    with open(file.name) as searchfile:
> +        block_start = False
> +        for line in searchfile:
> +            # Don't mess with the start of doc blocks.
> +            # We don't want to convert "# @name:" to a reference!
> +            if block_start and line.startswith('# @'):
> +                outlines.append(line)
> +                continue
> +            block_start = bool(line.startswith('##'))
>

Similarly, I'm sure I could bake these ad-hoc conditions into the regexes
themselves, but it's harder and makes the expressions uglier. For a script
that only needs to be run once, whatever.
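
To show how those ad-hoc conditions interact with the regexes, here is a
minimal sketch that runs the same per-line logic over an in-memory list
instead of a file. The single-entity name list and the convert() helper are
assumptions made for this example; the conditions and substitutions mirror
the loop in the patch.

```python
import re

# Hypothetical schema with a single reference-able entity.
NAMES = r"(query-block)"
PATT = re.compile(r"(['\"]|``)" + NAMES + r"\1")
PATT2 = re.compile(r"(?<![-@`'\"\w<])" + NAMES + r"(?![-`'\"\w>])")
PATT3 = re.compile(r"@" + NAMES + r"(?![-\w:])")


def convert(lines):
    """Apply the script's per-line rules to a list of lines."""
    out = []
    block_start = False
    for line in lines:
        # Leave the "# @name:" line that opens a doc block untouched.
        if block_start and line.startswith("# @"):
            out.append(line)
            continue
        block_start = line.startswith("##")
        # Only touch comment lines; five spaces marks an example block.
        if line.startswith("# ") and not line.startswith("#     "):
            line = PATT.sub(r"`\2`", line)
            line = PATT2.sub(r"`\1`", line)
            line = PATT3.sub(r"`\1`", line)
        out.append(line)
    return out


sample = [
    "##\n",
    "# @query-block:\n",
    "#\n",
    "# Like 'query-block', but see @query-block and query-block.\n",
    "#" + " " * 5 + "-> { \"execute\": \"query-block\" }\n",
    "{ 'command': 'query-block' }\n",
]
print("".join(convert(sample)), end="")
```

Only the fourth line changes: the definition line, the indented example and
the JSON outside the comment block all pass through as-is.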


> +
> +            # Don't mess with anything outside of comment blocks,
> +            # and don't mess with example blocks. We use five spaces
> +            # as a heuristic for detecting example blocks. It's not perfect,
> +            # but it seemingly does the job well.
> +            if line.startswith('# ') and not line.startswith('#     '):
> +                line = re.sub(patt, r'`\2`', line)
> +                line = re.sub(patt2, r'`\1`', line)
> +                line = re.sub(patt3, r'`\1`', line)
> +            outlines.append(line)
> +    with open(file.name, "w") as outfile:
> +        for line in outlines:
> +            outfile.write(line)
> --
> 2.48.1


Thanks!
