[PATCH v2] wwwdocs: add a Python postprocessing script

David Malcolm Fri, 24 Jan 2025 12:20:58 -0800

Changed in v2: rather than replacing "mhc", this version runs the
output from mhc through the Python script.


I tested this via "MHC=cat", and the output appears identical to the
previous build I uploaded to:
  https://dmalcolm.fedorapeople.org/gcc/2025-01-15/htdocs/
You can see e.g. the easily clickable heading ids here:
  https://dmalcolm.fedorapeople.org/gcc/2025-01-15/htdocs/gcc-15/changes.html
compared to:
  https://gcc.gnu.org/gcc-15/porting_to.html

...though obviously this test build is missing the stuff that would
be added by mhc.

With this approach we could gradually move parts of the mhc
functionality into the python script, at whatever pace is comfortable.

Gerald: can you test this with mhc?  Alternatively, can I go ahead
and try pushing this?

Thanks
Dave


Blurb from v1:

The heading elements in our website contain "id" information,
but currently to find them you have to look at the page source,
whereas in the generated HTML for the manual we have e.g.:

 <a class="copiable-link" href="#index-mabi-1"> &para;</a>

which shows up nicely in the browser in e.g.
  https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
as a pilcrow character when you hover over the link, which
you can then use to copy the URL to the clipboard.

It's *very* helpful to have easily shareable links to within pages.

The attached patch adds a postprocessing step to "bin" that
turns e.g.
  <h1 id="ID">TEXT</h1>
to:
  <h1 id="ID"><a href="#ID">TEXT</a></h1>

which makes it very easy to copy links in the generated website.
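
For anyone who wants to try the rewrite outside of the build, here is a
minimal standalone sketch of the same transformation.  It is not the code
in the patch: the helper name link_headings is hypothetical, and it folds
the per-element loop into one pattern with non-greedy groups so that two
headings on the same line are handled separately.  Like the patch, it
assumes the heading sits on a single line and that "id" is the only
attribute.

```python
import re

# Assumption: headings look like <hN id="ID">TEXT</hN> on one line,
# with "id" as the only attribute (the same assumption the patch makes).
HEADING_RE = re.compile(r'<(h[1-4]) id="([^"]+)">(.+?)</\1>')


def link_headings(line):
    """Wrap the text of each matching heading in a link to its own id.

    link_headings is a hypothetical name used only for this sketch.
    """
    return HEADING_RE.sub(r'<\1 id="\2"><a href="#\2">\3</a></\1>', line)


if __name__ == '__main__':
    print(link_headings('<h1 id="ID">TEXT</h1>'))
```

Headings without an "id" attribute are left untouched, since the pattern
requires one.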

I didn't bother adding any CSS.
---
 bin/preprocess      |  8 ++++++++
 bin/process_html.py | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)
 create mode 100644 bin/process_html.py

diff --git a/bin/preprocess b/bin/preprocess
index 2e474b0c..397ede85 100755
--- a/bin/preprocess
+++ b/bin/preprocess
@@ -119,6 +119,14 @@ process_html_file()
         exit 1
     fi
 
+    # Run output.raw through a Python script, then overwrite it with the
+    # output from the script.
+    if ! python3 $SOURCETREE/bin/process_html.py $TMPDIR/output.raw $TMPDIR/output.after-py; then
+        echo "bin/process_html.py failed; aborting."
+        exit 1
+    fi
+    mv $TMPDIR/output.after-py $TMPDIR/output.raw
+
     # Use sed to work around makeinfo 4.7 brokenness.
     # Use sed to work around MetaHTML brokenness wrt. <DIV>.
     # Then remove leading blank lines and single line comments.
diff --git a/bin/process_html.py b/bin/process_html.py
new file mode 100644
index 00000000..8a36a587
--- /dev/null
+++ b/bin/process_html.py
@@ -0,0 +1,32 @@
+#! /usr/bin/python3
+#
+# Python 3 script to preprocess .html files below htdocs
+
+import re
+import sys
+
+input_path = sys.argv[1]
+output_path = sys.argv[2]
+
+with open(input_path) as f_in:
+    with open(output_path, 'w') as f_out:
+        for line in f_in:
+            # Convert from e.g.
+            #   <h1 id="ID">TEXT</h1>
+            # to:
+            #   <h1 id="ID"><a href="#ID">TEXT</a></h1>
+            for element_name in {'h1', 'h2', 'h3', 'h4'}:
+                pattern = \
+                    (r'<'
+                     + element_name
+                     + r' id="(.+)">(.+)</'
+                     + element_name
+                     + '>')
+                replacement = \
+                    (r'<'
+                     + element_name
+                     + r' id="\1"><a href="#\1">\2</a></'
+                     + element_name
+                     + '>')
+                line = re.sub(pattern, replacement, line)
+            f_out.write(line)
-- 
2.46.0
