[PATCH] wwwdocs: experiments with a Python postprocessing script

David Malcolm Wed, 15 Jan 2025 14:35:51 -0800

The heading elements in our website contain "id" information,
but currently to find them you have to look at the page source,
whereas in the generated HTML for the manual we have e.g.:


 <a class="copiable-link" href="#index-mabi-1"> &para;</a>

which shows up nicely in the browser in e.g.
  https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
as a pilcrow character when you hover over the link, which
you can then use to copy the URL to the clipboard.

It's *very* helpful to have easily shareable links to within pages.

The attached patch adds a postprocessing step to "bin" that
turns e.g.
  <h1 id="ID">TEXT</h1>
to:
  <h1 id="ID"><a href="#ID">TEXT</a></h1>

which makes it very easy to copy links in the generated website.
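
As a rough illustration of that rewrite, here is a minimal sketch in
Python. It is not the exact code from the patch below: the names
HEADING_RE and link_heading are hypothetical, and it assumes (as the
patch does) that "id" is the heading's only attribute and that the
whole element sits on one line. It uses a single pattern with a
backreference for the closing tag and non-greedy matching, so two
headings on the same line are handled separately.

```python
import re

# Matches <h1>..<h4> elements whose only attribute is id="...".
# \1 in the pattern requires the closing tag to match the opening one.
HEADING_RE = re.compile(r'<(h[1-4]) id="([^"]+)">(.+?)</\1>')


def link_heading(line: str) -> str:
    """Wrap the text of each id-bearing heading in a link to itself."""
    return HEADING_RE.sub(r'<\1 id="\2"><a href="#\2">\3</a></\1>', line)


if __name__ == "__main__":
    print(link_heading('<h1 id="ID">TEXT</h1>'))
```

Headings without an "id" are left untouched, so the step is safe to
run over every page.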

I didn't bother adding any CSS.

I've never managed to build MetaHTML and have always just
crossed my fingers and hoped when making edits to the GCC
website; bin/preprocess just errors out for me immediately
due to not finding mhc.

So this patch as written replaces the invocation of mhc with
an invocation of the python script, which of course drops
various features.

I've uploaded a build of the website with this to:
  https://dmalcolm.fedorapeople.org/gcc/2025-01-15/htdocs/

You can see e.g. the easily clickable heading ids here:
  https://dmalcolm.fedorapeople.org/gcc/2025-01-15/htdocs/gcc-15/changes.html

compared to:
  https://gcc.gnu.org/gcc-15/porting_to.html

and, for now, the loss of the mhc stuff here:
  https://dmalcolm.fedorapeople.org/gcc/2025-01-15/htdocs/

compared to:
  https://gcc.gnu.org/

Gerald: if you have mhc working, can you please try adjusting the
bin/ script so it runs *both* mhc and the python script?
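
One possible shape for running both, as an untested sketch: it assumes
mhc is invoked the way bin/preprocess does today (input path as the
only argument, result on stdout) and uses the same default path as the
MHC variable there. The names run_both, HEADING_RE and REPLACEMENT are
hypothetical, and the heading pattern is a simplified stand-in for the
one in the patch.

```python
import re
import subprocess

# Simplified heading pattern: <h1>..<h4> with id as the only attribute.
HEADING_RE = re.compile(r'<(h[1-4]) id="([^"]+)">(.+?)</\1>')
REPLACEMENT = r'<\1 id="\2"><a href="#\2">\3</a></\1>'


def run_both(input_path, output_path, mhc="/usr/local/bin/mhc"):
    """Run mhc on input_path, then add heading links to its output."""
    # check=True raises CalledProcessError if mhc exits with an error.
    result = subprocess.run([mhc, input_path],
                            capture_output=True, text=True, check=True)
    with open(output_path, "w") as f_out:
        for line in result.stdout.splitlines(keepends=True):
            f_out.write(HEADING_RE.sub(REPLACEMENT, line))
```

Called as run_both(input_path, output_path), this would slot in where
the patch calls process_html.py, with the mhc expansion happening
first and the heading links added afterwards.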

Thoughts?

Dave
---
 bin/preprocess      | 13 +++----------
 bin/process_html.py | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 10 deletions(-)
 create mode 100644 bin/process_html.py

diff --git a/bin/preprocess b/bin/preprocess
index 2e474b0c..c64bc97b 100755
--- a/bin/preprocess
+++ b/bin/preprocess
@@ -33,8 +33,6 @@
 #
 # By Gerald Pfeifer <pfei...@dbai.tuwien.ac.at> 1999-12-29.
 
-MHC=${MHC-/usr/local/bin/mhc}
-
 SOURCETREE=${SOURCETREE-/www/gcc/htdocs-preformatted}
 DESTTREE=${DESTTREE-/www/gcc/htdocs}
 
@@ -114,9 +112,9 @@ process_html_file()
     printf '<set-var MHTML::INCLUDE-PREFIX="%s">\n' `pwd` >> $TMPDIR/input
     cat $f >> $TMPDIR/input
 
-    if ! ${MHC} $TMPDIR/input > $TMPDIR/output.raw; then
-        echo "${MHC} failed; aborting."
-        exit 1
+    if ! python3 $SOURCETREE/bin/process_html.py $TMPDIR/input $TMPDIR/output.raw; then
+        echo "bin/process_html.py failed; aborting."
+       exit 1
     fi
 
     # Use sed to work around makeinfo 4.7 brokenness.
@@ -227,11 +225,6 @@ shift `expr ${OPTIND} - 1`
 
 ## Various safety checks.
 
-if ! ${MHC} --version >/dev/null; then
-    echo "Something does not look right with \"${MHC}\"; aborting."
-    exit 1
-fi
-
 if [ ! -d $SOURCETREE ]; then
     echo "Source tree \"$SOURCETREE\" does not exist."
     exit 1
diff --git a/bin/process_html.py b/bin/process_html.py
new file mode 100644
index 00000000..8a36a587
--- /dev/null
+++ b/bin/process_html.py
@@ -0,0 +1,32 @@
+#! /usr/bin/python3
+#
+# Python 3 script to preprocess .html files below htdocs
+
+import re
+import sys
+
+input_path = sys.argv[1]
+output_path = sys.argv[2]
+
+with open(input_path) as f_in:
+    with open(output_path, 'w') as f_out:
+        for line in f_in:
+            # Convert from e.g.
+            #   <h1 id="ID">TEXT</h1>
+            # to:
+            #   <h1 id="ID"><a href="#ID">TEXT</a></h1>
+            for element_name in {'h1', 'h2', 'h3', 'h4'}:
+                pattern = \
+                    (r'<'
+                     + element_name
+                     + r' id="(.+)">(.+)</'
+                     + element_name
+                     + '>')
+                replacement = \
+                    (r'<'
+                     + element_name
+                     + r' id="\1"><a href="#\1">\2</a></'
+                     + element_name
+                     + '>')
+                line = re.sub(pattern, replacement, line)
+            f_out.write(line)
-- 
2.46.0
