On Fri, May 9, 2025, at 5:45 PM, Nikolaos Chatzikonstantinou wrote:
> I rewrote GNU m4 in Python. Long story short, I wanted to learn m4 to
> fix some issues I had with GNU Guile and Autotools, and after
> realizing m4 1.4 is ~8000 lines of code and reading e.g.
> <https://www.owlfolio.org/development/autoconf-swot/> which claims
> "Feature gaps in GNU M4 hold back development of Autoconf." I thought
> I'd rewrite it in Rust. (It turned out to be more beneficial to
> rewrite in Python due to faster prototyping for the time being.)
> Eventually I plan to get back to my original purpose of fixing the
> integration of GNU Guile and Autotools.

This is a neat project! Thanks for tackling it, and for telling us about it.

I'd like to draw your attention to the "foreach" macros,
<https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/m4sugar/foreach.m4>.
There are two versions of each of the macros defined in that file
(one version is in that file and the other in m4sugar.m4), because
recursion over m4's $@ is quadratically slow in GNU M4 1.4.x.
If you haven't done anything clever about it, it's probably
quadratically slow in your implementation as well.

(There's a development branch of GNU M4 in which this is fixed,
but it's been gathering dust for almost 20 years now! The most
immediately valuable thing anyone could do to GNU M4 to benefit
Autoconf is to go through all the dusty development branches,
sort the patches into "ready for release", "a good idea but
not ready for release", and "not actually a good idea",
kick a release of M4 1.5 out the door with the ready-for-release
patches, and then clean up the remaining branches.)
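
To make the quadratic claim concrete, here is a rough cost model in Python.
It is only a sketch under an assumption: that each recursive step over
`shift($@)` has to re-handle every remaining argument. It does not reflect
how GNU M4 or m4p is actually implemented, and the function names are
made up for illustration.

```python
def copies_when_recursing_on_shift(args):
    """Count argument copies if each step re-handles the whole remaining list.

    Models a foreach written as recursion over shift($@): every step
    builds a fresh list of the remaining arguments.
    """
    copied = 0
    rest = list(args)
    while rest:
        rest = rest[1:]        # "shift": copies all remaining arguments
        copied += len(rest)
    return copied


def copies_when_visiting_once(args):
    """Count argument visits if each argument is touched exactly once."""
    return sum(1 for _ in args)


n = 1000
quadratic_cost = copies_when_recursing_on_shift(range(n))  # n*(n-1)/2
linear_cost = copies_when_visiting_once(range(n))          # n
```

With 1000 arguments the first model does n*(n-1)/2 = 499500 copies and the
second does 1000 visits, which is the gap the second set of foreach macros
exists to work around.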

> 1. traceon, traceoff, changeword, debugmode, debugfile, dumpdef
> 2. Some of the command-line options.

Autoconf needs the tracing *mechanism*, and at least some of the
debugging features as well; read over the code of autom4te and
autoheader to get a feel for it.  I don't know off the top of my
head whether it needs the *macros* or just command line-driven
tracing.

You can probably assume that GNU M4 command line options that
aren't used by autom4te are not needed by autoconf.

changeword is definitely not necessary, and in fact is, AIUI, considered
a failed experiment, slated for removal from GNU M4 eventually.

> 1. What mode GNU m4 opens files in; m4p always open in binary,
> potentially treating carriage return differently on Windows.

You should use Python's "universal newline" mode, not binary mode.
You should *not*, however, assume UTF-8.  I would suggest consistently
opening files with `open(fname, "Dt", encoding="iso-8859-1")` where
D is either 'r' or 'w' as appropriate.  Python's "iso-8859-1" encoding
is actually an identity map from bytes 0x00 .. 0xFF to U+0000 .. U+00FF
(unlike the official ISO 8859-1), which makes it useful for passing through
bytes with the 8th bit set without trying to interpret them.

An *option* to process files as UTF-8 would be nice but we cannot
have it on by default, I don't think.
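
A small sketch of the iso-8859-1 suggestion above, in case it helps. The
helper name `read_source` and the sample file contents are made up for
illustration; the only things it relies on are Python's built-in `open`
and the "iso-8859-1" codec.

```python
import os
import tempfile

ENCODING = "iso-8859-1"


def read_source(fname):
    """Read a file in text mode with universal newlines, one char per byte."""
    with open(fname, "rt", encoding=ENCODING) as f:
        return f.read()


# Every byte value maps to the code point with the same number, and back.
all_bytes = bytes(range(256))
decoded = all_bytes.decode(ENCODING)
roundtrip_ok = (
    [ord(c) for c in decoded] == list(range(256))
    and decoded.encode(ENCODING) == all_bytes
)

# A file with a high-bit byte and a CRLF line ending (hypothetical input).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.m4")
    with open(path, "wb") as f:
        f.write(b"define(`caf\xe9', `x')\r\n")
    text = read_source(path)
```

Here `text` ends in a plain "\n" because universal newline mode translates
the CRLF on input, and the 0xE9 byte comes through as U+00E9 without any
decoding error. Note that universal newline mode also turns a lone CR into
"\n", so this is not a byte-exact round trip for files containing those.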

> 2. Sneaky bugs.

The Autoconf testsuite is pretty thorough, but I don't know if it's
thorough *enough* to validate a new M4 implementation.  There are
files `shell.nix` and `manifest.scm` at the top level of the source
tree that set up testing environments for Autoconf in Nix and Guix,
respectively.  Any improvements to the test suite you can think of
would be most welcome.

Autoconf makes heavy use of diversions and m4wrap and is picky about
how those interact with tracing.

> I have not had any benchmarks, but from roughly looking at how long
> tests take I'm measuring a 100x slowdown. I'm hoping to rewrite it in
> Rust later to address that.

Before you start over from scratch a second time, take a hard look
at PyPy and Cython.  It may be possible to get performance parity
with GNU M4 with minimal effort.

I like Rust a lot myself, although I share some of the concerns expressed
by other people, regarding language and ecosystem stability.  However,
you should be aware that anything Auto* depends on is necessarily very
close to the bottom of the "bootstrap" dependency graph and therefore
rewriting it in *any* language other than C is going to be a tough sell to
distribution maintainers.  Read through 
https://www.linuxfromscratch.org/lfs/view/stable/
to understand what the constraints are on anything that's needed prior to
step 7 of the sequence that book describes.

What might be really interesting is if you could fit your M4 implementation
into the language the PyPy people call "RPython"; that would enable it to be
*ahead-of-time* translated to C, and in turn that would make it possible to
get a self-contained /usr/bin/m4 executable into the "temporary tools"
environment the LFS book talks about, *without* needing to bring a Python
interpreter or runtime libraries along.

zw
