On Fri, May 9, 2025, at 5:45 PM, Nikolaos Chatzikonstantinou wrote:
> I rewrote GNU m4 in Python. Long story short, I wanted to learn m4 to
> fix some issues I had with GNU Guile and Autotools, and after
> realizing m4 1.4 is ~8000 lines of code and reading e.g.
> <https://www.owlfolio.org/development/autoconf-swot/> which claims
> "Feature gaps in GNU M4 hold back development of Autoconf." I thought
> I'd rewrite it in Rust. (It turned out to be more beneficial to
> rewrite in Python due to faster prototyping for the time being.)
> Eventually I plan to get back to my original purpose of fixing the
> integration of GNU Guile and Autotools.

This is a neat project! Thanks for tackling it, and for telling us about it.

I'd like to draw your attention to the "foreach" macros, <https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/m4sugar/foreach.m4>. There are two versions of each of the macros defined in that file (one version is in that file and the other in m4sugar.m4), because recursion over m4's $@ is quadratically slow in GNU M4 1.4.x. If you haven't done anything clever about it, it's probably quadratically slow in your implementation as well.

(There's a development branch of GNU M4 in which this is fixed, but it's been gathering dust for almost 20 years now! The most immediately valuable thing anyone could do to GNU M4 to benefit Autoconf is to go through all the dusty development branches, sort the patches into "ready for release", "a good idea but not ready for release", and "not actually a good idea", kick a release of M4 1.5 out the door with the ready-for-release patches, and then clean up the remaining branches.)

> 1. traceon, traceoff, changeword, debugmode, debugfile, dumpdef
> 2. Some of the command-line options.

Autoconf needs the tracing *mechanism*, and at least some of the debugging features as well; read over the code of autom4te and autoheader to get a feel for it. I don't know off the top of my head whether it needs the *macros* or just command line-driven tracing. You can probably assume that GNU M4 command line options that aren't used by autom4te are not needed by autoconf. changeword is definitely not necessary, and is in fact, AIUI, considered a failed experiment, slated for removal from GNU M4 eventually.

> 1. What mode GNU m4 opens files in; m4p always open in binary,
> potentially treating carriage return differently on Windows.

You should use Python's "universal newline" mode, not binary mode. You should *not*, however, assume UTF-8. I would suggest consistently opening files with `open(fname, "Dt", encoding="iso-8859-1")` where D is either 'r' or 'w' as appropriate.
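
A minimal sketch of that suggestion, in case it helps. The helper names `read_source`, `write_output`, and `demo` are made up for illustration and are not part of m4p or GNU M4; the only real API used is Python's built-in `open` with text mode and the "iso-8859-1" codec.

```python
import os
import tempfile

# Python's "iso-8859-1" codec maps each byte 0x00..0xFF to U+0000..U+00FF.
ENCODING = "iso-8859-1"


def read_source(fname):
    """Read text with universal newlines; no byte value can fail to decode."""
    with open(fname, "rt", encoding=ENCODING) as f:
        return f.read()


def write_output(fname, text):
    """Write text back out; code points above U+00FF raise UnicodeEncodeError."""
    with open(fname, "wt", encoding=ENCODING) as f:
        f.write(text)


def demo():
    # Input bytes that are not valid UTF-8, with a CRLF line ending.
    raw = b"define(`x', `\xff\xe9')\r\nx\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.m4")
        with open(path, "wb") as f:
            f.write(raw)
        text = read_source(path)
    # CRLF was normalized to LF; the high bytes survive re-encoding unchanged.
    return text, text.encode(ENCODING)
```

Note that universal-newline reading turns CRLF into LF, so a read followed by a write is byte-exact only up to line endings.
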
Python's "iso-8859-1" encoding is actually an identity map from bytes 0x00 .. 0xFF to U+0000 .. U+00FF (unlike the official ISO 8859-1), which makes it useful for passing through bytes with the 8th bit set without trying to interpret them. An *option* to process files as UTF-8 would be nice, but I don't think we can have it on by default.

> 2. Sneaky bugs.

The Autoconf testsuite is pretty thorough, but I don't know if it's thorough *enough* to validate a new M4 implementation. There are files `shell.nix` and `manifest.scm` at the top level of the source tree that set up testing environments for Autoconf in Nix and Guix, respectively. Any improvements to the test suite you can think of would be most welcome. Autoconf makes heavy use of diversions and m4wrap and is picky about how those interact with tracing.

> I have not had any
> benchmarks, but from roughly looking at
> how long tests take I'm measuring a 100x slowdown. I'm hoping to
> rewrite it in Rust later to address that.

Before you start over from scratch a second time, take a hard look at PyPy and Cython. It may be possible to get performance parity with GNU M4 with minimal effort.

I like Rust a lot myself, although I share some of the concerns expressed by other people regarding language and ecosystem stability. However, you should be aware that anything Auto* depends on is necessarily very close to the bottom of the "bootstrap" dependency graph, and therefore rewriting it in *any* language other than C is going to be a tough sell to distribution maintainers. Read through <https://www.linuxfromscratch.org/lfs/view/stable/> to understand what the constraints are on anything that's needed prior to step 7 of the sequence that book describes.

What might be really interesting is if you could fit your M4 implementation into the language the PyPy people call "RPython"; that would enable it to be *ahead-of-time* translated to C, and in turn that would make it possible to get a self-contained /usr/bin/m4 executable into the "temporary tools" environment the LFS book talks about, *without* needing to bring a Python interpreter or runtime libraries along.

zw