I looked a bit through the GUILE source code to see what is going on. I believe our current hypothesis (that LilyPond's slowdown is caused by expensive Unicode transcoding into 32-bit strings) is incorrect.

If you look into the source code, you can see that the UTF-8 -> SCM conversion checks whether there are any code points over 255:

https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620

If there aren't, it uses Latin1 encoding ("narrow == 1") to store the string as a normal byte array. This code walks the string twice, but that is very cheap due to CPU cache locality, so it should be essentially equivalent to whatever GUILE 1.8 was doing.

The conversion in the other direction is here:

https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n2065

As you can see, if the string is narrow (Latin1/ASCII), it uses the cheap path as well.

LilyPond internally doesn't use any non-ASCII strings: all our identifiers are pure ASCII, as are internal strings (e.g. font glyph names). This means that files that do not use Unicode characters at all should have the same string overhead as with GUILE 1.8.

Even if the input file does use UTF-8, there should be little overhead, because the amount of text that we process is always small. LilyPond is not a text processor.

So, what hard data do we have on GUILE 2/3 slowness, and what does that data say?

--
Han-Wen Nienhuys - hanw...@gmail.com - http://www.xs4all.nl/~hanwen
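
For anyone who does not want to read strings.c, here is a minimal sketch of the two-pass idea described above. It is not GUILE's actual code: the function names (decode_utf8, scan_utf8, fill_latin1) are made up for illustration, and it assumes the input is already valid UTF-8. The first pass counts code points and checks whether any exceeds 255; only if none does is the second pass used to store one byte per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libguile): decode the UTF-8 sequence
   starting at s[*i], advance *i past it, and return the code point.
   Assumes the input is valid UTF-8. */
static uint32_t decode_utf8(const unsigned char *s, size_t *i)
{
    unsigned char b = s[(*i)++];
    uint32_t cp;
    int extra;

    if (b < 0x80)      { cp = b;        extra = 0; }  /* ASCII */
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }  /* 2-byte sequence */
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }  /* 3-byte sequence */
    else               { cp = b & 0x07; extra = 3; }  /* 4-byte sequence */

    while (extra-- > 0)
        cp = (cp << 6) | (s[(*i)++] & 0x3F);
    return cp;
}

/* Pass 1: count the code points and report whether they all fit in one
   byte. Returns 1 ("narrow") if no code point is above 255, else 0. */
static int scan_utf8(const unsigned char *s, size_t len, size_t *n_chars)
{
    int narrow = 1;
    size_t i = 0, n = 0;

    while (i < len) {
        if (decode_utf8(s, &i) > 0xFF)
            narrow = 0;
        n++;
    }
    *n_chars = n;
    return narrow;
}

/* Pass 2, narrow case only: store each code point as one Latin1 byte.
   The caller must supply a buffer of at least n_chars bytes. */
static void fill_latin1(const unsigned char *s, size_t len, unsigned char *out)
{
    size_t i = 0, n = 0;

    assert(out != NULL);
    while (i < len)
        out[n++] = (unsigned char) decode_utf8(s, &i);
}
```

For pure ASCII input, which is what almost all of our internal strings are, the second pass amounts to a byte copy, so a 32-bit representation never comes into play.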