Hello,

I have done some more experiments, looking at the `access' syscalls (the others are just the result of searching, I think). I have attached everything in a tarball.
On 2025-01-29T19:11:20+0100, Nicolas Goaziou via Bug reports for GNU Guix wrote:
> Hello,
>
> vicvbcun <g...@ikherbers.com> writes:
>
>> Consider the following example latex document:
>>
>> --8<---------------cut here---------------start------------->8---
>> \documentclass{article}
>> \usepackage{mathtools}
>> \begin{document}
>> hello world
>> \end{document}
>> --8<---------------cut here---------------end--------------->8---
>>
>> Compiling it with LuaLaTeX under strace in a shell with texlive-scheme-basic, texlive-collection-luatex and texlive-collection-latexextra, it seems like most of the time is spent recursively searching for input files:
>>
>> --8<---------------cut here---------------start------------->8---
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  27.70    0.080138           2     30174           getdents64
>>  21.99    0.063605           4     15455       259 openat
>>  17.44    0.050460           3     16179        32 newfstatat
>>  14.37    0.041583           3     10440     10296 access
>>   8.42    0.024348           1     15196           close
>>   7.76    0.022456           1     15201           fstat
>>   0.79    0.002278           1      1868           write
>> --8<---------------cut here---------------end--------------->8---
>>
>> and similarly for pdflatex.

Side note: While retrying the experiments, I found that the numbers quoted above must have come from a recompilation; with a clean directory they are higher because it recursively searches for test.aux. I have tried being extra careful this time :).
>> As an extreme example, consider
>>
>> --8<---------------cut here---------------start------------->8---
>> \documentclass{tudapub}
>> \begin{document}
>> hello world
>> \end{document}
>> --8<---------------cut here---------------end--------------->8---
>>
>> compiled with
>>
>> --8<---------------cut here---------------start------------->8---
>> texlive-scheme-basic texlive-collection-luatex texlive-collection-latexextra texlive-roboto texlive-urcls texlive-xcharter texlive-tuda-ci
>> --8<---------------cut here---------------end--------------->8---
>>
>> This takes over 14 seconds (compared to about 2.7 seconds for lualatex from Arch Linux) and from strace:
>>
>> --8<---------------cut here---------------start------------->8---
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  32.60    5.926537           3   1801518           getdents64
>>  26.46    4.809462           5    900841       284 openat
>>  20.90    3.799744           4    896057    895349 access
>>  10.19    1.851520           2    900557           close
>>   9.49    1.724891           1    900575           fstat
>>   0.28    0.050743           2     17680       229 newfstatat
>>   0.04    0.007077           1      6073           read
>> --8<---------------cut here---------------end--------------->8---
>
> Thank you for the report. I confirm the issue, unfortunately.
>
>> The cause for this seems to be kpathsea doesn't treat the ls-R database as authoritative. It is opened but kpathsea falls back to recursive searching.
>
> AFAIU, this should not happen. According to "The TeX Live Guide 2024":
>
>     If a file is not found in the database, by default Kpathsea goes ahead and searches the disk. If a particular path element begins with ‘!!’, however, only the database will be searched for that element, never the disk.
>
> IOW, even if the "!!" prefix is not there, Kpathsea should first look for files in ls-R, and then on the disk.
>
> As you point out, it doesn’t happen like this, and I don’t know why.

I think it actually does work as advertised. I looked at the basename of all files that are access'ed in the minimal example I sent for both LuaLaTeX from Guix and from Arch Linux. Comparing the logs (logs/minimal_vanilla.txt and logs/minimal_arch_vanilla.txt in the tarball):

--8<---------------cut here---------------start------------->8---
--- logs/minimal_vanilla.txt
+++ logs/minimal_arch_vanilla.txt
@@ -4 +3,0 @@
-      1 aliases -1
@@ -27,2 +25,0 @@
-      1 ls-R 0
-      1 ls-r -1
@@ -284,0 +282 @@
+      3 texmf.cnf -1
@@ -286,0 +285 @@
+      4 aliases -1
@@ -290,0 +290,2 @@
+      4 ls-R 0
+      4 ls-r -1
@@ -298,0 +300,2 @@
+     14 epstopdf.cfg -1
+     14 test.aux -1
@@ -306,2 +308,0 @@
-   9866 epstopdf.cfg -1
-   9866 test.aux -1
--8<---------------cut here---------------end--------------->8---

Here the first number is how many times an access of the file was attempted, and the number at the end is -1 if the call failed and 0 if it succeeded. The only meaningful difference is for epstopdf.cfg and test.aux, both files that exist neither on Guix nor on Arch Linux (at least on first compilation for test.aux). The difference is that on Arch Linux LuaLaTeX only recursively searches the current directory and $TEXMFLOCAL, while on Guix it recursively searches the entirety of $GUIX_TEXMF (i.e. $TEXMFDIST).
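For readers who want to reproduce this kind of per-basename count, here is a minimal sketch in Python. It is not the script used for the logs in the tarball; it assumes the raw input is ordinary strace output (for instance from something like `strace -f -e trace=access -o log.txt lualatex test.tex`), and the names `count_access_basenames` and `format_counts` are made up for this example.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Matches strace lines such as
#   access("/some/dir/test.aux", R_OK) = -1 ENOENT (No such file or directory)
# An optional leading PID ("1234 " or "[pid 1234] "), as printed by
# `strace -f`, is tolerated.
ACCESS_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
    r'access\("((?:[^"\\]|\\.)*)",[^)]*\)\s+=\s+(-?\d+)'
)


def count_access_basenames(lines):
    """Count access() calls per (basename, return value)."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.match(line)
        if match is None:
            continue  # not an access() call, e.g. openat or getdents64
        path, result = match.group(1), int(match.group(2))
        counts[(PurePosixPath(path).name, result)] += 1
    return counts


def format_counts(counts):
    """Render 'count basename result' rows, sorted by count, then name."""
    rows = sorted(counts.items(), key=lambda kv: (kv[1], kv[0][0]))
    return [f"{n:7d} {name} {result}" for (name, result), n in rows]
```

Running `format_counts(count_access_basenames(open("log.txt")))` on two such logs and diffing the results gives a comparison in the same shape as the one above.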
I also tried the opposite, stripping the !! from $TEXMF for LuaLaTeX on Arch Linux, and the same problem appears (see logs/minimal_arch_texmf-override.txt; of course the actual numbers for the two files are higher as I have more packages installed).
So (un)fortunately, texlive-libkpathsea and !! seem to work as intended: without !!, a positive entry in ls-R is used, but the lack of an entry doesn't cut the search short, and kpathsea falls back to recursive searching.
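To make the two behaviours concrete, here is a toy model in Python of the lookup rule as described in this thread. It is not kpathsea's code: `lookup`, the `database` and `disk` dictionaries and the root names are all invented for illustration, and a recursive disk search is only counted, not performed.

```python
def lookup(name, path_elements, database, disk):
    """Toy model of a database-first file lookup.

    path_elements: search roots in order; a leading "!!" marks a root
                   as database-only.
    database:      maps a root to the set of names listed in its ls-R.
    disk:          maps a root to the set of names actually present.

    Returns (found_path_or_None, number_of_disk_searches).
    """
    disk_searches = 0
    for element in path_elements:
        db_only = element.startswith("!!")
        root = element[2:] if db_only else element

        # A positive database entry is always used.
        if name in database.get(root, set()):
            return f"{root}/{name}", disk_searches

        # With "!!", a missing entry is final for this root.
        if db_only:
            continue

        # Without "!!", a missing entry falls back to searching the disk.
        disk_searches += 1
        if name in disk.get(root, set()):
            return f"{root}/{name}", disk_searches
    return None, disk_searches
```

In this model, a file that exists nowhere (like epstopdf.cfg in the minimal example) costs one disk search per root without !!, while an existing file listed in the database costs none for that root.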
Looking at the extreme example (logs/extreme_vanilla.txt), the main culprits for the recursive searches seem to be various .fontspec files and configuration files that don't exist.
>> In the package definition for texlive-libkpathsea, texmf.cnf is modified such that the TEXMF variable is set without !! in front of $TEXMFSYSCONFIG, $TEXMFSYSVAR and $TEXMFDIST. If I override $TEXMF via --cnf-line like
>>
>> --8<---------------cut here---------------start------------->8---
>> lualatex \
>>   --cnf-line='TEXMF = {$TEXMFCONFIG,$TEXMFVAR,$TEXMFHOME,!!$TEXMFSYSCONFIG,!!$TEXMFSYSVAR,!!$TEXMFDIST}' \
>>   example.ltx
>> --8<---------------cut here---------------end--------------->8---
>>
>> compilation time for the extreme example above falls to about 2.5 seconds, without excessive searching.
>
> At least it proves our ls-R file is valid, at the expected location.

Just for the fun of it, I tried setting $TEXMFDBS to "{}" and compilation time for the minimal example went from 0.9 to 9 seconds. I think there would have been more complaints if the ls-R didn't work at all :D.
>> The comment above the substitution says that the !! construct wouldn't work for texlive-build-system or when building profiles. I don't know if it would be possible to work around this but perhaps it could be possible to work around this if installed in profile (or environment)?
>
> I don’t understand what you want to install in a profile. The ls-R file is already built during profile generation. See "guix/profiles.scm".

What I meant was that we could maybe use a horrible hack like somehow overwriting texmf.cnf or wrapping the engines, anything to avoid rebuilding the world. But on second thought, LaTeX should mostly be a build-time dependency, so grafting with a version capable of handling both the build environment and being installed should work well, right? At least until the next TeX Live release.
>> I think, that the best solution would be to somehow try to make !! work in the build environment but I'm unsure how. Perhaps the Nix folks have a solution for the problem?
>
> Maybe we could keep "!!" prefix and create a ls-R file each time `texlive-build-system' builds a package and every time `texlive-updmap.cfg' is an input used to build documentation. In this case I'm not sure about what should be done for packages propagating TeX Live libraries without actually using them.
>
> In any case, this would require some experimentation. And it still is a workaround for a problem we don’t understand yet.
>
> Regards,
> --
> Nicolas Goaziou
vicvbcun
texlive-kpathsea-debugging.tar.zst