Hello,

I have done some more experiments, looking at the `access' syscalls (the others are just the result of searching, I think). I have attached everything in a tarball.

On 2025-01-29T19:11:20+0100, Nicolas Goaziou via Bug reports for GNU Guix wrote:
Hello,

vicvbcun <g...@ikherbers.com> writes:

Consider the following example latex document:

--8<---------------cut here---------------start------------->8---
\documentclass{article}
        \usepackage{mathtools}

\begin{document}
        hello world
\end{document}
--8<---------------cut here---------------end--------------->8---

Compiling it with LuaLaTeX under strace in a shell with texlive-scheme-basic, texlive-collection-luatex and texlive-collection-latexextra, it seems like most of the time is spent recursively searching for input files:

--8<---------------cut here---------------start------------->8---
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  27.70    0.080138           2     30174           getdents64
  21.99    0.063605           4     15455       259 openat
  17.44    0.050460           3     16179        32 newfstatat
  14.37    0.041583           3     10440     10296 access
   8.42    0.024348           1     15196           close
   7.76    0.022456           1     15201           fstat
   0.79    0.002278           1      1868           write
--8<---------------cut here---------------end--------------->8---

and similarly for pdflatex.
Side note: While retrying the experiments, I found that these numbers must have come from a recompilation. With a clean directory they are higher, because LuaLaTeX also searches recursively for test.aux. I have tried to be extra careful this time :).


As an extreme example, consider

--8<---------------cut here---------------start------------->8---
\documentclass{tudapub}

\begin{document}
        hello world
\end{document}
--8<---------------cut here---------------end--------------->8---

compiled with

--8<---------------cut here---------------start------------->8---
texlive-scheme-basic
texlive-collection-luatex
texlive-collection-latexextra
texlive-roboto texlive-urcls
texlive-xcharter
texlive-tuda-ci
--8<---------------cut here---------------end--------------->8---


This takes over 14 seconds (compared to about 2.7 seconds for lualatex
from Arch Linux) and from strace:

--8<---------------cut here---------------start------------->8---
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  32.60    5.926537           3   1801518           getdents64
  26.46    4.809462           5    900841       284 openat
  20.90    3.799744           4    896057    895349 access
  10.19    1.851520           2    900557           close
   9.49    1.724891           1    900575           fstat
   0.28    0.050743           2     17680       229 newfstatat
   0.04    0.007077           1      6073           read
--8<---------------cut here---------------end--------------->8---

Thank you for the report. I confirm the issue, unfortunately.

The cause for this seems to be kpathsea doesn't treat the ls-R database
as authoritative.  It is opened but kpathsea falls back to recursive
searching.

AFAIU, this should not happen. According to "The TeX Live Guide 2024":

 If a file is not found in the database, by default Kpathsea goes ahead
 and searches the disk. If a particular path element begins with ‘!!’,
 however, only the database will be searched for that element, never
 the disk.

IOW, even if the "!!" prefix is not there, Kpathsea should first look
for files in ls-R, and then on the disk. As you point out, it doesn’t
happen like this, and I don’t know why.

I think it actually does work as advertised. For the minimal example I sent, I looked at the basenames of all files that are access'ed by LuaLaTeX from Guix and by LuaLaTeX from Arch Linux. Comparing the logs (logs/minimal_vanilla.txt and logs/minimal_arch_vanilla.txt in the tarball):
--8<---------------cut here---------------start------------->8---
--- logs/minimal_vanilla.txt
+++ logs/minimal_arch_vanilla.txt
@@ -4 +3,0 @@
-      1 aliases                               -1
@@ -27,2 +25,0 @@
-      1 ls-R                                  0
-      1 ls-r                                  -1
@@ -284,0 +282 @@
+      3 texmf.cnf                             -1
@@ -286,0 +285 @@
+      4 aliases                               -1
@@ -290,0 +290,2 @@
+      4 ls-R                                  0
+      4 ls-r                                  -1
@@ -298,0 +300,2 @@
+     14 epstopdf.cfg                          -1
+     14 test.aux                              -1
@@ -306,2 +308,0 @@
-   9866 epstopdf.cfg                          -1
-   9866 test.aux                              -1
--8<---------------cut here---------------end--------------->8---
Here the first number is how many times a file with that basename was passed to access, and the number at the end is the return value: -1 if the call failed and 0 if it succeeded. The only meaningful difference is for epstopdf.cfg and test.aux, neither of which exists on Guix or on Arch Linux (for test.aux, at least on the first compilation). On Arch Linux, LuaLaTeX only recursively searches the current directory and $TEXMFLOCAL for them, while on Guix it recursively searches the entirety of $GUIX_TEXMF (i.e. $TEXMFDIST).
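
For reference, a tally in this shape can be produced from raw strace output with something like the following. This is only a minimal sketch, not the script used for the logs in the tarball: it assumes strace's default line format (optionally with a leading PID from -f), and the names tally_access and format_tally are made up for illustration.

```python
import os
import re
from collections import Counter

# Matches strace lines such as:
#   access("/some/dir/test.aux", R_OK) = -1 ENOENT (No such file or directory)
# An optional leading PID (as printed by strace -f) is tolerated.
ACCESS_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?access\("([^"]*)",[^)]*\)\s+=\s+(-?\d+)'
)


def tally_access(lines):
    """Count access() calls per (basename, return value)."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.match(line)
        if match:
            path, ret = match.group(1), int(match.group(2))
            counts[(os.path.basename(path), ret)] += 1
    return counts


def format_tally(counts):
    """Render one row per (basename, return value), sorted by basename."""
    return [
        f"{n:7d} {name:<37} {ret}"
        for (name, ret), n in sorted(counts.items())
    ]
```

Feeding it the lines of a trace file (for example one written with strace -o) and diffing the formatted rows of two runs gives a comparison like the one above.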

I also tried the opposite: stripping the !! from $TEXMF for LuaLaTeX on Arch Linux. The same problem appears (see logs/minimal_arch_texmf-override.txt; of course the actual numbers for the two files are higher there, as I have more packages installed).

So (un)fortunately, texlive-libkpathsea and !! seem to work as intended: without !!, a positive entry in ls-R is used, but the lack of an entry doesn't cut the search short, and kpathsea falls back to recursive searching.
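
To make the behaviour I am describing concrete, here is a toy model of the lookup. It is only a sketch of my reading of the logs and of the quoted guide, not kpathsea's actual code; find_file and the shape of the ls_r argument are invented for illustration.

```python
import os


def find_file(name, path_elements, ls_r):
    """Toy model of the lookup described above (not kpathsea's real code).

    path_elements: root directories, each optionally prefixed with '!!'.
    ls_r: maps a root directory to {file name: path relative to the root},
          standing in for that root's ls-R database.
    Returns (found path or None, list of roots that were walked on disk).
    """
    walked = []
    for element in path_elements:
        db_only = element.startswith("!!")
        root = element[2:] if db_only else element
        database = ls_r.get(root, {})
        if name in database:
            # A positive database entry is used directly, with or without '!!'.
            return os.path.join(root, database[name]), walked
        if db_only:
            # '!!': a missing entry is final for this element, no disk search.
            continue
        # No '!!': a missing entry falls back to a recursive disk search.
        walked.append(root)
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                return os.path.join(dirpath, name), walked
    return None, walked
```

In this model, a file that exists nowhere (like epstopdf.cfg or test.aux) costs a full walk of every element without !!, which is the pattern the access counts show.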

Looking at the extreme example (logs/extreme_vanilla.txt), the main culprits for the recursive searches seem to be various .fontspec files and configuration files that don't exist.

In the package definition for texlive-libkpathsea, texmf.cnf is modified
such that the TEXMF variable is set without !! in front of
$TEXMFSYSCONFIG, $TEXMFSYSVAR and $TEXMFDIST.
If I override $TEXMF via --cnf-line like

--8<---------------cut here---------------start------------->8---
lualatex \
        --cnf-line='TEXMF = {$TEXMFCONFIG,$TEXMFVAR,$TEXMFHOME,!!$TEXMFSYSCONFIG,!!$TEXMFSYSVAR,!!$TEXMFDIST}' \
        example.ltx
--8<---------------cut here---------------end--------------->8---

compilation time for the extreme example above falls to about 2.5
seconds, without excessive searching.

At least it proves our ls-R file is valid, at the expected location.
Just for the fun of it, I tried setting $TEXMFDBS to "{}", and the compilation time for the minimal example went from 0.9 to 9 seconds. I think there would have been more complaints if ls-R didn't work at all :D.

The comment above the substitution says that the !! construct wouldn't
work for texlive-build-system or when building profiles.  I don't know
if it would be possible to work around this but perhaps it could be
possible to work around this if installed in profile (or environment)?

I don’t understand what you want to install in a profile. The ls-R file
is already built during profile generation. See "guix/profiles.scm".
What I meant was that we could maybe use a horrible hack, like somehow overwriting texmf.cnf or wrapping the engines: anything to avoid rebuilding the world. But on second thought, LaTeX should mostly be a build-time dependency, so grafting in a version that can handle both the build environment and being installed in a profile should work well, right? At least until the next TeX Live release.

Maybe we could keep "!!" prefix and create a ls-R file each time
`texlive-build-system' builds a package and every time
`texlive-updmap.cfg' is an input used to build documentation. In this
case I'm not sure about what should be done for packages propagating TeX
Live libraries without actually using them.
I think the best solution would be to somehow make !! work in the build environment, but I'm unsure how. Perhaps the Nix folks have a solution for this problem?

In any case, this would require some experimentation. And it still is
a workaround for a problem we don’t understand yet.

Regards,
--
Nicolas Goaziou

vicvbcun

Attachment: texlive-kpathsea-debugging.tar.zst
Description: Binary data
