I usually run the ctests as follows:

    ctest -j$( nproc )
    ctest --rerun-failed

This has been robust for a while. However, I recently gained access to a
server where I can run 64 jobs simultaneously, and there the above approach
shows problems compared to the results from the following:

    ctest -j1

By "problems" I mean that more tests fail with the parallel run than with the
-j1 run.

When I compare the user directories, located at

    /home/scott/lyxbuilds/master/CMakeBuild/Testing/.lyx

a key difference is in lyxrc.defaults.

The "good" lyxrc.defaults has the following at the end:

    \font_encoding "T1"

The "bad" one does not. When I copy that line over, the test then succeeds.

Comparing configure.log between the "bad" and the "good" versions shows the
following difference:

    > DEBUG: Add to RC:
    > \font_encoding "T1"

The relevant part of configure.py that handles this is:

    # currently, values in chkconfig are only used to set
    # \font_encoding
    values = {}
    for line in open('chkconfig.vars').readlines():
        key, val = re.sub('-', '_', line).split('=')
        val = val.strip()
        values[key] = val.strip("'")
    # chk_fontenc may not exist
    try:
        addToRC(r'\font_encoding "%s"' % values["chk_fontenc"])

So the problem might be that chkconfig.vars does not exist at this point in the
code. configure.py removes chkconfig.vars when it is done with it, so my guess
is that two configure.py scripts are running at the same time: the second one
creates chkconfig.vars, the first one deletes it while cleaning up, and the
second one therefore never reads it. I cannot be sure that is what's happening,
but I can confirm that when I run

    ctest -j$( nproc )

configure.py is run twice. I confirmed this by adding the following code to
configure.py, which creates a file with the PID in its filename:

    with open("/home/scott/Desktop/skconfigure" + str(os.getpid()), "w") as f:
        f.write("Hello World")

and two files show up.
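
The order of events I suspect can be replayed in a single process. This is only
a sketch of the hypothesis above, not of what configure.py actually does: the
function name and the sample line written to the file are made up for
illustration.

```python
import os
import tempfile


def simulate_interleaving():
    """Replay the suspected interleaving of two runs sharing one file name."""
    workdir = tempfile.mkdtemp()
    shared = os.path.join(workdir, "chkconfig.vars")

    # Run B creates the shared file (sample content is an assumption).
    with open(shared, "w") as fh:
        fh.write("chk-fontenc='T1'\n")

    # Run A finishes and removes "its" temporary file, which is really B's.
    os.remove(shared)

    # Run B now tries to read the file it just wrote.
    try:
        with open(shared) as fh:
            result = fh.read()
    except FileNotFoundError:
        result = None  # B never sees the font encoding

    os.rmdir(workdir)
    return result
```

With this ordering the second run finds no file at all, which would be
consistent with the missing \font_encoding line, though I have not verified
that this is the actual failure path.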

Note that I cannot reproduce the above on a computer with 8 cores, no matter how
high I set the number of jobs. Judging from my testing, one needs at least 20
cores to reproduce it. Further, I cannot reproduce it every time, which is not
surprising for a timing-dependent problem.

In summary, I have two questions:

1. Do we know why ctest -j$LARGE_NUMBER results in two calls to configure.py?

2. Although this problem only came up with a strange use case, do we want to
   make configure.py more robust to such situations? The answer might be "no"
   because I can't think of a common use case. One fix for this particular
   issue would be to read and write files with filenames such as
   chkconfig.vars_$PID, where $PID is the PID of the process. I believe this
   would fix the particular issue that I'm seeing (but I haven't checked).
   Another fix would be to just not remove those temporary files.
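
To make the PID-suffix idea concrete, here is a minimal sketch. It is untested
against the real configure.py; per_process_name, read_vars and demo are
hypothetical helpers, and the sample file content is an assumption. It also
ignores the question of how the program that writes chkconfig.vars would be
told the new file name.

```python
import os
import re
import tempfile


def per_process_name(base):
    """Append the current PID to base, e.g. chkconfig.vars_12345."""
    return "%s_%d" % (base, os.getpid())


def read_vars(path):
    """Parse key=value lines in the same way as the excerpt quoted above."""
    values = {}
    with open(path) as fh:
        for line in fh:
            if "=" not in line:
                continue  # skip blank or malformed lines
            key, val = re.sub("-", "_", line).split("=", 1)
            values[key.strip()] = val.strip().strip("'")
    return values


def demo():
    """Write and read a per-process file in a scratch directory."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, per_process_name("chkconfig.vars"))
    with open(path, "w") as fh:
        fh.write("chk-fontenc='T1'\n")  # assumed sample content
    values = read_vars(path)
    os.remove(path)  # each process removes only its own file
    os.rmdir(workdir)
    return values.get("chk_fontenc")
```

Since every process reads and removes only the file carrying its own PID, a
concurrent run could no longer delete a file that another run still needs.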

Any thoughts?

Scott
