I usually do the following for running the ctests:

    ctest -j$( nproc )
    ctest --rerun-failed
This has behaved robustly for a while. However, I've recently gained access to a server where I can run 64 jobs simultaneously, and now the above approach shows problems compared to the results from the following:

    ctest -j1

By "problems" I mean that more tests fail with the first approach. When I compare the user directories, located at

    /home/scott/lyxbuilds/master/CMakeBuild/Testing/.lyx

a key difference is in lyxrc.defaults. The "good" lyxrc.defaults has the following line at the end:

    \font_encoding "T1"

The "bad" one does not. When I copy that line over, the test succeeds.

Comparing configure.log between the "bad" and the "good" versions, the following difference appears:

    > DEBUG: Add to RC:
    > \font_encoding "T1"

The relevant part of configure.py that handles this is:

    # currently, values in chkconfig are only used to set
    # \font_encoding
    values = {}
    for line in open('chkconfig.vars').readlines():
        key, val = re.sub('-', '_', line).split('=')
        val = val.strip()
        values[key] = val.strip("'")
    # chk_fontenc may not exist
    try:
        addToRC(r'\font_encoding "%s"' % values["chk_fontenc"])

So the problem might be that chkconfig.vars does not exist at this point in the code. configure.py removes chkconfig.vars when it is done with it. Thus, the problem appears to be that two configure.py scripts are running at the same time: when the second one is running and creates chkconfig.vars, the first one deletes it, and the second one therefore never reads it.

I cannot be sure that this is what's happening, but I can confirm that when I run

    ctest -j$( nproc )

configure.py gets run twice. I confirmed this by adding the following code inside configure.py, which creates a file with the PID in its filename:

    file = open("/home/scott/Desktop/skconfigure" + str(os.getpid()), "w")
    file.write("Hello World")
    file.close()

and two files show up. Note that I cannot reproduce the above on a computer with 8 cores, no matter how high I set the number of jobs.
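To make the suspected sequence concrete, here is a small deterministic replay in Python. It is only a sketch under assumptions: `parse_vars` and `simulate_shared_name` are hypothetical helpers, not functions from configure.py, and real ctest jobs are separate processes whose timing is not controlled like this. It shows how a fixed filename shared by two runs in one directory, plus an unconditional cleanup step, lets one run lose the data the other was about to read. The sketch simply treats the missing file as "no values".

```python
import os
import tempfile

# Hypothetical stand-in for the loop quoted from configure.py: parse
# "key=value" lines, turning '-' into '_' and stripping quotes.
def parse_vars(path):
    values = {}
    with open(path) as f:
        for line in f:
            key, val = line.replace('-', '_').split('=', 1)
            values[key] = val.strip().strip("'")
    return values

# Deterministic replay of the suspected interleaving: two "runs" share
# one working directory and one fixed filename.
def simulate_shared_name(workdir):
    path = os.path.join(workdir, "chkconfig.vars")
    results = {}
    # Run A writes its vars file.
    with open(path, "w") as f:
        f.write("chk_fontenc='T1'\n")
    # Run B writes the same file (same name, same directory).
    with open(path, "w") as f:
        f.write("chk_fontenc='T1'\n")
    # Run A reads the file, then cleans up by removing it.
    results["A"] = parse_vars(path).get("chk_fontenc")
    os.remove(path)
    # Run B now tries to read a file that no longer exists.
    try:
        results["B"] = parse_vars(path).get("chk_fontenc")
    except FileNotFoundError:
        results["B"] = None  # B never sees chk_fontenc
    return results

with tempfile.TemporaryDirectory() as d:
    print(simulate_shared_name(d))  # {'A': 'T1', 'B': None}
```

Run B ends up without a chk_fontenc value, which matches the symptom of a lyxrc.defaults missing the \font_encoding line.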
So, from my testing, one would need at least 20 cores to reproduce this. Further, I cannot reproduce it every time, which is not surprising.

In summary, I have two questions:

1. Do we know why ctest -j$LARGE_NUMBER results in two calls to configure.py?

2. Although this problem only came up with a strange use case, do we want to make configure.py more robust to such situations? The answer might be "no" because I can't think of a common use case. One fix for this particular issue would be to read and write files with names such as chkconfig.vars_$PID, where $PID is the PID of the process. I believe this would fix the particular issue that I'm seeing (but I haven't checked). Another fix would be to simply not remove those temporary files.

Any thoughts?

Scott
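A per-PID filename can be sketched roughly as below, assuming two runs that write, read and then delete their own vars file in a shared directory. `vars_filename`, `read_fontenc` and `simulate_per_pid` are hypothetical names for illustration, not existing configure.py code. This only covers the Python side: whatever produces chkconfig.vars would also have to write to the per-PID name.

```python
import os
import tempfile

# Hypothetical helper for the proposed fix: append the PID so that
# concurrent runs in the same directory use different files.
def vars_filename(workdir, pid):
    return os.path.join(workdir, "chkconfig.vars_%d" % pid)

def read_fontenc(path):
    # Same "key=value" parsing idea as the configure.py excerpt.
    values = {}
    with open(path) as f:
        for line in f:
            key, val = line.replace('-', '_').split('=', 1)
            values[key] = val.strip().strip("'")
    return values.get("chk_fontenc")

def simulate_per_pid(workdir, pid_a, pid_b):
    path_a = vars_filename(workdir, pid_a)
    path_b = vars_filename(workdir, pid_b)
    # Both runs write their own file.
    for path in (path_a, path_b):
        with open(path, "w") as f:
            f.write("chk_fontenc='T1'\n")
    results = {}
    # Run A reads and removes only its own file ...
    results["A"] = read_fontenc(path_a)
    os.remove(path_a)
    # ... so run B's file is still there when B reads it.
    results["B"] = read_fontenc(path_b)
    os.remove(path_b)
    return results

with tempfile.TemporaryDirectory() as d:
    # os.getpid() + 1 is just a second, distinct number for run B.
    print(simulate_per_pid(d, os.getpid(), os.getpid() + 1))  # {'A': 'T1', 'B': 'T1'}
```

Because each run deletes only its own file, the cleanup step no longer affects the other run. A per-run directory from Python's `tempfile.mkdtemp()` would be another way to get the same isolation.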