On Wednesday 2022-10-05 23:24, Karl Berry wrote:
>
>What troubles me most is that there's no obvious way to debug any test
>failure involving parallelism, since they go away with serial execution.
>Any ideas about how to determine what is going wrong in the parallel
>make? Any way to make parallel failures more reproducible?

1. Throw more processes in the mix (make -jN with more-than-normal N)
   so that either
   - for each (single) process the "critical section" execution time
     goes up
   - for the whole job set, the total time spent in/around critical
     sections goes up

2. determine which exact (sub-)program and syscall failed in what
   process in what job (strace), then construct a hypothesis around
   that failure

3. watch if any one job is somehow executed twice, or a file is written
   to concurrently

        foo: foo.c foo.h
                ld -o foo ...
        foo.c foo.h:
                generate_from_somewhere

3b. or a file is read and written to concurrently

        %.o: %.c
                generate_version.h
                cc -o $@ $<
        foo: foo.o bar.o

    (and foo.c, bar.c, nongenerated, have a #include "version.h")

    I've seen something like that in libtracefs commit
    b64dc07ca44ccfed40eae8d345867fd938ce6e0e
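
For point 1, a parallel failure may only show up once in many runs, so it can help to repeat the over-subscribed build until it breaks and keep the log of the failing run. A minimal sketch in Python, not a definitive tool: the function name run_until_failure, the reset_cmd parameter and the max_runs default are placeholders for illustration, and the sketch assumes the build can be returned to a clean state by one command.

```python
import subprocess
import sys


def run_until_failure(cmd, reset_cmd=None, max_runs=50):
    """Run cmd up to max_runs times.

    Returns (run_number, combined_output) for the first run that exits
    non-zero, or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        if reset_cmd is not None:
            # Return the tree to a clean state so the build really re-runs
            # (hypothetical step; raises CalledProcessError if it fails).
            subprocess.run(
                reset_cmd,
                check=True,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        # Capture stdout and stderr together so the log keeps its ordering.
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        if proc.returncode != 0:
            return attempt, proc.stdout
    return None
```

A hypothetical invocation would be run_until_failure(["make", "-j64", "check"], reset_cmd=["make", "clean"]); the job count and target names are assumptions and need adjusting to the project. Once a failing run is captured, the log gives a starting point for the strace step in point 2.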