On Wednesday 2022-10-05 23:24, Karl Berry wrote:
>
>What troubles me most is that there's no obvious way to debug any test
>failure involving parallelism, since they go away with serial execution.
>Any ideas about how to determine what is going wrong in the parallel
>make?  Any way to make parallel failures more reproducible?

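1. Throw more processes into the mix (make -jN with a larger-than-usual N)
   so that either
   - each individual process spends longer inside its "critical section"
     (with more jobs than CPUs, processes get preempted part-way through), or
   - across the whole job set, more total time is spent in or around
     critical sections, so two jobs are more likely to overlap there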

2. Determine (e.g. with strace) exactly which (sub-)program and syscall
   failed, in which process of which job, then build a hypothesis around
   that failure.

3. Watch whether any one job is somehow executed twice, or whether a file
   is written to concurrently, for example:

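   # "foo.c foo.h:" is two independent rules sharing one recipe, so
   # under -j make may start generate_from_somewhere once per target
   foo: foo.c foo.h
        ld -o foo ...
   foo.c foo.h:
        generate_from_somewhere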

3b. Or whether a file is read and written concurrently, for example:

   # every %.o job rewrites version.h while other cc jobs may be reading it
   %.o: %.c
        generate_version.h
        cc -c -o $@ $<

   foo: foo.o bar.o

(where foo.c and bar.c are not generated and both #include "version.h").
I've seen something like that in libtracefs commit
b64dc07ca44ccfed40eae8d345867fd938ce6e0e
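For step 1 (more processes), one way to make a rare parallel failure show up is to rerun the build in a loop with an oversubscribed -j until it fails. A minimal sketch to source into a POSIX shell follows; the helper name `stress`, the `STRESS_RUNS` variable and the `stress-N.log` file names are hypothetical, and the usage line assumes GNU make, `nproc` from GNU coreutils and a `clean` and `check` target in the tree being tested.

```shell
# Hypothetical helper: run the command given as arguments up to
# $STRESS_RUNS times (default 50), saving each run's output to
# stress-N.log, and stop at the first run that exits non-zero.
stress() {
  stress_runs=${STRESS_RUNS:-50}
  stress_i=1
  while [ "$stress_i" -le "$stress_runs" ]; do
    if ! "$@" >"stress-$stress_i.log" 2>&1; then
      echo "failed on run $stress_i (log: stress-$stress_i.log)"
      return 1
    fi
    stress_i=$((stress_i + 1))
  done
  echo "passed $stress_runs runs"
}

# Example (assumes GNU make, coreutils nproc, "clean" and "check" targets):
# clean first so every run really rebuilds, and use 4x more jobs than CPUs.
#   stress sh -c 'make clean >/dev/null && make -j"$(( $(nproc) * 4 ))" check'
```

Cleaning inside the loop matters: without it, every run after the first successful one finds everything up to date and exercises nothing.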
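For step 2 (finding the failing syscall), strace can record every process the build spawns: `-f` follows forked children, `-tt` adds timestamps with microseconds, and `-o FILE` writes the trace to a file with each line prefixed by its PID. A sketch of a filter that keeps only the failed syscalls follows; the helper name `failed_syscalls` is hypothetical. Expect noise, since builds routinely produce harmless failures (e.g. ENOENT while searching PATH or include directories), so it is probably best to start from the PIDs of the job that actually failed.

```shell
# Hypothetical helper: print the lines of an "strace -f -o FILE" log whose
# syscall returned -1 with an errno name, e.g. "= -1 ENOENT (...)".
# This also catches "<... openat resumed>" lines, which is where strace
# reports the result of a syscall that was interrupted by another process.
failed_syscalls() {
  grep -E ' = -1 E[A-Z0-9]+' "$1"
}

# Example (assumes strace and GNU make are installed):
#   strace -f -tt -o make-trace.log make -j16 check
#   failed_syscalls make-trace.log | less
```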
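For step 3 (a job executed twice), make echoes each recipe command as it runs it (unless the line starts with @ or make runs with -s), so a duplicated job often shows up as the same command line appearing twice in the build log; in the foo.c/foo.h example, generate_from_somewhere may be echoed twice under -j. A sketch follows; the helper name `dup_commands` is hypothetical, and some commands legitimately repeat (a shared `mkdir -p`, for instance), so treat hits as leads rather than proof.

```shell
# Hypothetical helper: print each line that occurs more than once in a
# saved build log, e.g. one captured with "make -j16 2>&1 | tee build.log".
dup_commands() {
  # drop make's own "make: ..." / "make[N]: ..." messages, which repeat
  # legitimately (e.g. "Entering directory"), then report duplicates
  grep -Ev '^make(\[[0-9]+\])?: ' "$1" | sort | uniq -d
}

# Example:
#   make -j16 2>&1 | tee build.log
#   dup_commands build.log
```

If the generator really produces both files in one run, GNU make 4.3 and later can say so with a grouped target (`foo.c foo.h &:`), which runs the recipe once for both targets.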
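For steps 3 and 3b (concurrent writes, or reads during a rewrite), an `strace -f -o FILE` log can also show files that more than one process opened for writing: in the version.h example each %.o job runs generate_version.h, so under -j several processes rewrite version.h while cc jobs may be reading it. A sketch in awk follows, assuming the PID is the first field of each line (it also accepts the "[pid N]" form strace prints on a terminal); the helper name `multi_writers` is hypothetical. It ignores timing, so it reports multiple writers rather than proven overlaps, it skips /dev paths such as /dev/null, and legitimately shared outputs (log files, archives updated by ar) will show up too.

```shell
# Hypothetical helper: list files that were successfully opened with
# O_WRONLY or O_RDWR by more than one PID, with the PIDs that did so.
multi_writers() {
  awk '
    # successful open/openat calls with a write flag
    /open(at)?\(/ && /O_WRONLY|O_RDWR/ && !/= -1 E/ {
      pid = $1
      if (pid == "[pid") pid = $2 + 0    # "[pid 123]" form: numeric prefix
      # the path is the first double-quoted string on the line
      if (!match($0, /"[^"]*"/)) next
      path = substr($0, RSTART + 1, RLENGTH - 2)
      if (path ~ /^\/dev\//) next        # /dev/null, ttys: not interesting
      if (!((path, pid) in seen)) {
        seen[path, pid] = 1
        n[path]++
        pids[path] = pids[path] " " pid
      }
    }
    END { for (p in n) if (n[p] > 1) print p ":" pids[p] }
  ' "$1"
}

# Example (assumes strace and GNU make are installed):
#   strace -f -o make-trace.log make -j16
#   multi_writers make-trace.log
```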
