Source: dolfin Followup-For: Bug #920546
I have a hunch the timeout problem might be related to oversubscription of CPUs in MPI runs. (In principle the same would apply to the Python MPI tests, but presumably the Python/MPI interface "slows down" messages enough to avoid the race condition.)

I've uploaded 2018.1.0.post1-18, which prints the number of available CPUs at test time, to check whether oversubscription is a plausible explanation. Currently oversubscription is permitted at up to 2 jobs per CPU. The demos use 3 processes each, so if 4 CPUs are available then 2 jobs (6 processes) are run, which is 50% oversubscribed.

If that is the case and correlates with the MPI C++ timeouts, then the next step is to strictly never oversubscribe (though if only 1 or 2 CPUs are available, the first job of 3 processes must still be oversubscribed, since a single demo already needs 3 processes).
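For clarity, the job-count arithmetic above can be sketched as follows. This is an illustrative Python sketch, not the actual test-harness code; the function names, the `strict` flag, and the minimum-one-job fallback are assumptions made for the example.

```python
import os

PROCS_PER_JOB = 3  # each MPI demo runs on 3 processes


def parallel_jobs(ncpus, max_jobs_per_cpu=2, strict=False):
    """Number of test jobs to run concurrently.

    strict=False: allow up to max_jobs_per_cpu jobs per CPU (current policy).
    strict=True:  never oversubscribe, except that at least one job must
                  always run, even on 1 or 2 CPUs.
    """
    if strict:
        return max(1, ncpus // PROCS_PER_JOB)
    return max(1, (ncpus * max_jobs_per_cpu) // PROCS_PER_JOB)


def oversubscription(ncpus, jobs):
    """Fraction by which total processes exceed available CPUs."""
    procs = jobs * PROCS_PER_JOB
    return max(0.0, procs / ncpus - 1.0)


if __name__ == "__main__":
    ncpus = os.cpu_count() or 1
    jobs = parallel_jobs(ncpus)
    print(f"{ncpus} CPUs -> {jobs} jobs ({jobs * PROCS_PER_JOB} processes)")
```

With 4 CPUs the current policy gives 2 jobs (6 processes), i.e. 50% oversubscription, matching the numbers above; the strict policy would give only 1 job, and on 1 or 2 CPUs that single 3-process job is still oversubscribed.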