Gus Correa wrote:
> 2) However, running with "sm" still breaks, unfortunately:
> I get the same errors that I reported in my very first email, if I
> increase the number of processes to 16, to explore the hyperthreading
> range.
> This is using "sm" (i.e. not excluded in the mca config file), and
> btl_sm_num_fifos (mpiexec command line)
> The machine hangs, requires a hard reboot, etc, etc, as reported earlier.
Okay. I think this is different from trac 2043, then, since that
involved a race condition that can be worked around by giving each
sender its own FIFO.
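
For reference, a rough sketch of how that workaround is typically applied
on the command line. The process count of 16 and the ./a.out executable
are placeholders, and setting the FIFO count equal to the number of
processes is an assumption made for illustration:

    mpiexec --mca btl_sm_num_fifos 16 -np 16 ./a.out

For comparison, the sm BTL can be excluded for a single run in the same
way it would be excluded in the mca config file:

    mpiexec --mca btl ^sm -np 16 ./a.out
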
> So, I guess the conclusion is that I can use sm, but I have to remain
> within the range of physical cores (8), not oversubscribe, not try to
> explore the HT range. Should I expect it to work also for np>number
> of physical cores?
Yes, I believe that would be a reasonable expectation (under
circumstances other than the ones you're facing, in any case). I just
ran the examples/connectivity_c.c test with GCC on an 8-core Nehalem
system with HT turned on and tested up to np=64.