Gus Correa wrote:

> 2) However, running with "sm" still breaks, unfortunately:
>
> I get the same errors that I reported in my very first email if I increase the number of processes to 16, to explore the hyperthreading range.
>
> This is with "sm" enabled (i.e. not excluded in the MCA config file) and with btl_sm_num_fifos set on the mpiexec command line.
>
> The machine hangs, requires a hard reboot, etc., as reported earlier.

Okay. I think this is different from trac 2043, then, since that involved a race condition that can be worked around by giving each sender its own FIFO.
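
For reference, the trac 2043 workaround amounts to something like the sketch below. The FIFO count of np-1 (one per possible sender) is my assumption about how to size it, and ./my_mpi_app is just a placeholder for your executable:

```shell
NP=16                      # total number of MPI processes
NUM_FIFOS=$((NP - 1))      # assumption: one FIFO per possible sender
# ./my_mpi_app is a placeholder executable name
mpiexec --mca btl_sm_num_fifos "$NUM_FIFOS" -np "$NP" ./my_mpi_app
```

Since you already set btl_sm_num_fifos and still see the hang, this only illustrates why I think your problem is a different one.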

> So, I guess the conclusion is that I can use sm, but I have to remain within the range of physical cores (8), not oversubscribe, not try to explore the HT range. Should I expect it to work also for np>number of physical cores?

Yes, I believe that would be a reasonable expectation (under circumstances other than the ones you're facing, in any case). I just ran the examples/connectivity_c.c test with GCC on an 8-core Nehalem system with HT turned on and tested up to np=64.
