Hi Ralph,

This is a follow-up to "Segmentation fault in oob_tcp.c of
openmpi-1.7.4a1r29646".

I found the cause.

First, I noticed that your hostfile works and mine does not.

Your host file:
cat hosts
bend001 slots=12

My host file:
cat hosts
node08
node08
...(total 8 lines)

I modified my script file to add "slots=1" to each line of my hostfile
just before launching mpirun. Then it worked.

My host file (modified):
cat hosts
node08 slots=1
node08 slots=1
...(total 8 lines)

Second, I confirmed that there is a slight difference between
orte/util/hostfile/hostfile.c of 1.7.3 and that of 1.7.4a1r29646.

$ diff hostfile.c.org ../../../../openmpi-1.7.3/orte/util/hostfile/hostfile.c
394,401c394,399
<     if (got_count) {
<         node->slots_given = true;
<     } else if (got_max) {
<         node->slots = node->slots_max;
<         node->slots_given = true;
<     } else {
<         /* should be set by obj_new, but just to be clear */
<         node->slots_given = false;
---
>     if (!got_count) {
>         if (got_max) {
>             node->slots = node->slots_max;
>         } else {
>             ++node->slots;
>         }
....

Finally, as a tentative trial, I added line 402 below.
Then it worked.

cat -n orte/util/hostfile/hostfile.c:
   ...
   394      if (got_count) {
   395          node->slots_given = true;
   396      } else if (got_max) {
   397          node->slots = node->slots_max;
   398          node->slots_given = true;
   399      } else {
   400          /* should be set by obj_new, but just to be clear */
   401          node->slots_given = false;
   402          ++node->slots; /* added by tmishima */
   403      }
   ...

Please fix the problem properly, because my change is only a guess.
The problem is related to how a hostfile is handled when no slots
information is given.
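
To make the guess concrete, here is a minimal standalone sketch of the
branch at lines 394-403. It is not the real ORTE code: sketch_node_t,
sketch_apply_line and sketch_slots_after_bare_lines are hypothetical
names, and I am assuming that a node starts with slots = 0 and that
repeated lines for the same host update the same node object.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the node object; it holds only the three
 * fields that appear in the diff. */
typedef struct {
    int  slots;
    int  slots_max;
    bool slots_given;
} sketch_node_t;

/* Models lines 394-403 for one hostfile line.
 * got_count: the line carried "slots=N" (N is assumed to be stored already).
 * got_max:   the line carried a maximum slot count.
 * count_bare_line: true models the trial change at line 402. */
static void sketch_apply_line(sketch_node_t *node, bool got_count,
                              bool got_max, bool count_bare_line)
{
    if (got_count) {
        node->slots_given = true;
    } else if (got_max) {
        node->slots = node->slots_max;
        node->slots_given = true;
    } else {
        node->slots_given = false;
        if (count_bare_line) {
            ++node->slots;  /* one slot per bare hostfile line */
        }
    }
}

/* Feeds nlines bare lines for the same host (assumption: they all map
 * to one node object) and returns the resulting slot count. */
static int sketch_slots_after_bare_lines(int nlines, bool count_bare_line)
{
    sketch_node_t node = {0, 0, false};
    for (int i = 0; i < nlines; ++i) {
        sketch_apply_line(&node, false, false, count_bare_line);
    }
    return node.slots;
}
```

Under these assumptions, sketch_slots_after_bare_lines(8, false) leaves
the node with 0 slots, which models my 8-line hostfile without the
change, while sketch_slots_after_bare_lines(8, true) gives 8 slots.
Whether a slot count of 0 is what actually triggers the segmentation
fault is something I have not verified.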

Regards,
Tetsuya Mishima
