Fixed and scheduled to move to 1.7.4. Thanks again!

On Nov 17, 2013, at 6:11 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Thanks! That's precisely where I was going to look when I had time :-)
> 
> I'll update tomorrow.
> Ralph
> 
> 
> 
> 
> On Sun, Nov 17, 2013 at 7:01 PM, <tmish...@jcity.maeda.co.jp> wrote:
> 
> 
> Hi Ralph,
> 
> This is a follow-up to my earlier thread "Segmentation fault in oob_tcp.c of
> openmpi-1.7.4a1r29646".
> 
> I found the cause.
> 
> First, I noticed that your hostfile works and mine does not.
> 
> Your host file:
> cat hosts
> bend001 slots=12
> 
> My host file:
> cat hosts
> node08
> node08
> ...(total 8 lines)
> 
> I modified my script file to add "slots=1" to each line of my hostfile
> just before launching mpirun. Then it worked.
> 
> My host file (modified):
> cat hosts
> node08 slots=1
> node08 slots=1
> ...(total 8 lines)
> 
> Second, I confirmed that there is a slight difference between
> orte/util/hostfile/hostfile.c in 1.7.3 and in 1.7.4a1r29646.
> 
> $ diff hostfile.c.org ../../../../openmpi-1.7.3/orte/util/hostfile/hostfile.c
> 394,401c394,399
> <     if (got_count) {
> <         node->slots_given = true;
> <     } else if (got_max) {
> <         node->slots = node->slots_max;
> <         node->slots_given = true;
> <     } else {
> <         /* should be set by obj_new, but just to be clear */
> <         node->slots_given = false;
> ---
> >     if (!got_count) {
> >         if (got_max) {
> >             node->slots = node->slots_max;
> >         } else {
> >             ++node->slots;
> >         }
> ....
> 
> Finally, as a tentative trial, I added line 402 below.
> Then it worked.
> 
> cat -n orte/util/hostfile/hostfile.c:
>    ...
>    394      if (got_count) {
>    395          node->slots_given = true;
>    396      } else if (got_max) {
>    397          node->slots = node->slots_max;
>    398          node->slots_given = true;
>    399      } else {
>    400          /* should be set by obj_new, but just to be clear */
>    401          node->slots_given = false;
>    402          ++node->slots; /* added by tmishima */
>    403      }
>    ...
> 
> Please fix the problem properly, because my change is only based on a
> guess. The problem is related to how hostfile entries without slots
> information are handled.
> 
> Regards,
> Tetsuya Mishima
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
