Re: [OMPI users] Pointers for understanding failure messages on NetBSD

Jeff Squyres Tue, 1 Dec 2009 19:53:04 -0500

On Nov 29, 2009, at 6:15 PM, <kevin.buck...@ecs.vuw.ac.nz> <kevin.buck...@ecs.vuw.ac.nz> wrote:

$ mpirun -n 4 hello_f77
[somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with
errno=6

Oy. This is ick, because this error code is coming from horrendously complex code deep in the depths of OMPI that is probing the OS to figure out what ethernet interfaces you have. It may or may not be simple to fix this.

Do you mind diving into the OMPI code a bit to figure this out? I'm afraid that none of the developers are likely to have access to NetBSD. :-( I can point you right where to look.
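If it helps to reproduce the failure outside of OMPI, here is a minimal standalone sketch. It is not OMPI code, and probe_if_flags() is a hypothetical name. It enumerates interfaces with getifaddrs() and then issues the same ioctl(SIOCGIFFLAGS) request named in the error message on each interface that has an IPv4 address. Whether opal_ifinit builds its interface list the same way is an assumption here: if this probe succeeds on NetBSD while OMPI fails, that would point at how OMPI collects interface names rather than at the ioctl itself. For reference, errno 6 on NetBSD is ENXIO ("Device not configured").

```c
/* glibc: expose struct ifreq and getifaddrs() even under strict -std modes.
 * Harmless on NetBSD, where these are visible by default. */
#define _DEFAULT_SOURCE

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <netinet/in.h>
#include <ifaddrs.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical diagnostic helper, not part of Open MPI.
 * Lists interfaces via getifaddrs(), then calls ioctl(SIOCGIFFLAGS) on each
 * entry that carries an IPv4 address (an assumption made to keep output short).
 * Returns the number of interfaces probed, or -1 if setup failed.
 * *failures receives the number of ioctl() calls that returned an error. */
int probe_if_flags(FILE *out, int *failures)
{
    struct ifaddrs *list, *ifa;
    int sd, probed = 0;

    assert(out != NULL && failures != NULL);
    *failures = 0;

    if (getifaddrs(&list) != 0) {
        int err = errno;
        fprintf(out, "getifaddrs failed: errno=%d (%s)\n", err, strerror(err));
        return -1;
    }

    /* Any datagram socket will do as the target for interface ioctls. */
    sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd < 0) {
        int err = errno;
        fprintf(out, "socket failed: errno=%d (%s)\n", err, strerror(err));
        freeifaddrs(list);
        return -1;
    }

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        struct ifreq ifr;

        /* Skip link-layer and non-IPv4 entries; getifaddrs() may list
         * the same interface name several times, once per address. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifa->ifa_name, sizeof(ifr.ifr_name) - 1);
        probed++;

        if (ioctl(sd, SIOCGIFFLAGS, &ifr) < 0) {
            int err = errno;
            (*failures)++;
            fprintf(out, "%s: ioctl(SIOCGIFFLAGS) failed with errno=%d (%s)\n",
                    ifr.ifr_name, err, strerror(err));
        } else {
            fprintf(out, "%s: flags=0x%x%s\n", ifr.ifr_name,
                    (unsigned)(unsigned short)ifr.ifr_flags,
                    (ifr.ifr_flags & IFF_UP) ? " (up)" : "");
        }
    }

    close(sd);
    freeifaddrs(list);
    return probed;
}
```

To try it, call it from a small main(), e.g. `int failures; probe_if_flags(stdout, &failures);`, and build with your system compiler. Using getifaddrs() for enumeration sidesteps parsing the SIOCGIFCONF buffer by hand, so a failure here would implicate the ioctl on that particular interface name.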

When running on a "server" machine within the grid, a machine that I am told
should not differ from the workstation I was using above with
respect to the user environment, I get a different error and find that the
job does not run at all.

This case seems to produce an error message that is often reported within
the OpenMPI community:

$ mpirun -n 4 hello_f77
[somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 150
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
...

  orte_rml_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS

This could well be a side effect of the same error as above: OMPI may have concluded that it didn't find any ethernet devices and therefore aborted.


--
Jeff Squyres
jsquy...@cisco.com
