On Nov 29, 2009, at 6:15 PM, <kevin.buck...@ecs.vuw.ac.nz> wrote:

$ mpirun -n 4 hello_f77
[somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with
errno=6


Oy. This is ick, because this error code is coming from horrendously complex code deep in the depths of OMPI that is probing the OS to figure out what ethernet interfaces you have. It may or may not be simple to fix this.

Do you mind diving into the OMPI code a bit to figure this out? I'm afraid that none of the developers are likely to have access to NetBSD. :-( I can point you right where to look.

When running on a "server" machine within the grid, a machine I am told
should not be any different to the workstation I was using above in
respect of user environment, I get a different error and find that the
job does not run at all.

This case seems to produce an error message that is oft reported within
the OpenMPI community:

$ mpirun -n 4 hello_f77
[somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 150
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
...

  orte_rml_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS


This could well be a side-effect of the same error as above -- OMPI may have concluded that it didn't find any ethernet devices and therefore aborted.

--
Jeff Squyres
jsquy...@cisco.com
