On Nov 29, 2009, at 6:15 PM, <kevin.buck...@ecs.vuw.ac.nz> <kevin.buck...@ecs.vuw.ac.nz> wrote:
$ mpirun -n 4 hello_f77
[somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=6
Oy. This is ick, because this error code is coming from horrendously
complex code deep in the depths of OMPI that is probing the OS to
figure out what ethernet interfaces you have. It may or may not be
simple to fix this.
Do you mind diving into the OMPI code a bit to figure this out? I'm
afraid that none of the developers are likely to have access to
NetBSD. :-( I can point you right where to look.
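As a starting point, it may help to confirm what errno=6 means on your box. errno values are platform-specific, but on Linux and the BSDs I believe 6 is ENXIO, which would suggest the ioctl was issued against an interface the kernel did not recognize. Here is a minimal sketch in Python (this is not OMPI code; `describe_errno` and `list_interfaces` are illustrative helper names I made up) that decodes the value and lists the interfaces the OS reports:

```python
import errno
import os
import socket


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as 'ENXIO';
    # fall back to 'UNKNOWN' if this platform does not define the value.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def list_interfaces():
    """Return the interface names the OS reports (Unix only)."""
    # socket.if_nameindex() yields (index, name) pairs.
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    name, message = describe_errno(6)
    print(f"errno=6 -> {name}: {message}")
    for ifname in list_interfaces():
        print("interface:", ifname)
```

If the name reported on NetBSD is something other than ENXIO, trust the local answer over my guess. Either way, comparing the interface list against what `ifconfig -a` shows might hint at which device the probe is tripping over.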
When running on a "server" machine within the grid, a machine I am told
should not be any different to the workstation I was using above in
respect of user environment, I get a different error and find that the
job does not run at all.
This case seems to produce an error message that is oft reported within
the OpenMPI community:
$ mpirun -n 4 hello_f77
[somebox2.ecs.vuw.ac.nz:25244] [[51186,0],0] ORTE_ERROR_LOG: Error in file ess_hnp_module.c at line 150
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
...
orte_rml_base_select failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
This could well be a side-effect of the same error as above -- OMPI
may have concluded that it didn't find any ethernet devices and
therefore aborted.
--
Jeff Squyres
jsquy...@cisco.com