-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/02/2010 06:00 PM, Scott Atchley wrote:
> On Jun 2, 2010, at 11:52 AM, Scott Atchley wrote:
>
>> What if you explicitly disable MX?
>>
>> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca btl ^mx ~/bwlat/mpi_helloworld
>
> And can you try this as well?
>
> ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
>
> Thanks,
>
> Scott

Of course I can :)

The first command seems to be wrong. I got this error message:

  MCA framework parameters can only take a single negation operator
  ("^"), and it must be at the beginning of the value. The following
  value violates this rule:

      openib,sm,self,^mx

I tried putting the options in reverse order:

granquet@bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl ^mx --mca btl openib,sm,self ~/bwlat/mpi_helloworld
<snip>
  BTLs attempted: tcp
</snip>

I guess I got the command line wrong; it seems I disabled everything but tcp.

I then tried this:

granquet@bordeplage-26 ~ $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl ^mx ~/bwlat/mpi_helloworld
[bordeplage-26.bordeaux.grid5000.fr:03346] Error in mx_init (error No MX device entry in /dev.)
Hello world from process 0 of 1
[bordeplage-26:03346] *** Process received signal ***
[bordeplage-26:03346] Signal: Segmentation fault (11)
[bordeplage-26:03346] Signal code: Address not mapped (1)
[bordeplage-26:03346] Failing at address: 0x7fb51995b360
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3346 on node bordeplage-26.bordeaux.grid5000.fr exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

As I'm not doing anything in that helloworld, I then put only self in there:

granquet@bordeplage-26 ~ $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl self ~/bwlat/mpi_helloworld
[bordeplage-26.bordeaux.grid5000.fr:03375] Error in mx_init (error No MX device entry in /dev.)
Hello world from process 0 of 1

The second command:

granquet@bordeplage-9 ~/openmpi-1.4.2 $ ~/openmpi-1.4.2-bin/bin/mpirun --mca btl openib,sm,self --mca pml ^cm ~/bwlat/mpi_helloworld
Hello world from process 0 of 1
granquet@bordeplage-9 ~/openmpi-1.4.2 $

I can tell it works :)

> Ok, there is no segfault when it can't find IB.
>
> Which version of OMPI are you running on the IB nodes? 1.4.2?
>
> I can try to write a patch that does not alter the mpool if MX is not available.
>
> Scott

The goal is to run the same version on every node (for the sake of simplicity). The current plan was to target 1.4.1, but I don't think our users would mind upgrading to 1.4.2.

Thanks for the help, much appreciated :)

> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJMBpWFAAoJEEzIl7PMEAliyTYIALbBDyZbDBV0PUjzJ3HFG9Nx
ihfhcygHf8Gt+nRpcFDaY8msyj0NSpPMyA9Mq0ljrGqw090z4srqF3WBFY/isxkj
W9cjxURIlLrZsnTmd767lr1WQP3Mfg7UG6Ti3rt6CAl870efJtfC/Dz+H8+aoj28
X7EcUIqUcr137m5IXz2vsxfjlmgf4zmEkTA3veYJSpcdtMqv24gCQgu6o7LFNP4+
a9++/sIx9/xn4qInIyNOgQr2YedAKPP0+leHoLY6c/WTzKrOh/qV8fZOBc/Jf72l
wov4VnLXk1MDozYt+/rY+3Jvmq0GpeISh1X4cYll01Mf+Zq0tnFOLoFSpUDjAU4=
=EVxy
-----END PGP SIGNATURE-----
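
For reference, the negation rule from the error message quoted in this mail can be sketched as a small shell check. The value reported in the error, openib,sm,self,^mx, suggests that the two --mca btl options were joined into a single value, which then broke the rule. This is only a sketch: check_mca_value is a hypothetical helper name, not part of Open MPI, and it covers only the rule as worded in that message (at most one "^", and only as the first character).

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of Open MPI): checks a value meant for an
# MCA framework parameter such as "btl" against the rule quoted in the
# error message: a single "^" is allowed, and only at the very beginning.
check_mca_value() {
    local value="$1"
    local rest="${value#^}"   # drop one leading "^", if there is one
    case "$rest" in
        *^*) return 1 ;;      # any remaining "^" violates the rule
        *)   return 0 ;;
    esac
}

# Values taken from the commands tried in this thread.
check_mca_value "openib,sm,self"     && echo "ok: openib,sm,self"
check_mca_value "^mx"                && echo "ok: ^mx"
check_mca_value "openib,sm,self,^mx" || echo "rejected: openib,sm,self,^mx"
```

Under this rule the include list (openib,sm,self) and the exclude list (^mx) each pass on their own, but the joined value does not, which matches what the first command reported.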