Hi Nick,

Use master rather than the 1.8.x branch for the Cray XC. Also, when configuring, don't use the Cray/LANL platform files — just run a plain configure. And if you're running under natively-built Slurm, launch with srun rather than mpirun.
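To make the advice above concrete, here is a minimal sketch of the suggested flow. The repository URL, install prefix, parallelism flags, and job geometry are illustrative assumptions, not details from the thread:

```shell
# Build Open MPI from the master branch instead of the v1.8 series
# (master requires running autogen.pl before configure)
git clone https://github.com/open-mpi/ompi.git
cd ompi
./autogen.pl

# Plain configure: no --with-platform=contrib/platform/lanl/... file
./configure --prefix=$HOME/ompi-install --enable-mpi-fortran=none
make -j8 install

# Under natively-built Slurm, launch with srun rather than mpirun/aprun
# (2 ranks, 1 per node, matching the original aprun -n 2 -N 1 geometry)
srun -n 2 --ntasks-per-node=1 ./osu_latency
```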
Howard

----------
Sent from my smartphone, so please excuse the typos.

On Jun 25, 2015 2:56 PM, "Nick Radcliffe" <nradc...@cray.com> wrote:

> Hi,
>
> I'm trying to build and run Open MPI 1.8.5 with native ugni on a Cray XC.
> The build works, but I'm getting this error when I run:
>
> nradclif@kay:/lus/scratch/nradclif> aprun -n 2 -N 1 ./osu_latency
> [nid00014:28784] [db_pmi.c:174:pmi_commit_packed] PMI_KVS_Put: Operation failed
> [nid00014:28784] [db_pmi.c:457:commit] PMI_KVS_Commit: Operation failed
> [nid00012:12788] [db_pmi.c:174:pmi_commit_packed] PMI_KVS_Put: Operation failed
> [nid00012:12788] [db_pmi.c:457:commit] PMI_KVS_Commit: Operation failed
> # OSU MPI Latency Test
> # Size          Latency (us)
> osu_latency: btl_ugni_endpoint.c:87: mca_btl_ugni_ep_connect_start: Assertion `0' failed.
> [nid00012:12788] *** Process received signal ***
> [nid00012:12788] Signal: Aborted (6)
> [nid00012:12788] Signal code:  (-6)
> [nid00012:12788] [ 0] /lib64/libpthread.so.0(+0xf850)[0x2aaaab42b850]
> [nid00012:12788] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x2aaaab66b885]
> [nid00012:12788] [ 2] /lib64/libc.so.6(abort+0x181)[0x2aaaab66ce61]
> [nid00012:12788] [ 3] /lib64/libc.so.6(__assert_fail+0xf0)[0x2aaaab664740]
> [nid00012:12788] [ 4] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(mca_btl_ugni_ep_connect_progress+0x6c9)[0x2aaaaaff9869]
> [nid00012:12788] [ 5] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(+0x5ae32)[0x2aaaaaf46e32]
> [nid00012:12788] [ 6] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(mca_btl_ugni_sendi+0x8bd)[0x2aaaaaffaf7d]
> [nid00012:12788] [ 7] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(+0x1f0c17)[0x2aaaab0dcc17]
> [nid00012:12788] [ 8] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(mca_pml_ob1_isend+0xa8)[0x2aaaab0dd488]
> [nid00012:12788] [ 9] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(ompi_coll_tuned_barrier_intra_two_procs+0x11b)[0x2aaaab07e84b]
> [nid00012:12788] [10] /lus/scratch/nradclif/openmpi_install/lib/libmpi.so.1(PMPI_Barrier+0xb6)[0x2aaaaaf8a7c6]
> [nid00012:12788] [11] ./osu_latency[0x401114]
> [nid00012:12788] [12] /lib64/libc.so.6(__libc_start_main+0xe6)[0x2aaaab657c36]
> [nid00012:12788] [13] ./osu_latency[0x400dd9]
> [nid00012:12788] *** End of error message ***
> osu_latency: btl_ugni_endpoint.c:87: mca_btl_ugni_ep_connect_start: Assertion `0' failed.
>
> Here's how I build:
>
> export FC=ftn    (I'm not using Fortran, but configure fails if it can't find a Fortran compiler)
> ./configure --prefix=/lus/scratch/nradclif/openmpi_install --enable-mpi-fortran=none --with-platform=contrib/platform/lanl/cray_xe6/debug-lustre
> make install
>
> I didn't modify the debug-lustre file, but I did change cray-common to remove the hard-coded paths, e.g., rather than the Gemini-specific path "with_pmi=/opt/cray/pmi/2.1.4-1.0000.8596.8.9.gem", I used "with_pmi=/opt/cray/pmi/default".
>
> I've tried running different executables with different numbers of ranks/nodes, but they all seem to run into problems with PMI_KVS_Put.
>
> Any ideas what could be going wrong?
>
> Thanks for any help,
> Nick
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27197.php