Hello,
In this thread from the Open-MPI archives:
https://www.open-mpi.org/community/lists/devel/2014/03/14416.php
a strange problem with a system call is discussed, and claimed to be
solved. However, when running a simple test program that uses some new
MPI-3 functions, the problem seems to be back. Here is an example message:
-
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
  Local host:  cedar.reachone.com
  System call: unlink(2) /tmp/openmpi-sessions-rosmond@cedar_0/18624/1/shared_window_4.cedar
  Error:       No such file or directory (errno 2)
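For reference, the shared-memory initialization in question boils down to
roughly the following sequence (a minimal sketch, not the attached program;
error handling omitted):

#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm shmcomm;
  MPI_Win  win;
  int     *mem, rank;

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);

  /* group ranks that can share memory into shmcomm */
  MPI_Comm_split_type (MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                       MPI_INFO_NULL, &shmcomm);

  /* collective allocation of a shared window; the unlink(2) complaint
     above is reported during this shared-memory initialization */
  MPI_Win_allocate_shared (sizeof(int), sizeof(int), MPI_INFO_NULL,
                           shmcomm, &mem, &win);

  MPI_Win_free (&win);
  MPI_Comm_free (&shmcomm);
  MPI_Finalize ();
  return 0;
}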
The same problem occurs on 2 different systems:
1. (Open MPI) 1.10.2 : gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
2. (Open MPI) 1.8.4 : gcc version 4.7.2 (Debian 4.7.2-5)
Attached are
1. ompi_info from the first system above
2. The source code of the test program (based on code downloaded from an
Intel source)
I simply do
mpicc shared_mpi.c
mpirun -np 8 a.out > outp
The program runs correctly on both systems using Intel MPI, but produces
the messages above with Open MPI.
T. Rosmond
//===
//
// SAMPLE SOURCE CODE - SUBJECT TO THE TERMS OF SAMPLE CODE LICENSE AGREEMENT
// http://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/
//
// Copyright 2009 Intel Corporation
//
// THIS FILE IS PROVIDED "AS IS" WITH NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
// NOT LIMITED TO ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
// PURPOSE, NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS.
//
//
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*
 * mpi3shm_1Dring.c can be used for basic functionality testing of MPI-3
 * shared memory in a multi-node environment (such as Xeon and Xeon Phi
 * based clusters).
 *
 * Each rank exchanges hello-world info (rank, total number of ranks, and
 * node name) with its 2 neighbours (partners) in a 1D ring topology under
 * periodic boundary conditions.
 *
 * mpi3shm_1Dring.c serves as a prototype for an MPI-3 shm addition to the
 * MPPtest halo code.
 *
 * functions:
 *   get_n_partners    -- gets number of intra- and inter-node partners
 *   translate_ranks   -- defines global rank -> shm communicator rank mapping
 *   get_partners_ptrs -- returns pointers to mem windows
 *   main
 *
 * Example of compilation and usage:
 *   mpiicc -o mpi3shm_1Dring mpi3shm_1Dring.c
 *   mpiicc -mmic -o mpi3shm_1Dring.mic mpi3shm_1Dring.c
 *   export I_MPI_MIC=enable
 *   export I_MPI_MIC_POSTFIX=.mic
 *   mpirun -l -bootstrap ssh -n 112 -machinefile hostfile ./mpi3shm_1Dring
 * where hostfile:
 *   esg065:24
 *   esg065-mic0:32
 *   esg066:24
 *   esg066-mic0:32
 */
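/* For reference, the ring neighbours under periodic boundary conditions are
 * simply (illustration only; the actual assignment happens in main):
 *   partners[0] = (rank - 1 + numtasks) % numtasks;   -- left neighbour
 *   partners[1] = (rank + 1) % numtasks;              -- right neighbour
 * e.g. with numtasks = 8, rank 0 has neighbours 7 and 1.
 */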
int const n_partners = 2; /* size of partners array in 1D-ring topology */

/* count number of intra- and inter-node partners */
void get_n_partners (int rank, int partners[], int partners_map[],
                     int *n_node_partners, int *n_inter_partners)
{
  int j;
  for (j=0; j<n_partners; j++)
  {
    if (partners_map[j] != MPI_UNDEFINED) /* partner j is on the same node */
      (*n_node_partners)++;
    else
      (*n_inter_partners)++;
  }
}

/* defines global rank -> shmcomm rank mapping;
   output: partners_map is array of ranks in shmcomm */
void translate_ranks(MPI_Comm shmcomm, int partners[], int partners_map[])
{
  MPI_Group world_group, shared_group;

  /* create MPI groups for global communicator and shm communicator */
  MPI_Comm_group (MPI_COMM_WORLD, &world_group);
  MPI_Comm_group (shmcomm, &shared_group);

  MPI_Group_translate_ranks (world_group, n_partners, partners,
                             shared_group, partners_map);
}
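/* Note on MPI semantics: MPI_Group_translate_ranks sets partners_map[j] to
   MPI_UNDEFINED for any partner that is not a member of shmcomm, i.e. any
   partner living on another node; get_n_partners above relies on exactly
   this to separate intra-node from inter-node partners. */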
/* returns pointers to mem windows partners_ptrs */
void get_partners_ptrs(MPI_Win win, int partners[], int partners_map[],
                       int **partners_ptrs)
{
  int j, dsp_unit;
  MPI_Aint sz;
  for (j=0; j<n_partners; j++)
  {
    if (partners_map[j] != MPI_UNDEFINED) /* only on-node partners have a window segment */
      MPI_Win_shared_query (win, partners_map[j], &sz, &dsp_unit,
                            &partners_ptrs[j]);
  }
}

int main (int argc, char *argv[])
{
  int rank, numtasks;
  int n_node_partners = 0, n_inter_partners = 0;
  int partners[n_partners];   /* global ranks of the 2 ring neighbours */
  int *partners_map;          /* their ranks in shmcomm, or MPI_UNDEFINED */
  int **partners_ptrs;        /* pointers into the shared memory windows */
  int *mem, alloc_len;
  MPI_Comm shmcomm;
  MPI_Win win;

  MPI_Init (&argc, &argv);
  MPI_Comm_size (MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);

  /* 1D ring neighbours under periodic boundary conditions */
  partners[0] = (rank - 1 + numtasks) % numtasks;
  partners[1] = (rank + 1) % numtasks;

  /* MPI-3: split COMM_WORLD into communicators whose ranks share memory */
  MPI_Comm_split_type (MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                       MPI_INFO_NULL, &shmcomm);

  /* global rank -> shmcomm rank is in partners_map */
  partners_map = (int*)malloc(n_partners*sizeof(int)); /* allocate partners_map */

  translate_ranks(shmcomm, partners, partners_map);

  /* number of inter- and intra-node partners */
  get_n_partners (rank, partners, partners_map,
                  &n_node_partners, &n_inter_partners);

  printf ("parts %d, %d, %d\n", rank, n_node_partners, n_inter_partners);
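  /* e.g., if all 8 ranks of "mpirun -np 8" land on one node, each rank
     should print n_node_partners == 2 and n_inter_partners == 0 here */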
  alloc_len = sizeof(int);
  if (n_node_partners > 0)
  {
    /* allocate shared memory windows on each node for intra-node partners */
    MPI_Win_allocate_shared (alloc_len, 1, MPI_INFO_NULL, shmcomm, /* inputs to MPI-3 SHM collective */
                             &mem, &win); /* outputs: mem - initial address of window; win - window object */

    /* pointers to mem windows */
    partners_ptrs = (int **)malloc(n_partners*sizeof(int*));
    get_partners_ptrs (win, partners, partners_map, partners_ptrs);