[OMPI users] system call failed that shouldn't?

2016-04-13 Thread Tom Rosmond

Hello,

In this thread from the Open-MPI archives:

https://www.open-mpi.org/community/lists/devel/2014/03/14416.php

a strange problem with a system call is discussed and claimed to be 
solved.  However, when running a simple test program that exercises some 
new MPI-3 functions, the problem seems to have returned.  Here is an 
example message:


--------------------------------------------------------------------------

A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  cedar.reachone.com
  System call: unlink(2) /tmp/openmpi-sessions-rosmond@cedar_0/18624/1/shared_window_4.cedar

  Error:   No such file or directory (errno 2)



The same problem occurs on 2 different systems:

1. (Open MPI) 1.10.2 : gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)

2. (Open MPI) 1.8.4 : gcc version 4.7.2 (Debian 4.7.2-5)

Attached are
1.  ompi_info from the first system above
2. The source code of the test program (based on code downloaded from an 
Intel source)


I simply do

mpicc shared_mpi.c
mpirun -np 8 a.out > outp

The program runs correctly on both systems using Intel MPI, but produces 
the messages above with Open MPI.


T. Rosmond




ompi_info_output.gz
Description: GNU Zip compressed data
//=============================================================================
//
// SAMPLE SOURCE CODE - SUBJECT TO THE TERMS OF SAMPLE CODE LICENSE AGREEMENT
// http://software.intel.com/en-us/articles/intel-sample-source-code-license-agreement/
//
// Copyright 2009 Intel Corporation
//
// THIS FILE IS PROVIDED "AS IS" WITH NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
// NOT LIMITED TO ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
// PURPOSE, NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS.
//
//=============================================================================

#include "mpi.h"
#include 
#include 
#include 

/*
 * mpi3shm_1Dring.c can be used for basic functionality testing of MPI-3 shared memory
 * in a multi-node environment (such as Xeon and Xeon Phi based clusters).
 *
 * Each rank exchanges hello-world info (rank, total number of ranks and node name)
 * with its 2 neighbours (partners) in a 1D ring topology under periodic boundary conditions.
 *
 * mpi3shm_1Dring.c serves as a prototype for an MPI-3 shm addition to the MPPtest halo code.
 *
 * functions:
 *   get_n_partners    -- gets number of intra- and inter-node partners
 *   translate_ranks   -- defines the global rank -> shm communicator rank mapping
 *   get_partners_ptrs -- returns pointers to mem windows
 *   main
 *
 *   Example of compilation and usage:
 *   mpiicc -o mpi3shm_1Dring mpi3shm_1Dring.c
 *   mpiicc -mmic -o mpi3shm_1Dring.mic mpi3shm_1Dring.c
 *   export I_MPI_MIC=enable
 *   export I_MPI_MIC_POSTFIX=.mic
 *   mpirun -l -bootstrap ssh -n 112 -machinefile hostfile ./mpi3shm_1Dring
 *   where hostfile:
 *   esg065:24
 *   esg065-mic0:32
 *   esg066:24
 *   esg066-mic0:32
 */

  int const n_partners = 2; /* size of partners array in 1D-ring topology*/

  /* count number of intra and inter node partners */
  void get_n_partners (int rank, int partners[], int partners_map[],
                       int *n_node_partners, int *n_inter_partners)
  {
   int j;
   for (j=0; j<n_partners; j++)
   {
     if (partners_map[j] != MPI_UNDEFINED)   /* partner j lives on the same node    */
       (*n_node_partners)++;
     else                                    /* partner j lives on a different node */
       (*n_inter_partners)++;
   }
  }

  /* input:  partners holds the global ranks of the two neighbours;
             MPI_Group_translate_ranks defines the global rank -> shmcomm rank mapping;
     output: partners_map is the array of corresponding ranks in shmcomm */
  void translate_ranks(MPI_Comm shmcomm, int partners[], int partners_map[])
  {
MPI_Group world_group, shared_group;

/* create MPI groups for global communicator and shm communicator */
MPI_Comm_group (MPI_COMM_WORLD, &world_group); 
MPI_Comm_group (shmcomm, &shared_group);

    MPI_Group_translate_ranks (world_group, n_partners, partners, shared_group, partners_map);
  }



  /* returns pointers to mem windows partners_ptrs */
  void get_partners_ptrs(MPI_Win win, int partners[], int partners_map[], int **partners_ptrs)
  {
   int j, dsp_unit;
   MPI_Aint sz;

   for (j=0; j<n_partners; j++)
   {
     if (partners_map[j] != MPI_UNDEFINED)  /* only intra-node partners live in the shared window */
       MPI_Win_shared_query (win, partners_map[j], &sz, &dsp_unit, &partners_ptrs[j]);
   }
  }

  /* [the beginning of main() was lost in the archived attachment; the surviving fragment resumes below] */

    /* each partner's global rank -> shmcomm rank is in partners_map */
    partners_map = (int*)malloc(n_partners*sizeof(int)); /* allocate partners_map */
    translate_ranks(shmcomm, partners, partners_map);

    /* number of inter and intra node partners */
    get_n_partners (rank, partners, partners_map, &n_node_partners, &n_inter_partners);

 printf( "parts %d, %d,  %d\n", rank,n_node_partners,n_inter_partners);

    alloc_len = sizeof(int);
    if (n_node_partners > 0)
    {
      /* allocate shared memory windows on each node for intra-node partners */
      MPI_Win_allocate_shared (alloc_len, 1, MPI_INFO_NULL, shmcomm, /* inputs to the MPI-3 SHM collective */
                               &mem, &win); /* outputs: mem - initial address of window; win - window object */

      /* pointers to mem windows */
      partners_ptrs = (int **)malloc(n_partners*sizeof(int*));
      get_p
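
Because the archived copy of the attachment is cut off above, here is a minimal, self-contained sketch (not the original attachment) of the MPI-3 shared-memory sequence the test exercises: split MPI_COMM_WORLD into per-node communicators with MPI_Comm_split_type, allocate a window with MPI_Win_allocate_shared, and read a neighbour's segment obtained via MPI_Win_shared_query.  All file and variable names here are illustrative.

/* minimal_shm.c -- hedged sketch of the MPI-3 shared-memory calls used above */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, shm_rank;
    MPI_Comm shmcomm;
    MPI_Win win;
    int *mem;                        /* base of this rank's window segment */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* split the global communicator into one communicator per shared-memory node */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);
    MPI_Comm_rank(shmcomm, &shm_rank);

    /* collectively allocate one int per rank in a node-local shared window */
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            shmcomm, &mem, &win);
    *mem = world_rank;

    MPI_Win_fence(0, win);           /* make every rank's store visible */
    if (shm_rank != 0) {
        MPI_Aint sz;
        int disp_unit, *peer;
        /* obtain a direct pointer to shmcomm rank 0's segment */
        MPI_Win_shared_query(win, 0, &sz, &disp_unit, &peer);
        printf("world rank %d reads %d from its node leader\n", world_rank, *peer);
    }
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Comm_free(&shmcomm);
    MPI_Finalize();
    return 0;
}

(The shared_window_* file named in the warning above is the backing file Open MPI creates for such a window under its session directory; the ENOENT from unlink(2) suggests the file had already been removed by the time the library tried to clean it up.)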

[OMPI users] Head Node as a Compute Node

2016-04-13 Thread Matthew Larkin
Has anyone in the users list conducted any kind of analysis about using a head 
node as a compute node in a distributed system?
Does it affect resources, or simply change performance in any way?
Thanks!

Re: [OMPI users] Head Node as a Compute Node

2016-04-13 Thread Gilles Gouaillardet

Matthew,

this is generally a bad idea:
many users are logged into the login node, some of them run a graphical 
desktop in VNC, they sometimes start (heavy) compilations, and once in a 
while they even run a (hopefully) small MPI application so they do not 
have to use the batch manager (e.g. write a script and wait for the resources).
The head node can also run several monitoring daemons (not to mention an 
NFS server), so it can be quite "noisy" (OS jitter). Once in a while, 
some end users will ask for too much memory and the OOM killer will be 
invoked (and it sometimes kills processes you had hoped it would spare).


Bottom line: it is virtually impossible to predict how much RAM and CPU 
will be available when a job runs.


Most MPI applications are synchronous, which means performance is driven 
by the slowest node. The head node might or might not be the slowest node 
at any given time; this is quite unpredictable, and end users will likely 
end up complaining about erratic performance of their applications.


If your budget allows it, I strongly recommend you do not run jobs on the 
head node.


That being said, you might consider running an "overflow" queue on the 
head node, but only for single-node jobs.
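
For illustration only (the original mail does not say which batch manager is in use): if the cluster ran Slurm, such an overflow partition restricted to the head node and to single-node jobs might be sketched roughly as follows; the node name "head01" and all limits are made-up values.

# slurm.conf sketch -- hypothetical node name and limits
NodeName=head01 CPUs=8 RealMemory=32000 State=UNKNOWN
# "overflow" partition: only the head node, at most one node per job
PartitionName=overflow Nodes=head01 MaxNodes=1 MaxTime=04:00:00 State=UP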


Cheers,

Gilles

On 4/14/2016 4:57 AM, Matthew Larkin wrote:
Has anyone in the users list conducted any kind of analysis about 
using a head node as a compute node in a distributed system?


Does it affect resources, or simply change performance in any way?

Thanks!


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/04/28934.php




Re: [OMPI users] system call failed that shouldn't?

2016-04-13 Thread Gilles Gouaillardet

Tom,

I was able to reproduce the issue with an older v1.10 version, but not 
with the current v1.10 from git.


Could you please give 1.10.3rc1 a try? It is available at 
https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.3rc1.tar.bz2
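
For reference, one way to build and test the release candidate in a scratch prefix might look like the following (the install path and process count are just examples):

wget https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.3rc1.tar.bz2
tar xjf openmpi-1.10.3rc1.tar.bz2
cd openmpi-1.10.3rc1
./configure --prefix=$HOME/ompi-1.10.3rc1 && make -j4 && make install
export PATH=$HOME/ompi-1.10.3rc1/bin:$PATH
export LD_LIBRARY_PATH=$HOME/ompi-1.10.3rc1/lib:$LD_LIBRARY_PATH
mpicc shared_mpi.c && mpirun -np 8 ./a.out > outp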


Cheers,

Gilles

On 4/14/2016 4:05 AM, Tom Rosmond wrote:

Hello,

In this thread from the Open-MPI archives:

https://www.open-mpi.org/community/lists/devel/2014/03/14416.php

a strange problem with a system call is discussed and claimed to be 
solved.  However, when running a simple test program that exercises some 
new MPI-3 functions, the problem seems to have returned.  Here is an 
example message:


--------------------------------------------------------------------------



A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  cedar.reachone.com
  System call: unlink(2) /tmp/openmpi-sessions-rosmond@cedar_0/18624/1/shared_window_4.cedar

  Error:   No such file or directory (errno 2)

 



The same problem occurs on 2 different systems:

1. (Open MPI) 1.10.2 : gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)


2. (Open MPI) 1.8.4 : gcc version 4.7.2 (Debian 4.7.2-5)

Attached are
1.  ompi_info from the first system above
2. The source code of the test program (based on code downloaded from 
an Intel source)


I simply do

mpicc shared_mpi.c
mpirun -np 8 a.out > outp

The program runs correctly on both systems using Intel MPI, but produces 
the messages above with Open MPI.


T. Rosmond




___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/04/28933.php