Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread Vincent Diepeveen


You're using absurdly huge message sizes here, so what you're really testing 
is the memory bandwidth of your system.


As soon as a message gets larger than your CPU's caches, it has to be copied 
several times through RAM rather than staying in L2 or L3, and the bandwidth 
drops.


This has nothing to do with Open MPI, I'd say.
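
For reference, the kind of loop being discussed is roughly the following: a 
minimal ping-pong bandwidth sketch (an illustration only, not the exact 
benchmark Pete ran), in which the single reused buffer stays cache-hot until 
the message no longer fits in L2/L3.

#include <mpi.h>
#include <cstdio>
#include <vector>

// Minimal ping-pong bandwidth sketch (illustrative, not the original benchmark).
// Rank 0 sends 'nbytes' to rank 1 and waits for it to come back; bandwidth is
// reported as 2*nbytes divided by the average round-trip time.  Because the
// same buffer is reused every iteration, it stays in cache until nbytes
// exceeds the L2/L3 size, which is where the "hump" typically ends.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 100;
    for (int nbytes = 1; nbytes <= (64 << 20); nbytes *= 2) {
        std::vector<char> buf(nbytes, 1);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; ++i) {
            if (rank == 0) {
                MPI_Send(buf.data(), nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf.data(), nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf.data(), nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf.data(), nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double rt = (MPI_Wtime() - t0) / reps;   // seconds per round trip
        if (rank == 0)
            printf("%10d bytes  %10.2f MB/s\n", nbytes, 2.0 * nbytes / rt / 1.0e6);
    }
    MPI_Finalize();
    return 0;
}

Compiled with mpic++ and run with mpirun -np 2 on a single node, something 
like this should reproduce the general shape of the curve (the exact numbers 
will of course differ).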




On Thu, 10 Mar 2016, BRADLEY, PETER C PW wrote:



I’m curious what causes the hump in the pingpong bandwidth curve when running 
on shared memory.  Here’s an example running on a fairly antiquated
single-socket 4 core laptop with linux (2.6.32 kernel).  Is this a cache 
effect?  Something in OpenMPI itself, or a combination?

 

 

[image: Macintosh HD:Users:up:Pictures:bandwidth_onepair_onenode.png]

 

Pete

 




Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread Gilles Gouaillardet
FWIW, you might want to try comparing sm and vader
(mpirun --mca btl self,sm ... versus mpirun --mca btl self,vader ...),
and each with and without knem
(modprobe knem should do the trick).

 Cheers,

Gilles

Vincent Diepeveen  wrote:
>
>You're using absurdly huge message sizes here, so what you're really testing 
>is the memory bandwidth of your system.
>
>As soon as a message gets larger than your CPU's caches, it has to be copied 
>several times through RAM rather than staying in L2 or L3, and the bandwidth 
>drops.
>
>This has nothing to do with Open MPI, I'd say.
>
>
>
>
>On Thu, 10 Mar 2016, BRADLEY, PETER C PW wrote:
>
>> 
>> I’m curious what causes the hump in the pingpong bandwidth curve when 
>> running on shared memory.  Here’s an example running on a fairly antiquated
>> single-socket 4 core laptop with linux (2.6.32 kernel).  Is this a cache 
>> effect?  Something in OpenMPI itself, or a combination?
>> 
>>  
>> 
>>  
>> 
>> [image: Macintosh HD:Users:up:Pictures:bandwidth_onepair_onenode.png]
>> 
>>  
>> 
>> Pete
>> 
>>  
>> 
>> 
>>

Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread Gilles Gouaillardet
Pete,

how did you measure the bandwidth?
IIRC, the IMB benchmark does not reuse send and recv buffers, so the results
could be different.
Also, you might want to use a logarithmic scale for the message size, so the
information for small messages is easier to read.

Cheers,

Gilles
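
For what it's worth, the buffer-reuse point is easy to experiment with: in the 
ping-pong sketch earlier in this digest, the inner loop can be replaced by a 
version that rotates through a pool of buffers, so successive iterations touch 
cold memory. A hedged fragment (the pool size is illustrative, and this is not 
how IMB actually does it):

// Drop-in replacement for the inner loop of the earlier ping-pong sketch.
// With a pool larger than the last-level cache, each iteration finds its
// buffer cold, and the bandwidth "hump" should flatten considerably.
const int NPOOL = 16;
std::vector<std::vector<char>> pool(NPOOL, std::vector<char>(nbytes, 1));
for (int i = 0; i < reps; ++i) {
    char* p = pool[i % NPOOL].data();
    if (rank == 0) {
        MPI_Send(p, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(p, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(p, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(p, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
}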

On Thursday, March 10, 2016, BRADLEY, PETER C PW 
wrote:

> I’m curious what causes the hump in the pingpong bandwidth curve when
> running on shared memory.  Here’s an example running on a fairly antiquated
> single-socket 4 core laptop with linux (2.6.32 kernel).  Is this a cache
> effect?  Something in OpenMPI itself, or a combination?
>
>
>
>
>
> [image: Macintosh HD:Users:up:Pictures:bandwidth_onepair_onenode.png]
>
>
>
> Pete
>
>
>


[OMPI users] Failed Flash run on Pleiades with OpenMPI 1.10.2

2016-03-10 Thread Joshua Wall
Dear users,

 Hello, I'm relatively new to building Open MPI from scratch, so I'm
going to try to provide a lot of information about exactly what I did here.
I'm attempting to run the MHD code Flash 4.2.2 on Pleiades (NASA AMES), and I
also need some Python mpi4py functionality and CUDA, which ruled out using the
pre-installed MPI implementations. My code has been tested and works under a
previous build of Open MPI 1.10.2 on a local cluster at Drexel University that
has no job manager and uses a simple InfiniBand setup. Pleiades is a bit more
complicated, but I've been following the NASA folks' setup commands, and they
say that, looking at my job logs from their side, nothing seems wrong
communication-wise.

However, when I run just a vanilla version of Flash 4.2.2 it runs for
several steps and then crashes. Here's the last part of the Flash run
output:

 *** Wrote particle file to BB_hdf5_part_0008 
  17 1.5956E+11 5.4476E+09  (-5.031E+16,  1.969E+16, -2.188E+15) |
 5.448E+09
 *** Wrote plotfile to BB_hdf5_plt_cnt_0009 
 WARNING: globalNumParticles = 0!!!
  iteration, no. not moved =0  69
  iteration, no. not moved =1  29
  iteration, no. not moved =2   0
 refined: total leaf blocks =  120
 refined: total blocks =  137
  18 1.7046E+11 5.3814E+09  (-2.516E+16,  2.734E+16, -1.094E+15) |
 5.381E+09
 WARNING: globalNumParticles = 0!!!
 *** Wrote particle file to BB_hdf5_part_0009 
  19 1.8122E+11 2.9425E+09  (-2.078E+16, -2.516E+16, -3.391E+16) |
 2.943E+09
 *** Wrote plotfile to BB_hdf5_plt_cnt_0010 
 WARNING: globalNumParticles = 0!!!
  iteration, no. not moved =0 128
  iteration, no. not moved =1  25
  iteration, no. not moved =2   0
 refined: total leaf blocks =  456
 refined: total blocks =  521
 Paramesh error : pe   65  needed full blk1  57
 but could not find it or only  found part of it in the message buffer.
Contact PARAMESH developers for help.
--
MPI_ABORT was invoked on rank 65 in communicator MPI COMMUNICATOR 3 SPLIT
FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
 Paramesh error : pe   80  needed full blk1  72
 but could not find it or only  found part of it in the message buffer.
Contact PARAMESH developers for help.

You can see the entire output at:

https://drive.google.com/file/d/0B7Zx9zNTB3icQWZPTUlhZFQtcWs/view?usp=sharing


Okay, so I built it with (as instructed by NASA HECC):

./configure --with-tm=/PBS
--with-verbs=/usr --enable-mca-no-build=maffinity-libnuma
--with-cuda=/nasa/cuda/7.0 --enable-mpi-interface-warning --without-slurm
--without-loadleveler --enable-mpirun-prefix-by-default
--enable-btl-openib-failover --prefix=/u/jewall/ompi-1.10.2


And if I run ompi_info on 96 cores (the same number I ran the job on), I get
the following output:

https://drive.google.com/file/d/0B7Zx9zNTB3icSHNZaEpZZkhPcXc/view?usp=sharing

And the job was run with the following script:

#PBS -S /bin/bash
#PBS -N cfd

#PBS -q debug
#PBS -l select=8:ncpus=12:model=has
#PBS -l walltime=0:30:00
#PBS -j oe
#PBS -W group_list=g23107
#PBS -m e

# Load a compiler you use to build your executable, for example,
# comp-intel/2015.0.090.

#source /usr/local/lib/global.profile

module load git/2.4.5
module load szip/2.1/gcc
module load cuda/7.0
module load gcc/4.9.3
module load cmake/2.8.12.1
module load python/2.7.10

# Add your commands here to extend your PATH, etc.

export MPIHOME=/u/jewall/ompi-1.10.2
export MPICC=${MPIHOME}/bin/mpicc
export MPIFC=${MPIHOME}/bin/mpif90
export MPICXX=${MPIHOME}/bin/mpic++
export MPIEXEC=${MPIHOME}/bin/mpiexec
export HDF5=/u/jewall/hdf5

export OMPI_MCA_btl_openib_if_include=mlx4_0:1


PATH=$PATH:${PYTHONPATH}:$HOME/bin   # Add private commands to PATH

# By default, PBS executes your job from your home directory.
# However, you can use the environment variable
# PBS_O_WORKDIR to change to the directory where
# you submitted your job.

cd $PBS_O_WORKDIR

echo ${PBS_NODEFILE}
cat ${PBS_NODEFILE} | awk '{print $1}' > "local_host.txt"
cat local_host.txt

# use of dplace to pin processes to processors may improve performance
# Here you request to pin processes to processors 4-11 of each Sandy Bridge node.
# For other processor types, you may have to pin to different processors.

# The resource request of select=32 and mpiprocs=8 implies
# that you want to have 256 MPI processes in total.
# If this is correct, you can omit the -np 256 for mpiexec
# that you might have used before.

${MPIEXEC} --mca mpi_warn_on_fork 0 --mca mpi_cuda_support 0 -

Re: [OMPI users] Poor performance on Amazon EC2 with TCP

2016-03-10 Thread Jackson, Gary L.
I re-ran all experiments with 1.10.2 configured the way you specified. My 
results are here:

https://www.dropbox.com/s/4v4jaxe8sflgymj/collected.pdf?dl=0

Some remarks:

  1.  Open MPI had poor performance relative to raw TCP and Intel MPI across all MTUs.
  2.  Those issues appeared at larger message sizes.
  3.  Intel MPI and raw TCP were comparable across message sizes and MTUs.

With respect to some other concerns:

  1.  I verified that the MTU values I'm using are correct with tracepath.
  2.  I am using a placement group.

--
Gary Jackson

From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet <gil...@rist.or.jp>
Reply-To: Open MPI Users <us...@open-mpi.org>
List-Post: users@lists.open-mpi.org
Date: Tuesday, March 8, 2016 at 11:07 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Poor performance on Amazon EC2 with TCP

Jackson,

One more thing: how did you build Open MPI?

If you built from git (and without VPATH), then --enable-debug is automatically 
set, and this hurts performance.
If not already done, I recommend you download the latest Open MPI tarball 
(1.10.2) and
./configure --with-platform=contrib/platform/optimized --prefix=...
Last but not least, you can try
mpirun --mca mpi_leave_pinned 1
(that being said, I am not sure this is useful with TCP networks ...)

Cheers,

Gilles



On 3/9/2016 11:34 AM, Rayson Ho wrote:
If you are using instance types that support SR-IOV (a.k.a. "enhanced networking" 
in AWS), then turn it on. We saw huge differences when SR-IOV was enabled:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html

Make sure you start your instances with a placement group -- otherwise, the 
instances can be data centers apart!

And check that jumbo frames are enabled properly:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html

But still, it is interesting that Intel MPI is getting a 2X speedup with the 
same setup! Can you post the raw numbers so that we can take a deeper look??

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html




On Tue, Mar 8, 2016 at 9:08 AM, Jackson, Gary L. <gary.jack...@jhuapl.edu> wrote:

I've built Open MPI 1.10.1 on Amazon EC2. Using NetPIPE, I'm seeing about half 
the performance for MPI over TCP that I get with raw TCP. Before I start digging 
into this more deeply, does anyone know what might cause that?

For what it's worth, I see the same issue with MPICH, but I do not see it with 
Intel MPI.

--
Gary Jackson





[OMPI users] locked memory and queue pairs

2016-03-10 Thread Michael Di Domenico
When I try to run an Open MPI job with >128 ranks (16 ranks per node)
using alltoall or alltoallv, I get an error that the process was
unable to get a queue pair.

I've checked the max locked memory settings across my machines:

- ulimit -l, both inside and outside of mpirun, reports unlimited
- the PAM modules are set up so pam_limits.so is loaded and working
- /etc/security/limits.conf sets the soft/hard memlock limits to unlimited

I tried a couple of quick MPI config settings I could think of:

- -mca mtl ^psm: no effect
- -mca btl_openib_flags 1: no effect

The Open MPI FAQ says to tweak some MTT values in /sys, but since I'm
not on Mellanox that doesn't apply to me.

The machines are RHEL 6.7, kernel 2.6.32-573.12.1 (with bundled OFED),
running on QLogic single-port InfiniBand cards, with PSM enabled.

Other collectives seem to run okay; it seems to be only the alltoall
communications that fail, and only at scale.

I believe (but can't prove) that this worked at one point, but I can't
recall when I last tested it, so it's reasonable to assume that some
change to the system is preventing this.

The question is: where should I start poking to find it?
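
In case a stripped-down reproducer helps with the poking, here is a minimal 
alltoall sketch; the per-peer count is an arbitrary choice and the final print 
is only illustrative, since MPI aborts on error by default:

#include <mpi.h>
#include <cstdio>
#include <vector>

// Minimal alltoall reproducer sketch: each rank exchanges 'count' ints with
// every other rank.  Run at increasing scale (ranks per node and total ranks)
// to find the point where queue-pair creation starts to fail.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int count = 1024;  // ints exchanged with each peer
    std::vector<int> sendbuf(static_cast<size_t>(count) * nprocs, rank);
    std::vector<int> recvbuf(static_cast<size_t>(count) * nprocs, -1);

    int rc = MPI_Alltoall(sendbuf.data(), count, MPI_INT,
                          recvbuf.data(), count, MPI_INT, MPI_COMM_WORLD);
    if (rank == 0)
        printf("alltoall across %d ranks returned %d\n", nprocs, rc);

    MPI_Finalize();
    return 0;
}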


Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread BRADLEY, PETER C PW
This is an academic exercise, obviously.  The curve shown comes from one pair 
of ranks running on the same node alternating between MPI_Send and MPI_Recv.  
The most likely suspect is a cache effect, but rather than assuming, I was 
curious if there might be any other aspects of the implementation at work.

Pete



Pete,

how did you measure the bandwidth ?
iirc, IMB benchmark does not reuse send and recv buffers, so the results
could be different.
also, you might want to use a logarithmic scale for the message size, so
information for small messages is easier to read.

Cheers,

Gilles

On Thursday, March 10, 2016, BRADLEY, PETER C PW 
wrote:

> I’m curious what causes the hump in the pingpong bandwidth curve when
> running on shared memory. Here’s an example running on a fairly antiquated
> single-socket 4 core laptop with linux (2.6.32 kernel). Is this a cache
> effect? Something in OpenMPI itself, or a combination?
>
>
>
>
>
> [image: Macintosh HD:Users:up:Pictures:bandwidth_onepair_onenode.png]
>
>
>
> Pete
>
>
>



Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread Jeff Squyres (jsquyres)
I think the information was scattered across a few posts, but the union of 
them is correct:

- it depends on the benchmark

- yes, L1/L2/L3 cache sizes can have a huge effect.  I.e., once the buffer size 
gets bigger than the cache size, it takes more time to get the message from 
main RAM
  --> check the output from hwloc's "lstopo" tool to find your cache sizes

- the specific flavor of shared memory used also has a huge effect.  The 
default is copy-in/copy-out, but other shared memory mechanisms are also 
available (e.g., Linux CMA, Linux KNEM, XPMEM)

Does that help?
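
As a rough, MPI-free way to see where those cache boundaries sit on a given 
machine, here is a hedged sketch of a copy-bandwidth sweep (the sizes and 
repetition count are arbitrary choices); copy-in/copy-out shared memory is 
ultimately bounded by this kind of curve, and the knees should line up with 
the L1/L2/L3 sizes that lstopo reports:

#include <cstdio>
#include <cstring>
#include <vector>
#include <chrono>

// Hedged sketch: measure memcpy bandwidth as the buffer grows.  The reported
// bandwidth typically steps down as the working set spills out of L1, L2 and
// L3, the same effect that ends the "hump" in a shared-memory ping-pong.
int main() {
    for (size_t nbytes = 4 * 1024; nbytes <= 128UL * 1024 * 1024; nbytes *= 2) {
        std::vector<char> src(nbytes, 1), dst(nbytes, 0);
        const int reps = 20;
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < reps; ++i)
            std::memcpy(dst.data(), src.data(), nbytes);
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        volatile char sink = dst[nbytes / 2];  // keep the copies from being optimized away
        (void)sink;
        printf("%10zu bytes  %10.2f MB/s\n",
               nbytes, nbytes * reps / dt.count() / 1.0e6);
    }
    return 0;
}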


> On Mar 10, 2016, at 12:25 PM, BRADLEY, PETER C PW 
>  wrote:
> 
> This is an academic exercise, obviously.  The curve shown comes from one pair 
> of ranks running on the same node alternating between MPI_Send and MPI_Recv.  
> The most likely suspect is a cache effect, but rather than assuming, I was 
> curious if there might be any other aspects of the implementation at work.
> 
> Pete
> 
>  
> 
> Pete, 
> 
> how did you measure the bandwidth ? 
> iirc, IMB benchmark does not reuse send and recv buffers, so the results 
> could be different. 
> also, you might want to use a logarithmic scale for the message size, so 
> information for small messages is easier to read. 
> 
> Cheers, 
> 
> Gilles 
> 
> On Thursday, March 10, 2016, BRADLEY, PETER C PW 
>  
> wrote: 
> 
> > I’m curious what causes the hump in the pingpong bandwidth curve when 
> > running on shared memory. Here’s an example running on a fairly 
> > antiquated 
> > single-socket 4 core laptop with linux (2.6.32 kernel). Is this a cache 
> > effect? Something in OpenMPI itself, or a combination? 
> > 
> > 
> > 
> > 
> > 
> > [image: Macintosh HD:Users:up:Pictures:bandwidth_onepair_onenode.png] 
> > 
> > 
> > 
> > Pete 
> > 
> > 
> > 
> 
>  


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread Vincent Diepeveen




On Thu, 10 Mar 2016, BRADLEY, PETER C PW wrote:



This is an academic exercise, obviously.  The curve shown comes from one pair 
of ranks running on the same node alternating between MPI_Send and
MPI_Recv.  The most likely suspect is a cache effect, but rather than assuming, 
I was curious if there might be any other aspects of the implementation
at work.

Pete


Well, with some more effort you can get the cache-miss statistics straight 
from the processor...


I suspect your graph doesn't show any other effects, though.

A multiple of 5 GB/s is a lot of bandwidth for a laptop right now. 
Please consider that each buffer gets copied as you ship a message, so the 
actual memory bandwidth used is a multiple of that 5 GB/s.


This test ships everything in FIFO order (first in, first out) - thrashing 
the caches, in short.

Ping-pong isn't intended as a bandwidth test at all.

It's a latency test: it is most useful on a supercomputer, to figure out the 
time it takes to get to a remote node and back, and then divide that by 2.
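
To illustrate the latency flavour, here is a minimal sketch: a one-byte 
message bounced back and forth, with half the average round trip reported as 
the one-way latency. Again, an illustration rather than a substitute for a 
real benchmark.

#include <mpi.h>
#include <cstdio>

// Hedged latency sketch: tiny ping-pong, reporting half the average round
// trip as the one-way latency, which is what ping-pong is really aimed at.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 10000;
    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; ++i) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("one-way latency: %.2f us\n",
               (MPI_Wtime() - t0) / reps / 2.0 * 1.0e6);
    MPI_Finalize();
    return 0;
}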


I wrote a few tests some years ago to measure the random-access latency of 
your RAM with all cores busy at the same time.


Yet very few of those tests are really about bandwidth. They care about the 
number of messages per second one can push through - that is, about latency.


The network plays a larger role when you run over multiple nodes, whereas 
here you're just finding out how good the L2/L3 cache on your CPU is.


Let me assure you - that L2/L3 cache works very well :)


 

Pete,

how did you measure the bandwidth ?
iirc, IMB benchmark does not reuse send and recv buffers, so the results
could be different.
also, you might want to use a logarithmic scale for the message size, so
information for small messages is easier to read.

Cheers,

Gilles

On Thursday, March 10, 2016, BRADLEY, PETER C PW 
wrote:

> I’m curious what causes the hump in the pingpong bandwidth curve when
> running on shared memory. Here’s an example running on a fairly antiquated
> single-socket 4 core laptop with linux (2.6.32 kernel). Is this a cache
> effect? Something in OpenMPI itself, or a combination?
>
>
>
>
>
> [image: Macintosh HD:Users:up:Pictures:bandwidth_onepair_onenode.png]
>
>
>
> Pete
>
>
>

 




[OMPI users] Error with MPI_Register_datarep

2016-03-10 Thread Éric Chamberland

Hi,

I have a segfault while trying to use MPI_Register_datarep with 
openmpi-1.10.2:


mpic++ -g -o int64 int64.cc
./int64
[melkor:24426] *** Process received signal ***
[melkor:24426] Signal: Segmentation fault (11)
[melkor:24426] Signal code: Address not mapped (1)
[melkor:24426] Failing at address: (nil)
[melkor:24426] [ 0] /lib64/libpthread.so.0(+0xf1f0)[0x7f66cfb731f0]
[melkor:24426] *** End of error message ***
Segmentation fault (core dumped)

I have attached the beginning of a test program that uses this function.

(and BTW, a totally different error occurs with MPICH: 
http://lists.mpich.org/pipermail/discuss/2016-March/004586.html)


Can someone help me?

Thanks,

Eric

#include <cstdio>
#include "mpi.h"

void abortOnError(int ierr) {
  if (ierr != MPI_SUCCESS) {
    printf("ERROR Returned by MPI: %d\n", ierr);
    char* lCharPtr = new char[MPI_MAX_ERROR_STRING];
    int lLongueur = 0;
    MPI_Error_string(ierr, lCharPtr, &lLongueur);
    printf("ERROR_string Returned by MPI: %s\n", lCharPtr);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
}

namespace PAIO {

int calculeExtentInt64VsInt32(MPI_Datatype pDataType,
                              MPI_Aint *pFileExtent,
                              void *pExtraState)
{
  int lReponse = MPI_ERR_TYPE;
  if (pDataType == MPI_LONG_INT) {
    *pFileExtent = 8;  // each element occupies 8 bytes in the file
    lReponse = MPI_SUCCESS;
  }
  return lReponse;
}

int conversionLectureInt64VersInt32(void* pUserBuf,
                                    MPI_Datatype pDataType,
                                    int pCount,
                                    void *pFileBuf,
                                    MPI_Offset pPosition,
                                    void* pExtraState)
{
  if (pDataType != MPI_LONG_INT) {
    return MPI_ERR_TYPE;
  }

  // no byte swap needed here...

  // Convert int64 to int32:
  for (int i = 0; i < pCount; ++i) {
    ((int*) pUserBuf)[pPosition+i] = ((long long int*) pFileBuf)[i];
  }
  return MPI_SUCCESS;
}

int conversionEcritureInt32VersInt64(void* pUserBuf,
                                     MPI_Datatype pDataType,
                                     int pCount,
                                     void *pFileBuf,
                                     MPI_Offset pPosition,
                                     void* pExtraState)
{
  if (pDataType != MPI_LONG_INT) {
    return MPI_ERR_TYPE;
  }

  // Convert int32 to int64:
  for (int i = 0; i < pCount; ++i) {
    ((long long int*) pFileBuf)[i] = ((int*) pUserBuf)[pPosition+i];
  }

  // no byte swap needed here...

  return MPI_SUCCESS;
}

}

int main (int argc, char *argv[])
{
  MPI_Init(&argc, &argv);

  int nb_proc = 0;
  MPI_Comm_size(MPI_COMM_WORLD, &nb_proc);

  // Register a new "datarep":
  abortOnError(MPI_Register_datarep("int64",
                                    PAIO::conversionLectureInt64VersInt32,
                                    PAIO::conversionEcritureInt32VersInt64,
                                    PAIO::calculeExtentInt64VsInt32,
                                    NULL));

  MPI_Finalize();

  return 0;
}
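
For context, a datarep registered this way is eventually selected through 
MPI_File_set_view. A hedged sketch of what that would look like once the 
registration succeeds (the file name is illustrative, and this is not part of 
Éric's test, which deliberately stops before any file I/O):

// Hedged sketch (not part of the test above): after MPI_Register_datarep
// succeeds, the "int64" representation is selected per file via the view.
MPI_File fh;
abortOnError(MPI_File_open(MPI_COMM_WORLD, "donnees.bin",
                           MPI_MODE_RDWR | MPI_MODE_CREATE,
                           MPI_INFO_NULL, &fh));
abortOnError(MPI_File_set_view(fh, 0, MPI_LONG_INT, MPI_LONG_INT,
                               "int64", MPI_INFO_NULL));
// ... MPI_File_read / MPI_File_write calls on fh now go through the
// registered conversion and extent callbacks ...
abortOnError(MPI_File_close(&fh));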


Re: [OMPI users] What causes the pingpong bandwidth "hump" on SM?

2016-03-10 Thread BRADLEY, PETER C PW
Jeff et al,
Thanks, exactly what I was looking for.
Pete
I think the information was scattered across a few posts, but the union of 
which is correct:
- it depends on the benchmark
- yes, L1/L2/L3 cache sizes can have a huge effect. I.e., once the buffer size 
gets bigger than the cache size, it takes more time to get the message from 
main RAM
  --> check the output from hwloc's "lstopo" tool to find your cache sizes
- the specific flavor of shared memory used also has a huge effect. The default 
is copy-in/copy-out, but other shared memory mechanisms are also available 
(e.g., Linux CMA, Linux KNEM, XPMEM)
Does that help?
> On Mar 10, 2016, at 12:25 PM, BRADLEY, PETER C PW 
>  wrote:
>
> This is an academic exercise, obviously. The curve shown comes from one pair 
> of ranks running on the same node alternating between MPI_Send and MPI_Recv. 
> The most likely suspect is a cache effect, but rather than assuming, I was 
> curious if there might be any other aspects of the implementation at work.
>
> Pete
>
>
>
> Pete,
>
> how did you measure the bandwidth ?
> iirc, IMB benchmark does not reuse send and recv buffers, so the results
> could be different.
> also, you might want to use a logarithmic scale for the message size, so
> information for small messages is easier to read.
>
> Cheers,
>
> Gilles
>
> On Thursday, March 10, 2016, BRADLEY, PETER C PW 
> wrote:
>
> > I’m curious what causes the hump in the pingpong bandwidth curve when
> > running on shared memory. Here’s an example running on a fairly antiquated
> > single-socket 4 core laptop with linux (2.6.32 kernel). Is this a cache
> > effect? Something in OpenMPI itself, or a combination?
> >
> >
> >
> >
> >
> > [image: Macintosh HD:Users:up:Pictures:bandwidth_onepair_onenode.png]
> >
> >
> >
> > Pete
> >
> >
> >
>
>
--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Error with MPI_Register_datarep

2016-03-10 Thread Gilles Gouaillardet

Eric,

I will fix the crash (FWIW, it is already fixed in v2.x and master).

Note that this program cannot currently run "as is".
By default, there are two components in the io framework: ROMIO and OMPIO.
MPI_Register_datarep tries to register the datarep with all of them,
and succeeds only if the datarep was successfully registered with every one.


OMPIO does not currently support this
(and the stub is missing in v1.10, which is why the app crashes instead of
failing cleanly).

Your test succeeds if you blacklist ompio:

mpirun --mca io ^ompio ./int64
or
OMPI_MCA_io=^ompio ./int64

and you do not even need a patch for that :-)


Cheers,

Gilles

On 3/11/2016 4:47 AM, Éric Chamberland wrote:

Hi,

I have a segfault while trying to use MPI_Register_datarep with 
openmpi-1.10.2:


mpic++ -g -o int64 int64.cc
./int64
[melkor:24426] *** Process received signal ***
[melkor:24426] Signal: Segmentation fault (11)
[melkor:24426] Signal code: Address not mapped (1)
[melkor:24426] Failing at address: (nil)
[melkor:24426] [ 0] /lib64/libpthread.so.0(+0xf1f0)[0x7f66cfb731f0]
[melkor:24426] *** End of error message ***
Segmentation fault (core dumped)

I have attached the beginning of a test program that use this function.

(and btw a totally different error occur with mpich: 
http://lists.mpich.org/pipermail/discuss/2016-March/004586.html)


Can someone help me?

Thanks,

Eric







Re: [OMPI users] Error with MPI_Register_datarep

2016-03-10 Thread Éric Chamberland

Thanks Gilles!

it works... I will continue my tests with that command line...

Until OMPIO supports this, is there a way to put a call into the code to 
disable ompio the same way --mca io ^ompio does?


Thanks,

Eric

On 16-03-10 20:13, Gilles Gouaillardet wrote:

Eric,

I will fix the crash (FWIW, it is already fixed in v2.x and master).

Note that this program cannot currently run "as is".
By default, there are two components in the io framework: ROMIO and OMPIO.
MPI_Register_datarep tries to register the datarep with all of them,
and succeeds only if the datarep was successfully registered with every one.


OMPIO does not currently support this
(and the stub is missing in v1.10, which is why the app crashes instead of
failing cleanly).

Your test succeeds if you blacklist ompio:

mpirun --mca io ^ompio ./int64
or
OMPI_MCA_io=^ompio ./int64

and you do not even need a patch for that :-)


Cheers,

Gilles

On 3/11/2016 4:47 AM, Éric Chamberland wrote:

Hi,

I have a segfault while trying to use MPI_Register_datarep with 
openmpi-1.10.2:


mpic++ -g -o int64 int64.cc
./int64
[melkor:24426] *** Process received signal ***
[melkor:24426] Signal: Segmentation fault (11)
[melkor:24426] Signal code: Address not mapped (1)
[melkor:24426] Failing at address: (nil)
[melkor:24426] [ 0] /lib64/libpthread.so.0(+0xf1f0)[0x7f66cfb731f0]
[melkor:24426] *** End of error message ***
Segmentation fault (core dumped)

I have attached the beginning of a test program that use this function.

(and btw a totally different error occur with mpich: 
http://lists.mpich.org/pipermail/discuss/2016-March/004586.html)


Can someone help me?

Thanks,

Eric









Re: [OMPI users] Error with MPI_Register_datarep

2016-03-10 Thread Gilles Gouaillardet

Eric,

My short answer is: no.

The long answer:

- from MPI_Register_datarep()

   /* The io framework is only initialized lazily.  If it hasn't
   already been initialized, do so now (note that MPI_FILE_OPEN
   and MPI_FILE_DELETE are the only two places that it will be
   initialized). */

- from mca_io_base_register_datarep()
/* Find the maximum additional number of bytes required by all io
   components for requests and make that the request size */

OPAL_LIST_FOREACH(cli, 
&ompi_io_base_framework.framework_components, 
mca_base_component_list_item_t) {

...
}

In your case, since neither MPI_File_open nor MPI_File_delete is invoked, 
the ompio component could be disabled.
But that would mean the io component selection would also depend on whether 
MPI_Register_datarep() had been invoked beforehand. I can foresee users 
complaining about IO performance discrepancies just because of one line 
(e.g. an MPI_Register_datarep invocation) in their code.

Now, if MPI_File_open is invoked first, MPI_Register_datarep will fail or 
succeed based on the selected io component (and IIRC, that could be 
file(system) dependent within the same application).


I am open to suggestions, but so far I do not see a better one (other than 
implementing this in OMPIO).
The patch for v1.10 can be downloaded at 
https://github.com/ggouaillardet/ompi-release/commit/1589278200d9fb363d61fa20fb39a4c2fa78c942.patch

With it, the application will not crash, but will fail "nicely" in MPI_Register_datarep.

Cheers,

Gilles
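
For completeness, one Open MPI-specific (non-portable) possibility is to set 
the MCA parameter from inside the program before MPI_Init, since Open MPI also 
picks MCA parameters up from OMPI_MCA_* environment variables when it 
initializes. A hedged sketch, untested here:

#include <cstdlib>   // setenv (POSIX)
#include "mpi.h"

// Hedged, Open MPI-specific sketch: setting OMPI_MCA_io in the environment
// before MPI_Init should be equivalent to exporting OMPI_MCA_io=^ompio in
// the shell before launching.
int main(int argc, char *argv[])
{
  setenv("OMPI_MCA_io", "^ompio", 1);   // must happen before MPI_Init
  MPI_Init(&argc, &argv);
  // ... MPI_Register_datarep("int64", ...) as in the original test ...
  MPI_Finalize();
  return 0;
}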

On 3/11/2016 12:11 PM, Éric Chamberland wrote:

Thanks Gilles!

it works... I will continue my tests with that command line...

Until OMPIO supports this, is there a way to put a call into the code 
to disable ompio the same way --mca io ^ompio does?


Thanks,

Eric

On 16-03-10 20:13, Gilles Gouaillardet wrote:

Eric,

I will fix the crash (FWIW, it is already fixed in v2.x and master).

Note that this program cannot currently run "as is".
By default, there are two components in the io framework: ROMIO and OMPIO.
MPI_Register_datarep tries to register the datarep with all of them,
and succeeds only if the datarep was successfully registered with every one.


OMPIO does not currently support this
(and the stub is missing in v1.10, which is why the app crashes instead of
failing cleanly).

Your test succeeds if you blacklist ompio:

mpirun --mca io ^ompio ./int64
or
OMPI_MCA_io=^ompio ./int64

and you do not even need a patch for that :-)


Cheers,

Gilles

On 3/11/2016 4:47 AM, Éric Chamberland wrote:

Hi,

I have a segfault while trying to use MPI_Register_datarep with 
openmpi-1.10.2:


mpic++ -g -o int64 int64.cc
./int64
[melkor:24426] *** Process received signal ***
[melkor:24426] Signal: Segmentation fault (11)
[melkor:24426] Signal code: Address not mapped (1)
[melkor:24426] Failing at address: (nil)
[melkor:24426] [ 0] /lib64/libpthread.so.0(+0xf1f0)[0x7f66cfb731f0]
[melkor:24426] *** End of error message ***
Segmentation fault (core dumped)

I have attached the beginning of a test program that use this function.

(and btw a totally different error occur with mpich: 
http://lists.mpich.org/pipermail/discuss/2016-March/004586.html)


Can someone help me?

Thanks,

Eric








