Re: [OMPI users] mpirun fails on the host

2009-06-19 Thread Honest Guvnor
On Fri, Jun 19, 2009 at 3:12 AM, Ralph Castain  wrote:

> Add --debug-devel to your cmd line and you'll get a bunch of diagnostic
> info. Did you configure --enable-debug? If so, then additional debug can be
> obtained - can let you know how to get it, if necessary.


Yes, we had run with the -d flag, and it was this output that prompted us to
find out how to prevent use of the external network. I am not sure what most
of the messages mean, but we still get quite a few references to
hankel.fred.com, which the nodes will not be able to access. Here is the
output (external IP numbers and domain have been changed):

[cluster@hankel ~]$ mpirun --debug-devel --mca btl tcp,self --mca
btl_tcp_if_exclude lo,eth0 --mca oob_tcp_if_exclude lo,eth0 -np 1 --host n06
hostname
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] [0,0,0] setting up session dir with
[hankel.fred.com:26997] universe default-universe-26997
[hankel.fred.com:26997] user cluster
[hankel.fred.com:26997] host hankel.fred.com
[hankel.fred.com:26997] jobid 0
[hankel.fred.com:26997] procid 0
[hankel.fred.com:26997] procdir:
/tmp/openmpi-sessions-clus...@hankel.fred.com_0/default-universe-26997/0/0
[hankel.fred.com:26997] jobdir:
/tmp/openmpi-sessions-clus...@hankel.fred.com_0/default-universe-26997/0
[hankel.fred.com:26997] unidir:
/tmp/openmpi-sessions-clus...@hankel.fred.com_0/default-universe-26997
[hankel.fred.com:26997] top: openmpi-sessions-clus...@hankel.fred.com_0
[hankel.fred.com:26997] tmp: /tmp
[hankel.fred.com:26997] [0,0,0] contact_file
/tmp/openmpi-sessions-clus...@hankel.fred.com_0
/default-universe-26997/universe-setup.txt
[hankel.fred.com:26997] [0,0,0] wrote setup file
[hankel.fred.com:26997] pls:rsh: local csh: 0, local sh: 1
[hankel.fred.com:26997] pls:rsh: assuming same remote shell as local shell
[hankel.fred.com:26997] pls:rsh: remote csh: 0, remote sh: 1
[hankel.fred.com:26997] pls:rsh: final template argv:
[hankel.fred.com:26997] pls:rsh: /usr/bin/ssh  orted --debug
--bootproxy 1 --name  --num_procs 2 --vpid_start 0 --nodename
 --universe clus...@hankel.fred.com:default-universe-26997
--nsreplica "0.0.0;tcp://192.168.0.99:54116" --gprreplica "0.0.0;tcp://
192.168.0.99:54116"
[hankel.fred.com:26997] pls:rsh: launching on node n06
[hankel.fred.com:26997] pls:rsh: n06 is a REMOTE node
[hankel.fred.com:26997] pls:rsh: executing: (//usr/bin/ssh) /usr/bin/ssh n06
 PATH=/usr/lib/openmpi/1.2.7-gcc/bin:$PATH ; export PATH ;
LD_LIBRARY_PATH=/usr/lib/openmpi/1.2.7-gcc/lib:$LD_LIBRARY_PATH ; export
LD_LIBRARY_PATH ; /usr/lib/openmpi/1.2.7-gcc/bin/orted --debug --bootproxy 1
--name 0.0.1 --num_procs 2 --vpid_start 0 --nodename n06 --universe
clus...@hankel.fred.com:default-universe-26997 --nsreplica "0.0.0;tcp://
192.168.0.99:54116" --gprreplica "0.0.0;tcp://192.168.0.99:54116" [HOSTNAME=
hankel.fred.com TERM=xterm-color SHELL=/bin/bash HISTSIZE=1000
SSH_CLIENT=130.149.86.77 50506 22 SSH_TTY=/dev/pts/12 USER=cluster
LD_LIBRARY_PATH=:/usr/lib/openmpi/1.2.7-gcc/lib
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
MAIL=/var/spool/mail/cluster
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/lib/openmpi/1.2.7-gcc/bin:/home/cluster/bin
INPUTRC=/etc/inputrc PWD=/home/cluster LANG=en_US.UTF-8
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass SHLVL=1
HOME=/home/cluster LOGNAME=cluster CVS_RSH=ssh
SSH_CONNECTION=222.222.222.222 50506

Re: [OMPI users] vfs_write returned -14

2009-06-19 Thread Josh Hursey


On Jun 18, 2009, at 7:33 PM, Kritiraj Sajadah wrote:



Hello Josh,
  Thank you again for your response. I tried checkpointing a
simple C program using BLCR and got the same error, i.e.:


- vfs_write returned -14
- file_header: write returned -14
Checkpoint failed: Bad address


So I would look at how your NFS file system is set up, and work with
your sysadmin (and maybe the BLCR list) to resolve this before
experimenting too much with checkpointing with Open MPI.




This is how I installed and ran MPI programs for checkpointing:

1) configure and install BLCR
2) configure and install Open MPI
3) compile and run the MPI program as follows:
4) to checkpoint the running program,
5) to restart your checkpoint, locate the checkpoint file and type
the following from the command line:




This all looks ok to me.


I did another test with BLCR, however:

I tried checkpointing my C application from the /tmp directory
instead of my $HOME directory, and it checkpointed fine.

So, it looks like the problem is with my $HOME directory.

I have "drwx" rights on my $HOME directory, which seems fine to me.

Then I tried it with Open MPI. However, with Open MPI the
checkpoint file automatically gets saved in the $HOME directory.


Is there a way to have the file saved in a different location? I
see that LAM/MPI has a command line option for this:

$ mpirun -np 2 -ssi cr_base_dir /somewhere/else a.out

Do we have a similar option for Open MPI?


By default Open MPI places the global snapshot in the $HOME directory.  
But you can also specify a different directory for the global snapshot  
using the following MCA option:

  -mca snapc_base_global_snapshot_dir /somewhere/else

For the best results you will likely want to set this in the MCA  
params file in your home directory:

 shell$ cat ~/.openmpi/mca-params.conf
 snapc_base_global_snapshot_dir=/somewhere/else
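The same directory can also be given on the mpirun command line for a one-off
run. A minimal sketch, assuming a checkpoint/restart-enabled 1.3 build (the
-am ft-enable-cr switch, the path, and the PID are only illustrative; the MCA
parameter name is the one above):

 shell$ mpirun -am ft-enable-cr \
          -mca snapc_base_global_snapshot_dir /somewhere/else \
          -np 2 ./a.out
 shell$ ompi-checkpoint <PID of mpirun>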

You can also stage the file to local disk, then have Open MPI transfer  
the checkpoints back to a {logically} central storage device (both can  
be /tmp on a local disk if you like). For more details on this and the  
above option you will want to read through the FT Users Guide attached  
to the wiki page at the link below:

  https://svn.open-mpi.org/trac/ompi/wiki/ProcessFT_CR

-- Josh




Thanks a lot

regards,

Raj

--- On Wed, 6/17/09, Josh Hursey  wrote:


From: Josh Hursey 
Subject: Re: [OMPI users] vfs_write returned -14
To: "Open MPI Users" 
Date: Wednesday, June 17, 2009, 1:42 AM
Did you try checkpointing a non-MPI application with BLCR on the cluster?
If that does not work then I would suspect that BLCR is not working
properly on the system.

However, if a non-MPI application can be checkpointed and restarted
correctly on this machine, then it may be something odd with the Open MPI
installation or runtime environment. To help debug here I would need to
know how Open MPI was configured and how the application was run on the
machine (command line arguments, environment variables, ...).

I should note that for the program that you sent it is important that you
compile Open MPI with the Fault Tolerance Thread enabled to ensure a timely
checkpoint. Otherwise the checkpoint will be delayed until the MPI program
enters the MPI_Finalize function.
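
A sketch of the configure flags involved, as I recall them for the 1.3
series (check ./configure --help and the FT Users Guide to confirm the
exact names and the BLCR path for your installation):

 shell$ ./configure --with-ft=cr --enable-mpi-threads \
          --enable-ft-thread --with-blcr=/path/to/blcr
 shell$ ompi_info | grep -i -E "ft|thread"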

Let me know what you find out.

Josh

On Jun 16, 2009, at 5:08 PM, Kritiraj Sajadah wrote:



Hi Josh,

Thanks for the email. I have installed BLCR 0.8.1 and openmpi 1.3 on my
laptop with Ubuntu 8.04 on it. It works fine.

I now tried the installation on the cluster (on one machine for now) at my
university. (The administrator installed it; I am not sure if he followed
the steps I gave him.)

I am checkpointing a simple MPI application which looks as follows:


#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
int rank,size;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
printf("I am processor no %d of a total of %d procs

\n", rank, size);

system("sleep 30");
printf("I am processor no %d of a total of %d procs

\n", rank, size);

system("sleep 30");
printf("I am processor no %d of a total of %d procs

\n", rank, size);

system("sleep 30");
printf("bye \n");
MPI_Finalize();
return 0;
}

Do you think its better to re install BLCR?


Thanks

Raj
--- On Tue, 6/16/09, Josh Hursey 

wrote:



From: Josh Hursey 
Subject: Re: [OMPI users] vfs_write returned -14
To: "Open MPI Users" 
Date: Tuesday, June 16, 2009, 6:42 PM

These are errors from BLCR. It may be a problem

with your

BLCR installation and/or your application. Are you

able to

checkpoint/restart a non-MPI application with BLCR

on these

machines?

What kind of MPI application are you trying to

checkpoint?

Some of the MPI interfaces are not fully supported

at the

moment (outlined in the FT User Document that I

mentioned in

a previous email).

-- Josh

On Jun 16, 2009, at 11:30 AM, Kritiraj Sajadah

wrote:




Dear All,
   I have installed openmpi 1.3 and blcr 0.

[OMPI users] Bug in 1.3.2?: sm btl and isend is serializes

2009-06-19 Thread Mark Bolstad
I have a small test code with which I've managed to duplicate the results
from a larger code. In essence, using the sm btl with Isend, I wind up with
the communication being completely serialized, i.e., all the calls from
process 1 complete, then all from 2, ...

This is version 1.3.2, vanilla compile. I get the same results on my RHEL5
Nehalem box and an OS X laptop.
Here's an example of the output (note: there is a usleep in the code to
mimic my computation loop and ensure that this is not a simple I/O
sequencing issue):

 Ignore the "next" in the output below; it was a broadcast test.

mpirun -np 5 ./mpi_split_test
Master [id = 0 of 5] is running on bolstadm-lm1
[0] next = 10
Server [id = 3, 2, 1 of 5] is running on bolstadm-lm1
Compositor [id = 1, 0 of 5] is running on bolstadm-lm1
[1] next = 10
Sending buffer 0 from 1
Server [id = 2, 1, 0 of 5] is running on bolstadm-lm1
[2] next = 10
Sending buffer 0 from 2
[3] next = 10
Server [id = 4, 3, 2 of 5] is running on bolstadm-lm1
[4] next = 10
Sending buffer 0 from 3
Sending buffer 1 from 1
Sending buffer 1 from 2
Sending buffer 1 from 3
Sending buffer 2 from 1
Sending buffer 2 from 2
Sending buffer 2 from 3
Sending buffer 3 from 1
Sending buffer 3 from 2
Sending buffer 4 from 1
Receiving buffer from 1, buffer = hello from 1 for the 0 time
Receiving buffer from 1, buffer = hello from 1 for the 1 time
Sending buffer 4 from 2
Sending buffer 4 from 3
Sending buffer 5 from 1
Receiving buffer from 1, buffer = hello from 1 for the 2 time
Sending buffer 6 from 1
Receiving buffer from 1, buffer = hello from 1 for the 3 time

-At this point, processes 2 & 3 are stuck in an MPI_Wait
...
Sending buffer 9 from 1
Receiving buffer from 1, buffer = hello from 1 for the 6 time
Receiving buffer from 1, buffer = hello from 1 for the 7 time
Receiving buffer from 1, buffer = hello from 1 for the 8 time
Receiving buffer from 1, buffer = hello from 1 for the 9 time
Receiving buffer from 2, buffer = hello from 2 for the 0 time
Receiving buffer from 2, buffer = hello from 2 for the 1 time
Receiving buffer from 2, buffer = hello from 2 for the 2 time
Sending buffer 5 from 2
Sending buffer 6 from 2
Receiving buffer from 2, buffer = hello from 2 for the 3 time

 Now process 2 is running, 1 is in a barrier, 3 is still in Wait

Sending buffer 9 from 2
Receiving buffer from 2, buffer = hello from 2 for the 6 time
Receiving buffer from 2, buffer = hello from 2 for the 7 time
Receiving buffer from 2, buffer = hello from 2 for the 8 time
Receiving buffer from 2, buffer = hello from 2 for the 9 time
Receiving buffer from 3, buffer = hello from 3 for the 0 time
Sending buffer 5 from 3
Receiving buffer from 3, buffer = hello from 3 for the 1 time
Receiving buffer from 3, buffer = hello from 3 for the 2 time

 And now process 3 goes
...
Receiving buffer from 3, buffer = hello from 3 for the 8 time
Receiving buffer from 3, buffer = hello from 3 for the 9 time



Now running under TCP:

mpirun --mca btl tcp,self -np 5 ./mpi_split_test
Compositor [id = 1, 0 of 5] is running on bolstadm-lm1
Master [id = 0 of 5] is running on bolstadm-lm1
[0] next = 10
Server [id = 2, 1, 0 of 5] is running on bolstadm-lm1
Server [id = 3, 2, 1 of 5] is running on bolstadm-lm1
Server [id = 4, 3, 2 of 5] is running on bolstadm-lm1
[4] next = 10
Sending buffer 0 from 3
Sending buffer 0 from 1
[2] next = 10
[1] next = 10
Sending buffer 0 from 2
[3] next = 10
Receiving buffer from 1, buffer = hello from 1 for the 0 time
Receiving buffer from 3, buffer = hello from 3 for the 0 time
Receiving buffer from 2, buffer = hello from 2 for the 0 time
Sending buffer 1 from 3
Sending buffer 1 from 1
Sending buffer 1 from 2
Receiving buffer from 1, buffer = hello from 1 for the 1 time
Receiving buffer from 2, buffer = hello from 2 for the 1 time
Receiving buffer from 3, buffer = hello from 3 for the 1 time
Sending buffer 2 from 3
Sending buffer 2 from 2
Sending buffer 2 from 1
Receiving buffer from 1, buffer = hello from 1 for the 2 time
Receiving buffer from 2, buffer = hello from 2 for the 2 time
Receiving buffer from 3, buffer = hello from 3 for the 2 time
...

So, has this been reported before? I've seen some messages on the developer
list about hanging with the sm btl.

I'll post the test code if requested (this email is already long)

Mark


Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes

2009-06-19 Thread Eugene Loh

Mark Bolstad wrote:


I'll post the test code if requested (this email is already long)


Yipes, how long is the test code?  Short enough to send, yes?  Please send.


Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes

2009-06-19 Thread Mark Bolstad
Not that long, 150 lines.

Here it is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <unistd.h>
#include <mpi.h>

#define BUFLEN 25000
#define LOOPS 10
#define BUFFERS 4

#define GROUP_SIZE 4

int main(int argc, char *argv[])
{
   int myid, numprocs, next, namelen;
   int color, key, newid;
   char buffer[BUFLEN], processor_name[MPI_MAX_PROCESSOR_NAME];
   MPI_Comm world_comm, comp_comm, server_comm;

   MPI_Init(&argc,&argv);
   MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);
   MPI_Get_processor_name(processor_name,&namelen);

   MPI_Comm_dup( MPI_COMM_WORLD, &world_comm );

   if ( myid == 0 )
  color = MPI_UNDEFINED;
   else
   {
  color = (myid - 1) / GROUP_SIZE;
  key = (myid - 1) % GROUP_SIZE;
   }

   MPI_Comm_split( MPI_COMM_WORLD, color, key, &comp_comm );

   if ( myid == 0 || (myid - 1) % GROUP_SIZE == 0 )
  color = MPI_UNDEFINED;
   else
   {
  int r = myid - 2;

  color = 1;
  key = r - r / GROUP_SIZE;
   }

   MPI_Comm_split( MPI_COMM_WORLD, color, key, &server_comm );
   if ( myid == 0 )
   {
  fprintf(stderr,"Master [id = %d of %d] is running on %s\n", myid,
numprocs, processor_name);
   }
   else
   {
  int s_id;

  MPI_Comm_rank( comp_comm, &newid );
  if ( (myid - 1) % GROUP_SIZE == 0 )
 fprintf(stderr,"Compositor [id = %d, %d of %d] is running on %s\n",
myid, newid, numprocs, processor_name);
  else
  {
 MPI_Comm_rank( server_comm, &s_id );
 fprintf(stderr,"Server [id = %d, %d, %d of %d] is running on %s\n",
myid, newid, s_id, numprocs, processor_name);
  }
   }

   if ( myid == 0 )
  next = 10;

   MPI_Bcast( &next, 1, MPI_INT, 0, world_comm );
   fprintf(stderr,"[%d] next = %d\n", myid, next );

   if ( myid > 0 )
   {
  int i, j;
  int rank, size;
  MPI_Status status;

  MPI_Comm_size( comp_comm, &size );
  MPI_Comm_rank( comp_comm, &rank );

  if ( rank == 0 )
  {
 char buffer[BUFLEN];

 for (i = 0; i < LOOPS * ( size - 1 ); i++)
 {
int which_source, which_tag;

MPI_Probe( MPI_ANY_SOURCE, MPI_ANY_TAG, comp_comm, &status );
which_source = status.MPI_SOURCE;
which_tag = status.MPI_TAG;
printf( "Receiving buffer from %d, buffer = ", which_source );
MPI_Recv( buffer, BUFLEN, MPI_CHAR, which_source, which_tag,
comp_comm, &status );
printf( "%s\n", buffer );
 }
  }
  else
  {
 MPI_Request* request[BUFFERS];
 int sent[ BUFFERS ];
 int index = 0;
 char* buffer[BUFFERS];

 for (i = 0; i < BUFFERS; i++)
 {
MPI_Request* requester = (MPI_Request *) malloc( sizeof( MPI_Request ) );
char* c = (char *) malloc( BUFLEN * sizeof( char ) );
/* Should really check for failure, but not for this test */
request[ i ] = requester;
sent[ i ] = 0;
buffer[ i ] = c;
 }

 for (i = 0; i < LOOPS; i++)
 {
printf( "Sending buffer %d from %d\n", i, rank );
sprintf( buffer[ index ], "hello from %d for the %d time", rank, i
);
if ( sent[ index ] )
{
   sent[ index ] = 0;
   MPI_Wait( request[ index ], &status );
}

MPI_Isend( buffer[ index ], BUFLEN, MPI_CHAR, 0, 99, comp_comm,
   request[ index ] );

sent[ index ] = 1;
index = ( index + 1 ) % BUFFERS;

/* Randomly sleep to fake a computation loop*/
usleep( (unsigned long)(50 * drand48()) );
 }

 /* Clean up */
 for (i = 0; i < BUFFERS; i++)
 {
if ( sent[ i ] )
{
   sent[ i ] = 0;
   MPI_Wait( request[ i ], &status );
}
free( request[ i ] );
free( buffer[ i ] );
 }
  }
   }

   MPI_Barrier( world_comm );
   MPI_Finalize();
   return (0);
}


On Fri, Jun 19, 2009 at 10:50 AM, Eugene Loh  wrote:

> Mark Bolstad wrote:
>
>  I'll post the test code if requested (this email is already long)
>>
>
> Yipes, how long is the test code?  Short enough to send, yes?  Please send.


[OMPI users] Error in mx_init (error MX library incompatible with driver version)

2009-06-19 Thread SLIM H.A.
This is a question I raised before, but then for OpenMPI over IB.

I have built the app with the Portland compiler and OpenMPI 1.2.3 for
Myrinet and InfiniBand. Now I wish to run it on some nodes that have
no fast interconnect. We use GridEngine; this is the script:

#!/bin/csh
#$ -cwd
##$ -j y

module purge
module load dot sge openmpi/pgi/64/1.2.3

echo "Got slots"

mpirun -np $NSLOTS --mca btl "sm,self,tcp" ./t2eco2n_mp

This gives the following error message:

[node168:30330] Error in mx_init (error MX library incompatible with
driver version)
MX:driver-api-seq-num differ (lib=5.1,kernel=2.1)
MX Lib Version=1.2.5
MX Lib Build=dcl0hpc@hamilton:/tmp/dcl0hpc/myrinet/mx-1.2.5 Wed
Apr 16 10:48:48 BST 2008
MX Kernel Version=1.1.6
MX Kernel Build=root@node014:/tmp/mx-1.1.6 Fri Nov 24 13:41:44
GMT 2006
[node168:30331] Error in mx_init (error MX library incompatible with
driver version)
[node168:30330] *** Process received signal ***
[node168:30330] Signal: Segmentation fault (11)
[node168:30330] Signal code:  (128)
[node168:30330] Failing at address: (nil)
[node168:30330] *** End of error message ***


Although the mismatch between the MX library version and the kernel version
appears to cause the mx_init error, mx_init should never be called at all,
as there is no MX card on those nodes.

Thanks in advance for any advice to solve this

Henk


Dr. H.A. Slim
IT Consultant, Scientific and High Performance Computing
IT Service,
Durham University, UK
e-mail: h.a.s...@durham.ac.uk
Tel.: 0191 - 334 2724
FAX: 0191 - 3342701 





Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes

2009-06-19 Thread Eugene Loh

Mark Bolstad wrote:

I have a small test code with which I've managed to duplicate the results 
from a larger code. In essence, using the sm btl with Isend, I wind up 
with the communication being completely serialized, i.e., all the 
calls from process 1 complete, then all from 2, ...


I need to do some other stuff, but might spend time on this later.  For 
now, I'll just observe that your sends are rendezvous sends.  E.g., if 
you decrease BUFLEN from 25000 to 2500 (namely, from over 4K to under 
4K), the behavior should change (to what you'd expect).  That may or may 
not help you, but I think it's an important observation in reasoning 
about this behavior.
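
If it helps while experimenting, the shared-memory eager threshold is itself
an MCA parameter. A sketch, assuming a 1.3.x build that exposes
btl_sm_eager_limit (ompi_info will confirm the exact name and default on
your installation):

  shell$ ompi_info --param btl sm | grep -i eager
  shell$ mpirun --mca btl_sm_eager_limit 32768 -np 5 ./mpi_split_test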


Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes

2009-06-19 Thread Mark Bolstad
Thanks, but that won't help. In the real application the messages are at
least 25,000 bytes long, mostly much larger.

Thanks,
Mark


On Fri, Jun 19, 2009 at 1:17 PM, Eugene Loh  wrote:

> Mark Bolstad wrote:
>
>> I have a small test code with which I've managed to duplicate the results
>> from a larger code. In essence, using the sm btl with Isend, I wind up with
>> the communication being completely serialized, i.e., all the calls from
>> process 1 complete, then all from 2, ...
>>
>
> I need to do some other stuff, but might spend time on this later.  For
> now, I'll just observe that your sends are rendezvous sends.  E.g., if you
> decrease BUFLEN from 25000 to 2500 (namely, from over 4K to under 4K), the
> behavior should change (to what you'd expect).  That may or may not help
> you, but I think it's an important observation in reasoning about this
> behavior.
>


[OMPI users] Linking MPI applications with pgi IPA

2009-06-19 Thread Brock Palen
When linking applications that are compiled and linked with the
-Mipa=fast,inline option, the IPA step stops with errors like this case
with Amber:


The following function(s) are called, but no IPA information is  
available:
mpi_allgatherv_, mpi_gatherv_, mpi_bcast_, mpi_wait_, mpi_get_count_,  
mpi_recv_, mpi_isend_, mpi_gather_, mpi_allreduce_, mpi_abort_,  
mpi_finalize_, mpi_send_

Linking without IPA

Is there a way to tell the compiler it's OK to ignore the MPI library
and do IPA for everything else?



Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985





[OMPI users] Machinefile option in opempi-1.3.2

2009-06-19 Thread Rajesh Sudarsan
Hi,

I tested a simple hello world program on 5 nodes, each with dual
quad-core processors. I noticed that openmpi does not always follow
the order of the processors indicated in the machinefile. Depending
upon the number of processors requested, openmpi does some type of
sorting to find the best node fit for a particular job and runs on
those nodes. Is there a way to make openmpi turn off this sorting and
strictly follow the order indicated in the machinefile?

mpiexec supports three options to specify the machinefile --
default-machinefile, hostfile, and machinefile. Can anyone tell me what
the difference between these three options is?

Any help would be greatly appreciated.

Thanks,
Rajesh


Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes

2009-06-19 Thread George Bosilca

Mark,

MPI does not impose any global order on the messages. The only
requirement is that between two peers on the same communicator the
messages (or at least the part required for the matching) are delivered
in order. This makes both execution traces you sent with your original
email (shared memory and TCP) valid from the MPI perspective.

Moreover, MPI doesn't impose any order on the matching when ANY_SOURCE
is used. In Open MPI we _ALWAYS_ do the matching starting from rank 0
to n in the specified communicator. BEWARE: the remainder of this
paragraph is deep black magic of an MPI implementation's internals. The
main difference between the behavior of SM and TCP here directly
reflects their eager sizes, 4K for SM and 64K for TCP. Therefore, for
your example, with TCP all your messages are eager messages (i.e., they
are completely transferred to the destination process in just one go),
while with SM they all require a rendezvous. This directly impacts the
ordering of the messages on the receiver, and therefore the order of
the matching. However, I have to insist on this: this behavior is
correct based on the MPI standard specifications.


  george.



On Jun 19, 2009, at 13:28 , Mark Bolstad wrote:



Thanks, but that won't help. In the real application the messages  
are at least 25,000 bytes long, mostly much larger.


Thanks,
Mark


On Fri, Jun 19, 2009 at 1:17 PM, Eugene Loh   
wrote:

Mark Bolstad wrote:

I have a small test code with which I've managed to duplicate the results
from a larger code. In essence, using the sm btl with Isend, I wind
up with the communication being completely serialized, i.e., all the
calls from process 1 complete, then all from 2, ...


I need to do some other stuff, but might spend time on this later.   
For now, I'll just observe that your sends are rendezvous sends.   
E.g., if you decrease BUFLEN from 25000 to 2500 (namely, from over  
4K to under 4K), the behavior should change (to what you'd expect).   
That may or may not help you, but I think it's an important  
observation in reasoning about this behavior.






Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes

2009-06-19 Thread Eugene Loh

George Bosilca wrote:

MPI does not impose any global order on the messages. The only
requirement is that between two peers on the same communicator the
messages (or at least the part required for the matching) are delivered
in order. This makes both execution traces you sent with your original
email (shared memory and TCP) valid from the MPI perspective.

Moreover, MPI doesn't impose any order on the matching when ANY_SOURCE
is used. In Open MPI we _ALWAYS_ do the matching starting from rank 0
to n in the specified communicator. BEWARE: the remainder of this
paragraph is deep black magic of an MPI implementation's internals. The
main difference between the behavior of SM and TCP here directly
reflects their eager sizes, 4K for SM and 64K for TCP. Therefore, for
your example, with TCP all your messages are eager messages (i.e., they
are completely transferred to the destination process in just one go),
while with SM they all require a rendezvous. This directly impacts the
ordering of the messages on the receiver, and therefore the order of
the matching. However, I have to insist on this: this behavior is
correct based on the MPI standard specifications.


I'm going to try a technical explanation of what's going on inside OMPI 
and then words of advice to Mark.


First, the technical explanation.  As George says, what's going on is 
legal.  The "servers" all queue up sends to the "compositor".  These are 
long, rendezvous sends (at least when they're on-node).  So, none of 
these sends completes.  The compositor looks for an incoming message.  
It gets the header of the message and sends back an acknowledgement 
that the rest of the message can be sent.  The "server" gets the 
acknowledgement and starts sending more of the message.  The compositor, 
in order to get to the remainder of the message, keeps draining all the 
other stuff the servers are sending it.  Once the first message is 
completely received, the compositor looks for the next message to 
process and happens to pick up the first server again.  It won't go to 
anyone else until server 1 is exhausted.  Legal, but from Mark's point 
of view not desirable.  The compositor is busy all the time.  Mark just 
wants it to employ a different order.


The receives are "serialized".  Of course they must be, since the 
receiver is a single process.  But Mark's performance issue is that the 
servers aren't being serviced equally, so they back up while one server 
unfairly gets all the attention.


Mark, your test code has a set of buffers it cycles through on each 
server.  Could you do something similar on the compositor side?  Have a 
set of resources for each server.  If you want the compositor to service 
all servers equally/fairly, you're going to have to prescribe this 
behavior in your MPI code.  The MPI implementation can't be relied on to 
do this for you.


If this doesn't make sense, let me know and I'll try to sketch it out 
more explicitly.
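
For illustration, a minimal sketch of one such compositor-side approach: one
pre-posted receive per server, drained with MPI_Waitany. The names comp_comm,
BUFLEN, and LOOPS are assumed from Mark's posted test code, and each server
is assumed to send exactly LOOPS NUL-terminated strings.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Rank 0 of comp_comm services size-1 servers fairly via MPI_Waitany. */
static void compositor_loop( MPI_Comm comp_comm, int buflen, int loops )
{
   int size, nservers, i, done = 0;
   MPI_Comm_size( comp_comm, &size );
   nservers = size - 1;                    /* servers are ranks 1..size-1 */

   char        *bufs = malloc( (size_t) nservers * buflen );
   MPI_Request *reqs = malloc( nservers * sizeof( MPI_Request ) );
   int         *left = malloc( nservers * sizeof( int ) );

   /* Pre-post one receive per server so no single server can monopolize
      the compositor. */
   for (i = 0; i < nservers; i++)
   {
      left[ i ] = loops;
      MPI_Irecv( bufs + (size_t) i * buflen, buflen, MPI_CHAR, i + 1,
                 MPI_ANY_TAG, comp_comm, &reqs[ i ] );
   }

   while (done < loops * nservers)
   {
      int idx;
      MPI_Status status;

      /* Complete whichever server's message arrives next ... */
      MPI_Waitany( nservers, reqs, &idx, &status );
      printf( "Receiving buffer from %d, buffer = %s\n",
              status.MPI_SOURCE, bufs + (size_t) idx * buflen );
      done++;

      /* ... and immediately re-post a receive for that server if it still
         has messages outstanding (Waitany set reqs[idx] to
         MPI_REQUEST_NULL, so exhausted servers are simply skipped). */
      if (--left[ idx ] > 0)
         MPI_Irecv( bufs + (size_t) idx * buflen, buflen, MPI_CHAR, idx + 1,
                    MPI_ANY_TAG, comp_comm, &reqs[ idx ] );
   }

   free( bufs );
   free( reqs );
   free( left );
}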


Re: [OMPI users] mpirun fails on the host

2009-06-19 Thread Honest Guvnor
The source of the problem has been determined, though not wholly understood,
by fully disabling the host's firewall towards the internal network. Parallel
jobs involving the host and nodes launched from a node were successful, while
those launched on the host were apparently blocked by the firewall. Would
the former only involve the use of the ssh port on the host, while the latter
involves other ports?


Re: [OMPI users] Error in mx_init (error MX library incompatible with driver version)

2009-06-19 Thread Scott Atchley

On Jun 19, 2009, at 1:05 PM, SLIM H.A. wrote:


Although the mismatch between the MX library version and the kernel version
appears to cause the mx_init error, mx_init should never be called at all,
as there is no MX card on those nodes.

Thanks in advance for any advice to solve this

Henk


Henk,

Is MX statically compiled into the binary or into the Open MPI library?

Scott
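
If the components turn out to be dynamic, one experiment worth trying (a
sketch only -- it assumes the standard MCA "^" exclusion syntax applies to
this 1.2.3 build and that nothing else pulls in the MX library) is to
exclude the MX components explicitly in addition to listing the BTLs:

  mpirun -np $NSLOTS --mca btl sm,self,tcp --mca mtl ^mx ./t2eco2n_mp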




Re: [OMPI users] Machinefile option in opempi-1.3.2

2009-06-19 Thread Ralph Castain
If you do "man orte_hosts", you'll see a full explanation of how the various
machinefile options work.
The default mapper doesn't do any type of sorting - it is a round-robin
mapper that just works its way through the provided nodes. We don't reorder
them in any way.

However, it does depend on the number of slots we are told each node has, so
that might be what you are encountering. If you do a --display-map and send
it along, I might be able to spot the issue.
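
For example, the round-robin mapper simply walks a hostfile like the
following, consuming the stated slot counts in the order given (hostnames
and counts here are only illustrative):

  shell$ cat myhosts
  node01 slots=8
  node02 slots=8
  shell$ mpirun --hostfile myhosts --display-map -np 16 ./hello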

Thanks
Ralph


On Fri, Jun 19, 2009 at 1:35 PM, Rajesh Sudarsan wrote:

> Hi,
>
> I tested a simple hello world program on 5 nodes, each with dual
> quad-core processors. I noticed that openmpi does not always follow
> the order of the processors indicated in the machinefile. Depending
> upon the number of processors requested, openmpi does some type of
> sorting to find the best node fit for a particular job and runs on
> those nodes. Is there a way to make openmpi turn off this sorting and
> strictly follow the order indicated in the machinefile?
>
> mpiexec supports three options to specify the machinefile --
> default-machinefile, hostfile, and machinefile. Can anyone tell me what
> the difference between these three options is?
>
> Any help would be greatly appreciated.
>
> Thanks,
> Rajesh


Re: [OMPI users] mpirun fails on the host

2009-06-19 Thread Ralph Castain
I believe you will find a fairly complete discussion of firewall issues with
MPI on the OMPI mailing lists. The bottom line is that the firewall blocks
both the ssh port and the TCP communication ports required to wire up the
MPI transports. If you are using the TCP transport, then those ports are
also blocked.

You can open specific ports in your firewall and tell OMPI to use those
ports for both wireup and MPI transport. We don't necessarily recommend it,
though, as it leaves a security hole in your firewall.
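
If you do go that route, the relevant port-range settings are ordinary MCA
parameters. Their exact names vary between releases, so as a sketch, check
what your build exposes before opening anything in the firewall:

  shell$ ompi_info --param oob tcp | grep -i port
  shell$ ompi_info --param btl tcp | grep -i port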

HTH
Ralph



On Fri, Jun 19, 2009 at 4:00 PM, Honest Guvnor
wrote:

> The source of the problem has been determined, though not wholly
> understood, by fully disabling the host's firewall towards the internal
> network. Parallel jobs involving the host and nodes launched from a node
> were successful, while those launched on the host were apparently blocked
> by the firewall. Would the former only involve the use of the ssh port on
> the host, while the latter involves other ports?
>
>