Re: [OMPI users] mpirun fails on the host
On Fri, Jun 19, 2009 at 3:12 AM, Ralph Castain wrote:
> Add --debug-devel to your cmd line and you'll get a bunch of diagnostic
> info. Did you configure --enable-debug? If so, then additional debug can be
> obtained - can let you know how to get it, if necessary.

Yes, we had run with the -d flag, and it was the output from this that prompted us to find out how to prevent the use of the external network. I am not sure what most of the messages mean, but we still get quite a few references to hankel.fred.com, which the nodes will not be able to access. Here is the output (external IP numbers and domain changed):

[cluster@hankel ~]$ mpirun --debug-devel --mca btl tcp,self --mca btl_tcp_if_exclude lo,eth0 --mca oob_tcp_if_exclude lo,eth0 -np 1 --host n06 hostname
[hankel.fred.com:26997] connect_uni: connection not allowed
    (the same message repeated roughly twenty more times)
[hankel.fred.com:26997] [0,0,0] setting up session dir with
[hankel.fred.com:26997] universe default-universe-26997
[hankel.fred.com:26997] user cluster
[hankel.fred.com:26997] host hankel.fred.com
[hankel.fred.com:26997] jobid 0
[hankel.fred.com:26997] procid 0
[hankel.fred.com:26997] procdir: /tmp/openmpi-sessions-clus...@hankel.fred.com_0/default-universe-26997/0/0
[hankel.fred.com:26997] jobdir: /tmp/openmpi-sessions-clus...@hankel.fred.com_0/default-universe-26997/0
[hankel.fred.com:26997] unidir: /tmp/openmpi-sessions-clus...@hankel.fred.com_0/default-universe-26997
[hankel.fred.com:26997] top: openmpi-sessions-clus...@hankel.fred.com_0
[hankel.fred.com:26997] tmp: /tmp
[hankel.fred.com:26997] [0,0,0] contact_file /tmp/openmpi-sessions-clus...@hankel.fred.com_0/default-universe-26997/universe-setup.txt
[hankel.fred.com:26997] [0,0,0] wrote setup file
[hankel.fred.com:26997] pls:rsh: local csh: 0, local sh: 1
[hankel.fred.com:26997] pls:rsh: assuming same remote shell as local shell
[hankel.fred.com:26997] pls:rsh: remote csh: 0, remote sh: 1
[hankel.fred.com:26997] pls:rsh: final template argv:
[hankel.fred.com:26997] pls:rsh: /usr/bin/ssh orted --debug --bootproxy 1 --name --num_procs 2 --vpid_start 0 --nodename --universe clus...@hankel.fred.com:default-universe-26997 --nsreplica "0.0.0;tcp://192.168.0.99:54116" --gprreplica "0.0.0;tcp://192.168.0.99:54116"
[hankel.fred.com:26997] pls:rsh: launching on node n06
[hankel.fred.com:26997] pls:rsh: n06 is a REMOTE node
[hankel.fred.com:26997] pls:rsh: executing: (//usr/bin/ssh) /usr/bin/ssh n06 PATH=/usr/lib/openmpi/1.2.7-gcc/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/lib/openmpi/1.2.7-gcc/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; /usr/lib/openmpi/1.2.7-gcc/bin/orted --debug --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0 --nodename n06 --universe clus...@hankel.fred.com:default-universe-26997 --nsreplica "0.0.0;tcp://192.168.0.99:54116" --gprreplica "0.0.0;tcp://192.168.0.99:54116" [
HOSTNAME=hankel.fred.com
TERM=xterm-color
SHELL=/bin/bash
HISTSIZE=1000
SSH_CLIENT=130.149.86.77 50506 22
SSH_TTY=/dev/pts/12
USER=cluster
LD_LIBRARY_PATH=:/usr/lib/openmpi/1.2.7-gcc/lib
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
MAIL=/var/spool/mail/cluster
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/lib/openmpi/1.2.7-gcc/bin:/home/cluster/bin
INPUTRC=/etc/inputrc
PWD=/home/cluster
LANG=en_US.UTF-8
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SHLVL=1
HOME=/home/cluster
LOGNAME=cluster
CVS_RSH=ssh
SSH_CONNECTION=222.222.222.222 50506
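For reference, the interface-exclusion settings passed on the command line above can also be made persistent in an MCA parameter file so that every mpirun picks them up. This is a sketch that simply reuses the same parameter names shown in the command line, placed in the standard ~/.openmpi/mca-params.conf file (also shown later in this digest); the alternative btl_tcp_if_include form assumes the cluster-facing interface is eth1, which is an illustrative name only:

    # ~/.openmpi/mca-params.conf on the head node
    btl = tcp,self
    btl_tcp_if_exclude = lo,eth0
    oob_tcp_if_exclude = lo,eth0
    # or, equivalently, name only the internal interface (assumed to be eth1):
    # btl_tcp_if_include = eth1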
Re: [OMPI users] vfs_write returned -14
On Jun 18, 2009, at 7:33 PM, Kritiraj Sajadah wrote:

Hello Josh, Thank you again for your response. I tried checkpointing a simple C program using BLCR and got the same error, i.e.:
- vfs_write returned -14
- file_header: write returned -14
Checkpoint failed: Bad address

So I would look at how your NFS file system is set up, and work with your sysadmin (and maybe the BLCR list) to resolve this before experimenting too much with checkpointing with Open MPI.

This is how I installed and run MPI programs for checkpointing:
1) configure and install BLCR
2) configure and install Open MPI
3) Compile and run the MPI program as follows:
4) To checkpoint the running program,
5) To restart your checkpoint, locate the checkpoint file and type the following from the command line:

This all looks ok to me.

I did another test with BLCR, however: I tried checkpointing my C application from the /tmp directory instead of my $HOME directory, and it checkpointed fine. So it looks like the problem is with my $HOME directory. I have "drwx" rights on my $HOME directory, which seems fine to me. Then I tried it with Open MPI. However, with Open MPI the checkpoint file automatically gets saved in the $HOME directory. Is there a way to have the file saved in a different location? I checked that LAM/MPI has some command line options:
$ mpirun -np 2 -ssi cr_base_dir /somewhere/else a.out
Do we have a similar option for Open MPI?

By default Open MPI places the global snapshot in the $HOME directory. But you can also specify a different directory for the global snapshot using the following MCA option:
    -mca snapc_base_global_snapshot_dir /somewhere/else

For the best results you will likely want to set this in the MCA params file in your home directory:
    shell$ cat ~/.openmpi/mca-params.conf
    snapc_base_global_snapshot_dir=/somewhere/else

You can also stage the file to local disk, then have Open MPI transfer the checkpoints back to a {logically} central storage device (both can be /tmp on a local disk if you like). For more details on this and the above option you will want to read through the FT Users Guide attached to the wiki page at the link below:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessFT_CR

-- Josh

Thanks a lot
regards,
Raj

--- On Wed, 6/17/09, Josh Hursey wrote:
From: Josh Hursey
Subject: Re: [OMPI users] vfs_write returned -14
To: "Open MPI Users"
Date: Wednesday, June 17, 2009, 1:42 AM

Did you try checkpointing a non-MPI application with BLCR on the cluster? If that does not work, then I would suspect that BLCR is not working properly on the system. However, if a non-MPI application can be checkpointed and restarted correctly on this machine, then it may be something odd with the Open MPI installation or runtime environment. To help debug here, I would need to know how Open MPI was configured and how the application was run on the machine (command line arguments, environment variables, ...). I should note that for the program that you sent, it is important that you compile Open MPI with the Fault Tolerance Thread enabled to ensure a timely checkpoint. Otherwise the checkpoint will be delayed until the MPI program enters the MPI_Finalize function. Let me know what you find out. Josh

On Jun 16, 2009, at 5:08 PM, Kritiraj Sajadah wrote:
Hi Josh, Thanks for the email. I have installed BLCR 0.8.1 and Open MPI 1.3 on my laptop with Ubuntu 8.04 on it. It works fine. I now tried the installation on the cluster (on one machine for now) in my university.
(The administrator installed it; I am not sure if he followed the steps I gave him.) I am checkpointing a simple MPI application which looks as follows:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 30");
    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 30");
    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 30");
    printf("bye \n");

    MPI_Finalize();
    return 0;
}

Do you think it's better to reinstall BLCR?

Thanks
Raj

--- On Tue, 6/16/09, Josh Hursey wrote:
From: Josh Hursey
Subject: Re: [OMPI users] vfs_write returned -14
To: "Open MPI Users"
Date: Tuesday, June 16, 2009, 6:42 PM

These are errors from BLCR. It may be a problem with your BLCR installation and/or your application. Are you able to checkpoint/restart a non-MPI application with BLCR on these machines? What kind of MPI application are you trying to checkpoint? Some of the MPI interfaces are not fully supported at the moment (outlined in the FT User Document that I mentioned in a previous email).

-- Josh

On Jun 16, 2009, at 11:30 AM, Kritiraj Sajadah wrote:
Dear All, I have installed openmpi 1.3 and blcr 0.
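The installation steps 1)-5) quoted above omit the actual command lines. For reference, the usual BLCR plus Open MPI 1.3 checkpoint/restart sequence looks roughly like the following; this is a sketch of the documented workflow, not the poster's exact commands, and the PIDs, program names, and snapshot handle are placeholders (ompi-checkpoint prints the real handle when it runs):

    # run the MPI job with checkpoint/restart support enabled
    shell$ mpirun -np 2 -am ft-enable-cr ./my_mpi_app

    # from another terminal, checkpoint it by the PID of mpirun
    shell$ ompi-checkpoint -v <mpirun_pid>

    # restart from the global snapshot handle reported by ompi-checkpoint
    shell$ ompi-restart ompi_global_snapshot_<mpirun_pid>.ckpt

    # plain BLCR (non-MPI) equivalents, e.g. for the /tmp vs. $HOME test
    shell$ cr_run ./a.out &
    shell$ cr_checkpoint <pid>
    shell$ cr_restart context.<pid>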
[OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
I have a small test code with which I've managed to duplicate the results from a larger code. In essence, using the sm btl with Isend, I wind up with the communication being completely serialized, i.e., all the calls from process 1 complete, then all from 2, ...

This is version 1.3.2, vanilla compile. I get the same results on my RHEL5 Nehalem box and an OS X laptop.

Here's an example of the output (note: there is a usleep in the code to mimic my computation loop and ensure that this is not a simple I/O sequencing issue). Ignore the "next" in the output below; it was a broadcast test.

mpirun -np 5 ./mpi_split_test
Master [id = 0 of 5] is running on bolstadm-lm1
[0] next = 10
Server [id = 3, 2, 1 of 5] is running on bolstadm-lm1
Compositor [id = 1, 0 of 5] is running on bolstadm-lm1
[1] next = 10
Sending buffer 0 from 1
Server [id = 2, 1, 0 of 5] is running on bolstadm-lm1
[2] next = 10
Sending buffer 0 from 2
[3] next = 10
Server [id = 4, 3, 2 of 5] is running on bolstadm-lm1
[4] next = 10
Sending buffer 0 from 3
Sending buffer 1 from 1
Sending buffer 1 from 2
Sending buffer 1 from 3
Sending buffer 2 from 1
Sending buffer 2 from 2
Sending buffer 2 from 3
Sending buffer 3 from 1
Sending buffer 3 from 2
Sending buffer 4 from 1
Receiving buffer from 1, buffer = hello from 1 for the 0 time
Receiving buffer from 1, buffer = hello from 1 for the 1 time
Sending buffer 4 from 2
Sending buffer 4 from 3
Sending buffer 5 from 1
Receiving buffer from 1, buffer = hello from 1 for the 2 time
Sending buffer 6 from 1
Receiving buffer from 1, buffer = hello from 1 for the 3 time

At this point, processes 2 & 3 are stuck in an MPI_Wait ...

Sending buffer 9 from 1
Receiving buffer from 1, buffer = hello from 1 for the 6 time
Receiving buffer from 1, buffer = hello from 1 for the 7 time
Receiving buffer from 1, buffer = hello from 1 for the 8 time
Receiving buffer from 1, buffer = hello from 1 for the 9 time
Receiving buffer from 2, buffer = hello from 2 for the 0 time
Receiving buffer from 2, buffer = hello from 2 for the 1 time
Receiving buffer from 2, buffer = hello from 2 for the 2 time
Sending buffer 5 from 2
Sending buffer 6 from 2
Receiving buffer from 2, buffer = hello from 2 for the 3 time

Now process 2 is running, 1 is in a barrier, 3 is still in Wait

Sending buffer 9 from 2
Receiving buffer from 2, buffer = hello from 2 for the 6 time
Receiving buffer from 2, buffer = hello from 2 for the 7 time
Receiving buffer from 2, buffer = hello from 2 for the 8 time
Receiving buffer from 2, buffer = hello from 2 for the 9 time
Receiving buffer from 3, buffer = hello from 3 for the 0 time
Sending buffer 5 from 3
Receiving buffer from 3, buffer = hello from 3 for the 1 time
Receiving buffer from 3, buffer = hello from 3 for the 2 time

And now process 3 goes ...

Receiving buffer from 3, buffer = hello from 3 for the 8 time
Receiving buffer from 3, buffer = hello from 3 for the 9 time

Now running under TCP:

mpirun --mca btl tcp,self -np 5 ./mpi_split_test
Compositor [id = 1, 0 of 5] is running on bolstadm-lm1
Master [id = 0 of 5] is running on bolstadm-lm1
[0] next = 10
Server [id = 2, 1, 0 of 5] is running on bolstadm-lm1
Server [id = 3, 2, 1 of 5] is running on bolstadm-lm1
Server [id = 4, 3, 2 of 5] is running on bolstadm-lm1
[4] next = 10
Sending buffer 0 from 3
Sending buffer 0 from 1
[2] next = 10
[1] next = 10
Sending buffer 0 from 2
[3] next = 10
Receiving buffer from 1, buffer = hello from 1 for the 0 time
Receiving buffer from 3, buffer = hello from 3 for the 0 time
Receiving buffer from 2, buffer = hello from 2 for the 0 time
Sending buffer 1 from 3
Sending buffer 1 from 1
Sending buffer 1 from 2
Receiving buffer from 1, buffer = hello from 1 for the 1 time
Receiving buffer from 2, buffer = hello from 2 for the 1 time
Receiving buffer from 3, buffer = hello from 3 for the 1 time
Sending buffer 2 from 3
Sending buffer 2 from 2
Sending buffer 2 from 1
Receiving buffer from 1, buffer = hello from 1 for the 2 time
Receiving buffer from 2, buffer = hello from 2 for the 2 time
Receiving buffer from 3, buffer = hello from 3 for the 2 time
...

So, has this been reported before? I've seen some messages on the developer list about hanging with the sm btl. I'll post the test code if requested (this email is already long).

Mark
Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
Mark Bolstad wrote:
I'll post the test code if requested (this email is already long)

Yipes, how long is the test code? Short enough to send, yes? Please send.
Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
Not that long, 150 lines. Here it is:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

#define BUFLEN 25000
#define LOOPS 10
#define BUFFERS 4
#define GROUP_SIZE 4

int main(int argc, char *argv[])
{
    int myid, numprocs, next, namelen;
    int color, key, newid;
    char buffer[BUFLEN], processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm world_comm, comp_comm, server_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);

    MPI_Comm_dup( MPI_COMM_WORLD, &world_comm );

    if ( myid == 0 )
        color = MPI_UNDEFINED;
    else {
        color = (myid - 1) / GROUP_SIZE;
        key = (myid - 1) % GROUP_SIZE;
    }
    MPI_Comm_split( MPI_COMM_WORLD, color, key, &comp_comm );

    if ( myid == 0 || (myid - 1) % GROUP_SIZE == 0 )
        color = MPI_UNDEFINED;
    else {
        int r = myid - 2;
        color = 1;
        key = r - r / GROUP_SIZE;
    }
    MPI_Comm_split( MPI_COMM_WORLD, color, key, &server_comm );

    if ( myid == 0 ) {
        fprintf(stderr, "Master [id = %d of %d] is running on %s\n",
                myid, numprocs, processor_name);
    } else {
        int s_id;
        MPI_Comm_rank( comp_comm, &newid );
        if ( (myid - 1) % GROUP_SIZE == 0 )
            fprintf(stderr, "Compositor [id = %d, %d of %d] is running on %s\n",
                    myid, newid, numprocs, processor_name);
        else {
            MPI_Comm_rank( server_comm, &s_id );
            fprintf(stderr, "Server [id = %d, %d, %d of %d] is running on %s\n",
                    myid, newid, s_id, numprocs, processor_name);
        }
    }

    if ( myid == 0 )
        next = 10;
    MPI_Bcast( &next, 1, MPI_INT, 0, world_comm );
    fprintf(stderr, "[%d] next = %d\n", myid, next );

    if ( myid > 0 ) {
        int i, j;
        int rank, size;
        MPI_Status status;

        MPI_Comm_size( comp_comm, &size );
        MPI_Comm_rank( comp_comm, &rank );

        if ( rank == 0 ) {
            char buffer[BUFLEN];
            for (i = 0; i < LOOPS * ( size - 1 ); i++) {
                int which_source, which_tag;
                MPI_Probe( MPI_ANY_SOURCE, MPI_ANY_TAG, comp_comm, &status );
                which_source = status.MPI_SOURCE;
                which_tag = status.MPI_TAG;
                printf( "Receiving buffer from %d, buffer = ", which_source );
                MPI_Recv( buffer, BUFLEN, MPI_CHAR, which_source, which_tag,
                          comp_comm, &status );
                printf( "%s\n", buffer );
            }
        } else {
            MPI_Request* request[BUFFERS];
            int sent[ BUFFERS ];
            int index = 0;
            char* buffer[BUFFERS];

            for (i = 0; i < BUFFERS; i++) {
                MPI_Request* requester = (MPI_Request *) malloc( sizeof( MPI_Request ) );
                char* c = (char *) malloc(BUFLEN * sizeof( MPI_Request ) );
                /* Should really check for failure, but not for this test */
                request[ i ] = requester;
                sent[ i ] = 0;
                buffer[ i ] = c;
            }

            for (i = 0; i < LOOPS; i++) {
                printf( "Sending buffer %d from %d\n", i, rank );
                sprintf( buffer[ index ], "hello from %d for the %d time", rank, i );
                if ( sent[ index ] ) {
                    sent[ index ] = 0;
                    MPI_Wait( request[ index ], &status );
                }
                MPI_Isend( buffer[ index ], BUFLEN, MPI_CHAR, 0, 99, comp_comm,
                           request[ index ] );
                sent[ index ] = 1;
                index = ( index + 1 ) % BUFFERS;
                /* Randomly sleep to fake a computation loop */
                usleep( (unsigned long)(50 * drand48()) );
            }

            /* Clean up */
            for (i = 0; i < BUFFERS; i++) {
                if ( sent[ i ] ) {
                    sent[ i ] = 0;
                    MPI_Wait( request[ i ], &status );
                }
                free( request[ i ] );
                free( buffer[ i ] );
            }
        }
    }

    MPI_Barrier( world_comm );
    MPI_Finalize();
    return (0);
}

On Fri, Jun 19, 2009 at 10:50 AM, Eugene Loh wrote:
> Mark Bolstad wrote:
>> I'll post the test code if requested (this email is already long)
>
> Yipes, how long is the test code? Short enough to send, yes? Please send.
[OMPI users] Error in mx_init (error MX library incompatible with driver version)
This is a question I raised before, but for Open MPI over IB. I have built the app with the Portland compiler and Open MPI 1.2.3 for Myrinet and InfiniBand. Now I wish to run this on some nodes that have no fast interconnect. We use GridEngine; this is the script:

#!/bin/csh
#$ -cwd
##$ -j y
module purge
module load dot sge openmpi/pgi/64/1.2.3
echo "Got slots"
mpirun -np $NSLOTS --mca btl "sm,self,tcp" ./t2eco2n_mp

This gives the following error message:

[node168:30330] Error in mx_init (error MX library incompatible with driver version)
MX:driver-api-seq-num differ (lib=5.1,kernel=2.1)
MX Lib Version=1.2.5
MX Lib Build=dcl0hpc@hamilton:/tmp/dcl0hpc/myrinet/mx-1.2.5 Wed Apr 16 10:48:48 BST 2008
MX Kernel Version=1.1.6
MX Kernel Build=root@node014:/tmp/mx-1.1.6 Fri Nov 24 13:41:44 GMT 2006
[node168:30331] Error in mx_init (error MX library incompatible with driver version)
[node168:30330] *** Process received signal ***
[node168:30330] Signal: Segmentation fault (11)
[node168:30330] Signal code: (128)
[node168:30330] Failing at address: (nil)
[node168:30330] *** End of error message ***

Although the mismatch between the MX lib version and the kernel version appears to cause the mx_init error, this should never be called, as there is no MX card on those nodes.

Thanks in advance for any advice to solve this
Henk

Dr. H.A. Slim
IT Consultant, Scientific and High Performance Computing
IT Service, Durham University, UK
e-mail: h.a.s...@durham.ac.uk
Tel.: 0191 - 334 2724
FAX: 0191 - 3342701
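A side note for readers hitting the same symptom: even with the btl list restricted to sm,self,tcp, a Myrinet-enabled build may still open its MX MTL component, and that open can call mx_init. The following is one workaround sometimes suggested for this class of problem; it is not confirmed by this thread, and the component names are from the 1.2-era tree, so verify them with ompi_info on your installation:

    # pin the PML to ob1 and exclude the MX components on nodes without MX hardware
    mpirun -np $NSLOTS --mca pml ob1 --mca btl "sm,self,tcp" --mca mtl ^mx ./t2eco2n_mp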
Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
Mark Bolstad wrote:
I have a small test code that I've managed to duplicate the results from a larger code. In essence, using the sm btl with ISend, I wind up with the communication being completely serialized, i.e., all the calls from process 1 complete, then all from 2, ...

I need to do some other stuff, but might spend time on this later. For now, I'll just observe that your sends are rendezvous sends. E.g., if you decrease BUFLEN from 25000 to 2500 (namely, from over 4K to under 4K), the behavior should change (to what you'd expect). That may or may not help you, but I think it's an important observation in reasoning about this behavior.
Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
Thanks, but that won't help. In the real application the messages are at least 25,000 bytes long, mostly much larger.

Thanks,
Mark

On Fri, Jun 19, 2009 at 1:17 PM, Eugene Loh wrote:
> Mark Bolstad wrote:
>> I have a small test code that I've managed to duplicate the results from a larger code. In essence, using the sm btl with ISend, I wind up with the communication being completely serialized, i.e., all the calls from process 1 complete, then all from 2, ...
>
> I need to do some other stuff, but might spend time on this later. For now, I'll just observe that your sends are rendezvous sends. E.g., if you decrease BUFLEN from 25000 to 2500 (namely, from over 4K to under 4K), the behavior should change (to what you'd expect). That may or may not help you, but I think it's an important observation in reasoning about this behavior.
[OMPI users] Linking MPI applications with pgi IPA
When linking applications that are compiled and linked with the -Mipa=fast,inline option, the IPA stops with errors like this case with Amber:

The following function(s) are called, but no IPA information is available:
mpi_allgatherv_, mpi_gatherv_, mpi_bcast_, mpi_wait_, mpi_get_count_, mpi_recv_, mpi_isend_, mpi_gather_, mpi_allreduce_, mpi_abort_, mpi_finalize_, mpi_send_
Linking without IPA

Is there a way to tell the compiler it's ok to ignore the MPI library and do IPA for everything else?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
[OMPI users] Machinefile option in opempi-1.3.2
Hi,

I tested a simple hello world program on 5 nodes, each with dual quad-core processors. I noticed that openmpi does not always follow the order of the processors indicated in the machinefile. Depending upon the number of processors requested, openmpi does some type of sorting to find the best node fit for a particular job and runs on those nodes. Is there a way to make openmpi turn off this sorting and strictly follow the order indicated in the machinefile?

mpiexec supports three options to specify the machinefile: default-machinefile, hostfile, and machinefile. Can anyone tell me what the difference between these three options is?

Any help would be greatly appreciated.

Thanks,
Rajesh
Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
Mark,

MPI does not impose any global order on the messages. The only requirement is that between two peers on the same communicator the messages (or at least the part required for the matching) are delivered in order. This makes both execution traces you sent with your original email (shared memory and TCP) valid from the MPI perspective. Moreover, MPI doesn't impose any order in the matching when ANY_SOURCE is used. In Open MPI we do the matching _ALWAYS_ starting from rank 0 to n in the specified communicator.

BEWARE: The remainder of this paragraph is deep black magic of an MPI implementation's internals. The main difference between the behavior of SM and TCP here directly reflects their eager size: 4K for SM and 64K for TCP. Therefore, for your example, for TCP all your messages are eager messages (i.e. they are completely transferred to the destination process in just one go), while for SM they all require a rendez-vous. This directly impacts the ordering of the messages on the receiver, and therefore the order of the matching. However, I have to insist on this: this behavior is correct based on the MPI standard specifications.

george.

On Jun 19, 2009, at 13:28, Mark Bolstad wrote:

Thanks, but that won't help. In the real application the messages are at least 25,000 bytes long, mostly much larger.

Thanks,
Mark

On Fri, Jun 19, 2009 at 1:17 PM, Eugene Loh wrote:

Mark Bolstad wrote:
I have a small test code that I've managed to duplicate the results from a larger code. In essence, using the sm btl with ISend, I wind up with the communication being completely serialized, i.e., all the calls from process 1 complete, then all from 2, ...

I need to do some other stuff, but might spend time on this later. For now, I'll just observe that your sends are rendezvous sends. E.g., if you decrease BUFLEN from 25000 to 2500 (namely, from over 4K to under 4K), the behavior should change (to what you'd expect). That may or may not help you, but I think it's an important observation in reasoning about this behavior.
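For readers who want to see where this eager/rendezvous boundary sits on their own build, the limits George refers to are ordinary MCA parameters. A small hedged example of inspecting the shared-memory eager limit and, purely for experimentation, raising it above the 25,000-byte messages in the test code (exact parameter defaults vary by release, so check ompi_info first):

    # show the sm BTL parameters, including its eager limit
    shell$ ompi_info --param btl sm | grep eager

    # experiment only: send the 25,000-byte test messages eagerly over
    # shared memory, mimicking the TCP behaviour (at the cost of more memory)
    shell$ mpirun --mca btl_sm_eager_limit 65536 -np 5 ./mpi_split_test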
Re: [OMPI users] Bug in 1.3.2?: sm btl and isend is serializes
George Bosilca wrote:
MPI does not impose any global order on the messages. The only requirement is that between two peers on the same communicator the messages (or at least the part required for the matching) are delivered in order. This makes both execution traces you sent with your original email (shared memory and TCP) valid from the MPI perspective. Moreover, MPI doesn't impose any order in the matching when ANY_SOURCE is used. In Open MPI we do the matching _ALWAYS_ starting from rank 0 to n in the specified communicator. BEWARE: The remainder of this paragraph is deep black magic of an MPI implementation's internals. The main difference between the behavior of SM and TCP here directly reflects their eager size: 4K for SM and 64K for TCP. Therefore, for your example, for TCP all your messages are eager messages (i.e. they are completely transferred to the destination process in just one go), while for SM they all require a rendez-vous. This directly impacts the ordering of the messages on the receiver, and therefore the order of the matching. However, I have to insist on this: this behavior is correct based on the MPI standard specifications.

I'm going to try a technical explanation of what's going on inside OMPI and then words of advice to Mark.

First, the technical explanation. As George says, what's going on is legal. The "servers" all queue up sends to the "compositor". These are long, rendezvous sends (at least when they're on-node), so none of these sends completes. The compositor looks for an incoming message. It gets the header of the message and sends back an acknowledgement that the rest of the message can be sent. The "server" gets the acknowledgement and starts sending more of the message. The compositor, in order to get to the remainder of the message, keeps draining all the other stuff servers are sending it. Once the first message is completely received, the compositor looks for the next message to process and happens to pick up the first server again. It won't go to anyone else until server 1 is exhausted. Legal, but from Mark's point of view not desirable.

The compositor is busy all the time; Mark just wants it to employ a different order. The receives are "serialized", but of course they must be, since the receiver is a single process. Mark's performance issue is that the servers aren't being serviced equally, so they back up while one server unfairly gets all the attention.

Mark, your test code has a set of buffers it cycles through on each server. Could you do something similar on the compositor side? Have a set of resources for each server. If you want the compositor to service all servers equally/fairly, you're going to have to prescribe this behavior in your MPI code. The MPI implementation can't be relied on to do this for you. If this doesn't make sense, let me know and I'll try to sketch it out more explicitly.
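A minimal sketch of the compositor-side change Eugene describes: keep one outstanding receive per server and complete whichever arrives next, so no single server can monopolize the compositor. It assumes the layout of the test code above (rank 0 of comp_comm is the compositor, each server sends LOOPS messages of BUFLEN chars on tag 99); it is an illustration, not code from the thread, and the function name is made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define BUFLEN 25000
    #define LOOPS  10

    /* Compositor receive loop: one pre-posted MPI_Irecv per server,
     * completed in whatever order the messages actually arrive. */
    static void drain_servers_fairly(MPI_Comm comp_comm)
    {
        int size, i;
        MPI_Comm_size(comp_comm, &size);
        int nservers = size - 1;                 /* rank 0 is the compositor */

        char        *buf  = malloc((size_t)nservers * BUFLEN);
        MPI_Request *req  = malloc(nservers * sizeof(MPI_Request));
        int         *left = malloc(nservers * sizeof(int));

        /* Pre-post one receive per server (server i is rank i+1 in comp_comm). */
        for (i = 0; i < nservers; i++) {
            left[i] = LOOPS;
            MPI_Irecv(buf + (size_t)i * BUFLEN, BUFLEN, MPI_CHAR,
                      i + 1, 99, comp_comm, &req[i]);
        }

        int active = nservers;
        while (active > 0) {
            int idx;
            MPI_Status st;

            /* Complete whichever server's message is ready next. */
            MPI_Waitany(nservers, req, &idx, &st);
            printf("Receiving buffer from %d, buffer = %s\n",
                   st.MPI_SOURCE, buf + (size_t)idx * BUFLEN);

            if (--left[idx] > 0) {
                /* Re-post the receive for this server; the others stay posted. */
                MPI_Irecv(buf + (size_t)idx * BUFLEN, BUFLEN, MPI_CHAR,
                          idx + 1, 99, comp_comm, &req[idx]);
            } else {
                active--;   /* MPI_Waitany left req[idx] = MPI_REQUEST_NULL */
            }
        }

        free(buf);
        free(req);
        free(left);
    }

In the test program, a loop like this would take the place of the MPI_Probe/MPI_Recv loop in the rank == 0 branch on comp_comm.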
Re: [OMPI users] mpirun fails on the host
The source of the problem has been identified, though not wholly understood, by fully disabling the firewall on the host's interface to the internal network. Parallel jobs involving the host and nodes were successful when launched from a node, while those launched on the host were apparently blocked by the firewall. Would the former only involve the use of the ssh port on the host, while the latter involve other ports?
Re: [OMPI users] Error in mx_init (error MX library incompatible with driver version)
On Jun 19, 2009, at 1:05 PM, SLIM H.A. wrote:
Although the mismatch between the MX lib version and the kernel version appears to cause the mx_init error, this should never be called, as there is no MX card on those nodes. Thanks in advance for any advice to solve this
Henk

Henk,

Is MX statically compiled into the binary or the Open-MPI library?

Scott
Re: [OMPI users] Machinefile option in opempi-1.3.2
If you do "man orte_hosts", you'll see a full explanation of how the various machinefile options work. The default mapper doesn't do any type of sorting - it is a round-robin mapper that just works its way through the provided nodes. We don't reorder them in any way. However, it does depend on the number of slots we are told each node has, so that might be what you are encountering. If you do a --display-map and send it along, I might be able to spot the issue. Thanks Ralph On Fri, Jun 19, 2009 at 1:35 PM, Rajesh Sudarsan wrote: > Hi, > > I tested a simple hello world program on 5 nodes each with dual > quad-core processors. I noticed that openmpi does not always follow > the order of the processors indicated in the machinefile. Depending > upon the number of processors requested, openmpi does some type of > sorting to find the best node fit for a particular job and runs on > them. Is there a way to make openmpi to turn off this sorting and > strictly follow the order indicated in the machinefile? > > mpiexec supports three options to specify the machinefile - > default-machinefile, hostfile, and machinefile. Can anyone tell what > is the difference between these three options? > > Any help would be greatly appreciated. > > Thanks, > Rajesh > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] mpirun fails on the host
I believe you will find a fairly complete discussion of firewall issues with MPI on the OMPI mailing lists. The bottom line is that the firewall blocks both the ssh port and the TCP communication ports required to wire up the MPI transports. If you are using the TCP transport, then those ports are also blocked.

You can open specific ports in your firewall and tell OMPI to use those ports for both wireup and MPI transport. We don't necessarily recommend it, though, as it leaves a security hole in your firewall.

HTH
Ralph

On Fri, Jun 19, 2009 at 4:00 PM, Honest Guvnor wrote:
> The source of the problem has been determined, but not wholly understood,
> by fully disabling the firewall on the host to the internal network.
> Parallel jobs involving the host and nodes launched from a node were
> successful while those launched on the host were apparently blocked by the
> firewall. Would the former only involve the use of the ssh port on the host
> while the latter involve other ports?
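If you do decide to open a fixed range rather than disable the firewall, the general shape is sketched below. The MCA parameter names are the ones remembered from 1.3-era documentation and may differ between Open MPI releases, so confirm them with ompi_info on your installation; the port range, subnet, and iptables rule are examples to adapt to your own network and firewall tooling.

    # confirm the port-related parameter names on your build
    shell$ ompi_info --param btl tcp | grep port
    shell$ ompi_info --param oob tcp | grep port

    # example: restrict OMPI's wireup and TCP transport to ports 10000-10099
    shell$ mpirun --mca btl_tcp_port_min_v4 10000 --mca btl_tcp_port_range_v4 100 \
                  --mca oob_tcp_port_min_v4 10000 --mca oob_tcp_port_range_v4 100 \
                  -np 1 --host n06 hostname

    # then allow that range from the internal network only (iptables example)
    shell$ iptables -A INPUT -p tcp -s 192.168.0.0/24 --dport 10000:10099 -j ACCEPT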