Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Ralph H Castain
Well, I can't say for sure about LDAP. I did a quick search and found two things: 1. there are limits imposed in LDAP that may apply to your situation, and 2. that statement varies tremendously depending upon the specific LDAP implementation you are using. I would suggest you see which LDAP you a

Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Heywood, Todd
Hi Ralph, Thanks for the reply. This is a tough one. It is OpenLDAP. I had thought that I might be hitting a file descriptor limit for slapd (LDAP daemon), which ulimit -n does not affect (you have to rebuild LDAP with a different FD_SETSIZE variable). However, I simply turned on more expressiv

Re: [OMPI users] MPI_Type_create_subarray fails!

2007-02-06 Thread Avishay Traeger
Surely there is a better way to get this code running without disabling checks. Any suggestions? Thanks, Avishay On Mon, 2007-02-05 at 15:36 -0500, Ivan de Jesus Deras Tabora wrote: > I managed to make it run by disabling the parameter checking. > I added --mca mpi_param_check 0 to mpirun and it

Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Ralph H Castain
It sounds to me like we are probably overwhelming your slapd - your test would seem to indicate that slowing down the slapd makes us fail even with smaller jobs, which tends to support that idea. We frankly haven't encountered that before since our rsh tests have all been done using non-LDAP authe

Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails

2007-02-06 Thread Jeff Squyres
On Feb 2, 2007, at 11:22 AM, Alex Tumanov wrote: That really did fix it, George: # mpirun --prefix $MPIHOME -hostfile ~/testdir/hosts --mca btl tcp,self --mca btl_tcp_if_exclude ib0,ib1 ~/testdir/hello Hello from Alex' MPI test program Process 0 on dr11.lsf.platform.com out of 2 Hello from Alex

Re: [OMPI users] MPI_Type_create_subarray fails!

2007-02-06 Thread George Bosilca
A correction has been made to the MPI_Type_create_subarray. The particular test that was failing for you has been replaced with a better one. You can grab it either from the nightly build or in a few days from the next (1.2) release. Thanks, george. On Feb 6, 2007, at 9:52 AM, Avishay

Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect

2007-02-06 Thread Alex Tumanov
Thank you for your reply, Reese! What version of GM are you running? # rpm -qa |egrep "^gm-[0-9]+|^gm-devel" gm-2.0.24-1 gm-devel-2.0.24-1 Is this too old? And are you sure that gm_board_info shows all the nodes that are listed in your machine file? Yes, that was the issue - bad cable connec

[OMPI users] Problems with MPI_Init

2007-02-06 Thread Pablo Hernán Rodríguez Zivic
Hello everyone, I'm using MPI (ParMetis) on a 64-bit machine. When I tried to test it using the example programs, it hangs with an error message which says that I must change the device to ch_p4mpd. So, once I changed it in the file mpirun.ch4, the application starts and hangs (never returns)

Re: [OMPI users] Problems with MPI_Init

2007-02-06 Thread Jeff Squyres
Greetings Pablo. Please note that this list is for support of the Open MPI software package. From the output you included, it looks like you are not using Open MPI, but are rather using one of the MPICH variants (i.e., a different software package). You might want to send your question t

Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect

2007-02-06 Thread Reese Faucette
What version of GM are you running? # rpm -qa |egrep "^gm-[0-9]+|^gm-devel" gm-2.0.24-1 gm-devel-2.0.24-1 Is this too old? Nope, that's just fine. A mismatch between the list of nodes actually configured onto the Myrinet fabric and the machine file is a common source of errors like this. T

Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails

2007-02-06 Thread Alex Tumanov
Thanks for your reply, Jeff. > It never occurred to me that the headnode would try to communicate > with the slave using infiniband interfaces... Orthogonally, what are The problem here is that since your IB IP addresses are "public" (meaning that they're not in the IETF defined ranges for priv
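As a rough illustration of what "not in the IETF defined ranges for private addresses" means in the message above, here is a minimal Python sketch that tests an IPv4 address against the three RFC 1918 private blocks. The function name and the sample addresses are hypothetical, and this is not a description of how Open MPI itself decides routability.

```python
import ipaddress

# The three IPv4 private address blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def in_private_range(addr):
    """Return True if addr falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)

if __name__ == "__main__":
    # Hypothetical addresses, chosen only to exercise both outcomes.
    for sample in ("192.168.1.5", "172.32.0.1"):
        print(sample, in_private_range(sample))
```

An interface whose address falls outside these blocks would be treated as "public" in the sense used in the reply.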

Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails

2007-02-06 Thread Jeff Squyres
On Feb 6, 2007, at 12:38 PM, Alex Tumanov wrote: http://www.open-mpi.org/faq/?category=tcp#tcp-routability The pointer was rather informative. We do have to use non-standard ranges for IB interfaces, because we're performing automatic IP over IB configuration based on the eth0 IP and netmask.

Re: [OMPI users] MPI_Type_create_subarray fails!

2007-02-06 Thread Jeff Squyres
FWIW, this has been committed on the 1.1 branch. So if we ever do a 1.1.5 release, it will be included. On Feb 6, 2007, at 10:46 AM, George Bosilca wrote: A correction has been made to the MPI_Type_create_subarray. The particular test that was failing for you has been replaced with a better

Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Heywood, Todd
Hi Ralph, It looks that way. I created a user local to each node, with local authentication via /etc/passwd and /etc/shadow, and OpenMPI scales up just fine for that. I know this is an OpenMPI list, but does anyone know how common or uncommon LDAP-based clusters are? I would have thought this

Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Ralph Castain
Hi Todd, Just as a thought - you could try not using --debug-daemons or -d and instead setting "-mca pls_rsh_num_concurrent 50" or some such small number. This will tell the system to launch 50 ssh calls at a time, waiting for each group to complete before launching the next. You can't use it with
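The batching behaviour described above (start a fixed number of launches, wait for that whole group to finish, then start the next group) can be sketched in Python as follows. This is only an illustration of the idea, not Open MPI's launcher code; launch_in_batches, fake_launch, and the host names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def launch_in_batches(hosts, launch, batch_size=50):
    """Call launch(host) for every host, at most batch_size at a time.

    Each group must finish completely before the next group starts,
    which caps the number of simultaneous logins hitting the
    authentication service.
    """
    results = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        # Leaving the "with" block waits for every launch in this batch.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results.extend(pool.map(launch, batch))
    return results

def fake_launch(host):
    # Stand-in for an ssh call; a real launcher would start a daemon here.
    return "started on " + host

if __name__ == "__main__":
    hosts = ["node%03d" % i for i in range(120)]  # hypothetical host names
    print(len(launch_in_batches(hosts, fake_launch, batch_size=50)))
```

A smaller batch size trades startup time for a lighter load on whatever each login has to contact, such as slapd in this thread.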