Re: [OMPI users] Signal 13

2007-03-20 Thread Jeff Squyres
FWIW, most LDAP installations I have seen have ended up doing the same thing -- if you have a large enough cluster, you have MPI jobs starting all the time, and rate control of a single job startup is not sufficient to avoid overloading your LDAP server. The solutions that I have seen typic

Re: [OMPI users] Signal 13

2007-03-18 Thread David Bronke
That's great to hear! For now we'll just create local users for those who need access to MPI on this system, but I'll keep an eye on the list for when you do get a chance to finish that fix. Thanks again!

Re: [OMPI users] Signal 13

2007-03-18 Thread Ralph Castain
Excellent! Yes, we use pipe in several places, including in the run-time during various stages of launch, so that could be a problem. Also, be aware that other users have reported problems on LDAP-based systems when attempting to launch large jobs. The problem is that the OpenMPI launch system has

Re: [OMPI users] Signal 13

2007-03-18 Thread David Bronke
I just received an email from a friend who is helping me resolve this; he was able to trace the problem back to what appears to be a pipe() call in OpenMPI: The problem is with the pipe() system call (which is invoked by MPI_Send() as far as I can tell) when made by an LDAP-authenticated user. Still
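To separate "pipe() fails for this account" from "something inside the MPI launch fails", it may help to run a standalone check as both a local user and an LDAP-authenticated user. The following is only a rough sketch: the helper name try_pipe is made up for illustration and nothing here is taken from the Open MPI code base.

```python
import errno
import os


def try_pipe():
    """Attempt a single pipe() call and a 4-byte round trip.

    Returns (ok, message). Hypothetical helper, not part of Open MPI.
    """
    try:
        read_fd, write_fd = os.pipe()
    except OSError as exc:
        # Report the symbolic errno name so it can be compared across accounts.
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, f"pipe() failed: {name} ({exc.strerror})"
    try:
        os.write(write_fd, b"ping")
        data = os.read(read_fd, 4)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    if data != b"ping":
        return False, "pipe() was created but the round trip returned unexpected data"
    return True, "pipe() round trip succeeded"


if __name__ == "__main__":
    ok, message = try_pipe()
    print(f"uid={os.getuid()} {message}")
```

If this succeeds for both kinds of account, the failure is more likely tied to where or when the pipe is created during launch than to the system call itself.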

Re: [OMPI users] Signal 13

2007-03-16 Thread Ralph Castain
I'm afraid I have zero knowledge or experience with gentoo portage, so I can't help you there. I always install our releases from the tarball source as it is pretty trivial to do and avoids any issues. I will have to defer to someone who knows that system to help you from here. It sounds like an i

Re: [OMPI users] Signal 13

2007-03-16 Thread David Bronke
On 3/15/07, Ralph Castain wrote: Hmmm...well, a few thoughts to hopefully help with the debugging. One initial comment, though - 1.1.2 is quite old. You might want to upgrade to 1.2 (releasing momentarily - you can use the last release candidate in the interim as it is identical). Version 1.2

Re: [OMPI users] Signal 13

2007-03-15 Thread Ralph Castain
Hmmm...well, a few thoughts to hopefully help with the debugging. One initial comment, though - 1.1.2 is quite old. You might want to upgrade to 1.2 (releasing momentarily - you can use the last release candidate in the interim as it is identical). Meantime, looking at this output, there appear to

Re: [OMPI users] Signal 13

2007-03-15 Thread David Bronke
I'm using OpenMPI version 1.1.2. I installed it using gentoo portage, so I think it has the right permissions... I tried doing 'equery f openmpi | xargs ls -dl' and inspecting the permissions of each file, and I don't see much out of the ordinary; it is all owned by root:root, but every file has r

Re: [OMPI users] Signal 13

2007-03-15 Thread Ralph H Castain
It isn't a /dev issue. The problem is likely that the system lacks sufficient permissions to either: 1. create the Open MPI session directory tree. We create a hierarchy of subdirectories for temporary storage used for things like your shared memory file - the location of the head of that tree can
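A quick way to test the first possibility is to try creating a small directory tree under the temporary directory as the affected user. This is a sketch under assumptions: it uses the system default temporary directory from Python's tempfile module and a made-up directory name (session-check-PID); the actual location and layout of the Open MPI session directory may differ on your installation.

```python
import os
import shutil
import tempfile


def can_create_tree(base=None):
    """Try to create and then remove a nested directory tree under `base`.

    Returns (ok, message). The directory names are hypothetical and only
    mimic a nested temporary hierarchy; they are not Open MPI's real names.
    """
    base = base or tempfile.gettempdir()
    root = os.path.join(base, f"session-check-{os.getpid()}")
    nested = os.path.join(root, "job", "proc")
    try:
        os.makedirs(nested, mode=0o700)
        # Also confirm that a file can be written inside the new tree.
        with open(os.path.join(nested, "probe"), "wb") as handle:
            handle.write(b"x")
    except OSError as exc:
        return False, f"{base}: {exc.strerror}"
    finally:
        shutil.rmtree(root, ignore_errors=True)
    return True, f"{base}: writable"


if __name__ == "__main__":
    print(can_create_tree())
```

Running it once as root and once as the failing user should show whether directory creation is where the permissions differ.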

Re: [OMPI users] Signal 13

2007-03-15 Thread David Bronke
Ok, now that I've figured out what the signal means, I'm wondering exactly what is running into permission problems... the program I'm running doesn't use any functions except printf, sprintf, and MPI_*... I was thinking that possibly changes to permissions on certain /dev entries in newer distros

Re: [OMPI users] Signal 13

2007-03-15 Thread McCalla, Mac
Hi, If the perror command is available on your system, it will tell you the message associated with the signal value. On my system (RHEL4U3), it is "permission denied". HTH, mac mccalla
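One thing worth double-checking here: the number 13 can be read either as an errno value or as a signal number, and the two have different meanings. The sketch below assumes Linux numbering (other POSIX systems may differ) and uses a made-up helper name, describe, to print both readings side by side.

```python
import errno
import os
import signal


def describe(number):
    """Return both readings of `number`: as an errno and as a signal.

    Hypothetical helper for illustration; results depend on the platform.
    """
    try:
        signal_name = signal.Signals(number).name
    except ValueError:
        signal_name = "not a signal on this platform"
    return {
        "errno": errno.errorcode.get(number, "unknown"),
        "errno_text": os.strerror(number),
        "signal": signal_name,
    }


if __name__ == "__main__":
    info = describe(13)
    print("as errno :", info["errno"], "-", info["errno_text"])
    print("as signal:", info["signal"])
```

On Linux, errno 13 is EACCES (permission denied) while signal 13 is SIGPIPE, so a job that dies with "signal 13" is not necessarily reporting a permissions error.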

Re: [OMPI users] Signal 13

2007-03-15 Thread Mike Houston
I've been having similar issues with brand-new FC5/6 and RHEL5 machines, but our FC4/RHEL4 machines are just fine. On the FC5/6 and RHEL5 machines, I can get things to run as root. There must be some ACL or security setting that's enabled by default on the newer distros. If I figure it out