Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Gutierrez, Samuel K
ers-boun...@open-mpi.org> [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org] Sent: Monday, April 16, 2012 8:52 AM To: Seyyed Mohtadin Hashemi Cc: us...@open-mpi.org Subject

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Ralph Castain
't due to a small backing store. >> >> Thanks, >> >> Sam >> >> On Apr 16, 2012, at 8:57 AM, Gutierrez, Samuel K wrote: >> >>> Hi, >>> >>> Sorry about the lag. I'll take a closer look at this ASAP. >>> >>

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Gutierrez, Samuel K
tadin Hashemi Cc: us...@open-mpi.org Subject: Re: [OMPI users] OpenMPI fails to run with -np larger than 10 No earthly idea. As I said, I'm afraid Sam is pretty much unavailable for the next two weeks, so we probably don't have much hope of fixing it. I see i

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Seyyed Mohtadin Hashemi
- > *From:* users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf > of Ralph Castain [r...@open-mpi.org] > *Sent:* Monday, April 16, 2012 8:52 AM > *To:* Seyyed Mohtadin Hashemi > *Cc:* us...@open-mpi.org > *Subject:* Re: [OMPI users] OpenMPI fails to run with

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-24 Thread Gutierrez, Samuel K
f of Ralph Castain [r...@open-mpi.org] Sent: Monday, April 16, 2012 8:52 AM To: Seyyed Mohtadin Hashemi Cc: us...@open-mpi.org Subject: Re: [OMPI users] OpenMPI fails to run with -np larger than 10 No earthly idea. As I said, I'm afraid Sam is pretty much unavail

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-17 Thread Jeffrey Squyres
Moving the conversation to this bug: https://svn.open-mpi.org/trac/ompi/ticket/3076 On Apr 16, 2012, at 4:57 AM, Seyyed Mohtadin Hashemi wrote: > I recompiled everything from scratch with GCC 4.4.5 and 4.7 using OMPI 1.4.5 > tarball. > > I did some tests and it does not seem that i can mak

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-16 Thread Gutierrez, Samuel K
tadin Hashemi Cc: us...@open-mpi.org Subject: Re: [OMPI users] OpenMPI fails to run with -np larger than 10 No earthly idea. As I said, I'm afraid Sam is pretty much unavailable for the next two weeks, so we probably don't have much hope of fixing it. I see in your original note that you

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-16 Thread Ralph Castain
No earthly idea. As I said, I'm afraid Sam is pretty much unavailable for the next two weeks, so we probably don't have much hope of fixing it. I see in your original note that you tried the 1.5.5 beta rc and got the same results, so I assume this must be something in your system config that is

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-16 Thread Seyyed Mohtadin Hashemi
I recompiled everything from scratch with GCC 4.4.5 and 4.7 using the OMPI 1.4.5 tarball. I did some tests and it does not seem that I can make it work. I tried these:
btl_sm_num_fifos 4
btl_sm_free_list_num 1000
btl_sm_free_list_max 100
mpool_sm_min_size 15
mpool_sm_max_size 75
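For readers unfamiliar with how such MCA parameters are supplied, the following is a minimal sketch, not the poster's actual command. It assumes the parameters are passed on the mpirun command line with `--mca <name> <value>`. The helper name `build_mpirun_command`, the program name `./hello`, and the process count of 18 (the cluster's total core count mentioned in the thread) are illustrative assumptions; the parameter values are copied from the excerpt above, which may be truncated.

```python
def build_mpirun_command(np, mca_params, program):
    """Build an mpirun argument list with one '--mca name value' triple per parameter.

    Hypothetical helper for illustration; it only assembles the command and
    does not run anything.
    """
    cmd = ["mpirun", "-np", str(np)]
    for name, value in mca_params.items():
        cmd += ["--mca", name, str(value)]
    cmd.append(program)
    return cmd


# Values as they appear in the message excerpt (possibly truncated there).
params = {
    "btl_sm_num_fifos": 4,
    "btl_sm_free_list_num": 1000,
    "btl_sm_free_list_max": 100,
}

# "./hello" is an assumed test program name, not taken from the thread.
command = build_mpirun_command(18, params, "./hello")
print(" ".join(command))
```

Building the command as a list rather than a single string keeps each parameter name and value as a separate argument, which is the form `subprocess.run` would expect if the command were actually launched.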

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-16 Thread Seyyed Mohtadin Hashemi
I did try with both MaxSessions and MaxStartups set to 200; unfortunately, it did not help. I still got the same errors as before. > Date: Sat, 14 Apr 2012 12:58:49 -0400 > From: Tim Miller > Subject: Re: [OMPI users] OpenMPI fails to run with -np larger than 10 > To: Open MPI User
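MaxSessions and MaxStartups are sshd_config keywords. As a rough sketch of how one might confirm what a node's configuration actually sets, the snippet below scans configuration text for those two keywords. The function name `read_sshd_limits` and the sample text are hypothetical; the sketch assumes simple `Keyword value` lines, keeps the first occurrence of each keyword, and ignores `Match` blocks and included files.

```python
def read_sshd_limits(config_text, keywords=("MaxSessions", "MaxStartups")):
    """Return the first value found for each requested sshd_config keyword.

    Hypothetical helper: handles only plain 'Keyword value' lines and skips
    blank lines and lines starting with '#'.
    """
    wanted = {k.lower(): k for k in keywords}
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key = parts[0].lower()  # keyword matching is case-insensitive
        if key in wanted and wanted[key] not in found:
            found[wanted[key]] = parts[1].strip()
    return found


# Assumed sample content; on a real node the text would be read from the
# sshd configuration file instead.
sample = """\
# illustrative sshd_config excerpt
MaxSessions 200
MaxStartups 200
"""

limits = read_sshd_limits(sample)
print(limits)
```

Changes to the real configuration file only take effect after the SSH daemon is reloaded or restarted, so a value visible in the file is not by itself proof that it is active.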

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-14 Thread Tim Miller
This may or may not be related, but I've had similar issues on RHEL 6.x and clones when using the SSH job launcher and running more than 10 processes per node. It sounds like you're only distributing 6 processes per node, so it doesn't sound like your problem, but you might want to check your hostf

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-13 Thread Seyyed Mohtadin Hashemi
That fixed the issue, but it raises a big question about why this happened. I'm pretty sure it's not a system memory issue; the node with the least RAM has 8 GB, which I would think is more than enough. Do you think that adjusting the btl_sm_eager_limit, mpool_sm_min_size, and mpool_sm_max_size can

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-13 Thread Ralph Castain
Afraid I have no idea how those packages were built, what release they correspond to, etc. I would suggest sticking with the tarballs. Your output indicates a problem with shared memory when you completely fill the machine. Could be a couple of things, like running out of memory - but for now,

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-13 Thread Seyyed Mohtadin Hashemi
Hi, Sorry that it took so long to answer; I didn't get any return mails and had to check the digest for replies. Anyway, when I compiled from scratch I did use the tarballs from open-mpi.org. GROMACS is not the problem (or at least I don't think so); I just used it as a check to see if I could

Re: [OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-12 Thread Ralph Castain
I suspect you'll have to ask someone familiar with GROMACS about that specific package. As for testing OMPI, can you run the codes in the examples directory - e.g., "hello" and "ring"? I assume you are downloading and installing OMPI from our tarballs? On Apr 12, 2012, at 7:04 AM, Seyyed Mohtad

[OMPI users] OpenMPI fails to run with -np larger than 10

2012-04-12 Thread Seyyed Mohtadin Hashemi
Hello, I have a very peculiar problem: I have a micro cluster with three nodes (18 cores total); the nodes are clones of each other and connected to a frontend via Ethernet, with Debian squeeze as the OS for all nodes. When I run parallel jobs I can use up to “-np 10”; if I go further the job crashes,