Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Ralph H Castain
Well, I can't say for sure about LDAP. I did a quick search and found two
things:

1. there are limits imposed in LDAP that may apply to your situation, and

2. that statement varies tremendously depending upon the specific LDAP
implementation you are using

I would suggest you see which LDAP you are using and contact the respective
organization to ask if they do have such a limit, and if so, how to adjust
it.

It sounds like maybe we are hitting the LDAP server with too many requests
too rapidly. Usually, the issue is not starting fast enough, so this is a
new one! We don't currently check to see if everything started up okay, so
that is why the processes might hang - we hope to fix that soon. I'll have
to see if there is something we can do to help alleviate such problems -
might not be in time for the 1.2 release, but perhaps it will make a
subsequent "fix" release; or, if you are willing/interested, I could provide it to
you as a "patch" you could use until a later official release.

Meantime, you might try upgrading to 1.2b3 or even a nightly release from
the trunk. There are known problems with 1.2b2 (which is why there is a b3
and soon to be an rc1), though I don't think that will be the problem here.
At the least, the nightly trunk has a much better response to ctrl-c in it.

Ralph


On 2/5/07 9:50 AM, "Heywood, Todd"  wrote:

> Hi Ralph,
>  
> Thanks for the reply. The OpenMPI version is 1.2b2 (because I would like to
> integrate it with SGE).
>  
> Here is what is happening:
>  
> (1) When I run with --debug-daemons (but WITHOUT -d), I get "Daemon
> [0,0,27] checking in as pid 7620 on host blade28" (for example) messages for
> most but not all of the daemons that should be started up, and then it hangs.
> I also notice "reconnecting to LDAP server" messages in various
> /var/log/secure files, and cannot log in while things are hung (with "su:
> pam_ldap: ldap_result Can't contact LDAP server" in /var/log/messages). So
> apparently LDAP hits some limit on opening ssh sessions, and I'm not sure how
> to address this.
> (2) When I run with --debug-daemons AND the debug option -d, all daemons
> start up and check in, albeit slowly (debug must slow things down so
> LDAP can handle all the requests??). Then apparently, the cpi process is
> started for each task but it then hangs:
>  
> [blade1:23816] spawn: in job_state_callback(jobid = 1, state = 0x4)
> [blade1:23816] Info: Setting up debugger process table for applications
>  MPIR_being_debugged = 0
>  MPIR_debug_gate = 0
>  MPIR_debug_state = 1
>  MPIR_acquired_pre_main = 0
>  MPIR_i_am_starter = 0
>  MPIR_proctable_size = 800
>  MPIR_proctable:
>(i, host, exe, pid) = (0, blade1, /home4/itstaff/heywood/ompi/cpi, 24193)
> ...
> ...(i, host, exe, pid) = (799, blade213, /home4/itstaff/heywood/ompi/cpi, 4762)
>  
> A "ps" on the head node shows 200 open ssh sessions, and 4 cpi processes doing
> nothing. A ^C gives this:
>  
> mpirun: killing job...
>  
> --
> WARNING: A process refused to die!
>  
> Host: blade1
> PID:  24193
>  
> This process may still be running and/or consuming resources.
> 
>  
>  
>  
> Still got a ways to go, but any ideas/suggestions are welcome!
>  
> Thanks,
>  
> Todd
>  
>  
> 
> 
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
> Of Ralph Castain
> Sent: Friday, February 02, 2007 5:20 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
>  
> Hi Todd
> 
> To help us provide advice, could you tell us what version of OpenMPI you are
> using?
> 
> Meantime, try adding "-mca pls_rsh_num_concurrent 200" to your mpirun command
> line. You can up the number of concurrent daemons we launch to anything your
> system will support -- basically, we limit the number only because some systems
> have limits on the number of ssh calls we can have active at any one time.
> Because we hold stdio open when running with --debug-daemons, the number of
> concurrent daemons must match or exceed the number of nodes you are trying to
> launch on.
> 
> I have a "fix" in the works that will help relieve some of that restriction,
> but that won't come out until a later release.
> 
> Hopefully, that will allow you to obtain more debug info about why/where
> things are hanging.
> 
> Ralph
> 
> 
> On 2/2/07 11:41 AM, "Heywood, Todd"  wrote:
> I have OpenMPI running fine for a small/medium number of tasks (simple hello
> or cpi program). But when I try 700 or 800 tasks, it hangs, apparently on
> startup. I think this might be related to LDAP, since if I try to log into my
> account while the job is hung, I get told my username doesn't exist. However,
> I tried adding --debug to the mpirun, and got the same sequence of output as
> for successful smaller runs, until it hung again. So I added --debug-daemons
> and got this (with an exit, i.e. no hanging):
> ...
> [blade1:31733] [0,0,0] wrote setup file
> -
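
The "simple hello or cpi program" referred to above amounts to only a few lines
of MPI. A minimal sketch of such a test program (illustrative only, not Todd's
actual source) looks like this:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        /* The large runs described in this thread reportedly hang here,
           in MPI_Init, before the program prints anything. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }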

Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Heywood, Todd
Hi Ralph,

Thanks for the reply. This is a tough one. It is OpenLDAP. I had thought that I
might be hitting a file descriptor limit for slapd (the LDAP daemon), which ulimit
-n does not affect (you have to rebuild LDAP with a different FD_SETSIZE
variable). However, I simply turned on more verbose logging to
/var/log/slapd, and that resulted in smaller jobs (which had successfully run
before) hanging. Go figure. It appears that the daemons are up and running (from
ps), and everything hangs in MPI_Init. Ctrl-C gives

[blade1:04524] ERROR: A daemon on node blade26 failed to start as expected.
[blade1:04524] ERROR: There may be more information available from
[blade1:04524] ERROR: the remote shell (see above).
[blade1:04524] ERROR: The daemon exited unexpectedly with status 255.

I'm interested in any suggestions, semi-fixes, etc. which might help get to the
bottom of this. Right now I'd like to determine whether the daemons are indeed all
up and running, or whether some are not (causing MPI_Init to hang).

Thanks,

Todd


Re: [OMPI users] MPI_Type_create_subarray fails!

2007-02-06 Thread Avishay Traeger
Surely there is a better way to get this code running without disabling
checks.  Any suggestions?

Thanks,
Avishay

On Mon, 2007-02-05 at 15:36 -0500, Ivan de Jesus Deras Tabora wrote:
> I managed to make it run by disabling the parameter checking.
> I added --mca mpi_param_check 0 to mpirun and it works ok, so maybe
> the problem is with the parameter checking code.
> 
> On 2/2/07, Ivan de Jesus Deras Tabora  wrote:
> > I've been checking the OpenMPI code, trying to find something, but
> > still no luck.  I'll continue checking the code.
> >
> >
> > On 2/2/07, Robert Latham  wrote:
> > > On Tue, Jan 30, 2007 at 04:55:09PM -0500, Ivan de Jesus Deras Tabora 
> > > wrote:
> > > > Then I find all the references to the MPI_Type_create_subarray and
> > > > create a little program just to test that part of the code, the code I
> > > > created is:
> > > ...
> > > > After running this little program using mpirun, it raises the same 
> > > > error.
> > >
> > > This small program runs fine under MPICH2.  Either you have found a
> > > bug in OpenMPI (passing it a datatype it should be able to handle), or
> > > a bug in MPICH2 (passing it a datatype it handled, but should have
> > > complained about).
> > >
> > > ==rob
> > >
> > > --
> > > Rob Latham
> > > Mathematics and Computer Science DivisionA215 0178 EA2D B059 8CDF
> > > Argonne National Lab, IL USA B29D F333 664A 4280 315B
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> >
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
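
For readers without the original (elided) test program, here is a minimal sketch
of a typical MPI_Type_create_subarray call of the kind discussed in this thread.
The 8x8/4x4 sizes and the MPI_INT element type are arbitrary illustration
choices, not taken from Ivan's code:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* Describe a 4x4 block starting at (2,2) inside an 8x8 C-ordered array. */
        int sizes[2]    = {8, 8};
        int subsizes[2] = {4, 4};
        int starts[2]   = {2, 2};
        MPI_Datatype subarray;

        MPI_Init(&argc, &argv);
        MPI_Type_create_subarray(2, sizes, subsizes, starts,
                                 MPI_ORDER_C, MPI_INT, &subarray);
        MPI_Type_commit(&subarray);
        printf("subarray datatype created and committed\n");
        MPI_Type_free(&subarray);
        MPI_Finalize();
        return 0;
    }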



Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Ralph H Castain
It sounds to me like we are probably overwhelming your slapd - your test
would seem to indicate that slowing down the slapd makes us fail even with
smaller jobs, which tends to support that idea.

We frankly haven't encountered that before since our rsh tests have all been
done using non-LDAP authentication (basically, we ask that you setup rsh to
auto-authenticate on each node). It sounds like we need to add an ability to
slow down so that the daemon doesn't "fail" due to authentication timeout
and/or slapd rejection due to the queue being full.

This may take a little time to fix due to other priorities, and will almost
certainly have to be released in a subsequent 1.2.x version. Meantime, I'll
let you know when I get something to test - would you be willing to give it
a shot if I provide a patch? I don't have access to an LDAP-based system.

Ralph



Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails

2007-02-06 Thread Jeff Squyres

On Feb 2, 2007, at 11:22 AM, Alex Tumanov wrote:


That really did fix it, George:

# mpirun --prefix $MPIHOME -hostfile ~/testdir/hosts --mca btl
tcp,self --mca btl_tcp_if_exclude ib0,ib1 ~/testdir/hello
Hello from Alex' MPI test program
Process 0 on dr11.lsf.platform.com out of 2
Hello from Alex' MPI test program
Process 1 on compute-0-0.local out of 2

It never occurred to me that the headnode would try to communicate
with the slave using infiniband interfaces... Orthogonally, what are


The problem here is that since your IB IP addresses are  
"public" (meaning that they're not in the IETF defined ranges for  
private IP addresses), Open MPI assumes that they can be used to  
communicate with your back-end nodes on the IPoIB network.  See this  
FAQ entry for details:


http://www.open-mpi.org/faq/?category=tcp#tcp-routability

If you update your IP addresses to be in the private range, Open MPI  
should do the Right routability computations and you shouldn't need  
to exclude anything.
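
The "IETF defined ranges for private IP addresses" are the RFC 1918 blocks
(10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). A small standalone sketch (not
Open MPI code) that classifies an IPv4 address the same way, and shows why the
20.x.y.z / 30.x.y.z IPoIB addresses in this thread count as "public":

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>

    /* Return 1 if the dotted-quad address falls in an RFC 1918 private block. */
    static int is_rfc1918_private(const char *dotted)
    {
        struct in_addr a;
        uint32_t ip;

        if (inet_aton(dotted, &a) == 0)
            return 0;                                 /* not a valid IPv4 address */
        ip = ntohl(a.s_addr);
        return ((ip & 0xff000000u) == 0x0a000000u) || /* 10.0.0.0/8     */
               ((ip & 0xfff00000u) == 0xac100000u) || /* 172.16.0.0/12  */
               ((ip & 0xffff0000u) == 0xc0a80000u);   /* 192.168.0.0/16 */
    }

    int main(void)
    {
        printf("10.1.2.3 -> %d\n", is_rfc1918_private("10.1.2.3"));  /* 1: private */
        printf("20.1.2.3 -> %d\n", is_rfc1918_private("20.1.2.3"));  /* 0: public  */
        return 0;
    }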



the industry standard OpenMPI benchmark tests I could run to perform a
real test?


Just about anything will work -- NetPIPE, the Intel Benchmarks, ...etc.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] MPI_Type_create_subarray fails!

2007-02-06 Thread George Bosilca
A correction has been made to MPI_Type_create_subarray. The
particular test that was failing for you has been replaced with a
better one. You can grab it either from the nightly build or, in a few
days, from the next (1.2) release.


  Thanks,
george.



"Half of what I say is meaningless; but I say it so that the other  
half may reach you"

  Kahlil Gibran




Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect

2007-02-06 Thread Alex Tumanov

Thank you for your reply, Reese!


What version of GM are you running?

# rpm -qa |egrep "^gm-[0-9]+|^gm-devel"
gm-2.0.24-1
gm-devel-2.0.24-1
Is this too old?


And are you sure that gm_board_info
shows all the nodes that are listed in your machine file?

Yes, that was the issue - bad cable connection to my compute node
prevented it from being seen on the fabric :( Thanks for pointing this
out for me.


Could you send
a copy of your gm_board_info output, please?

Sure:
# ./gm_board_info
GM build ID is "2.0.24_Linux_rc20051223164441PST
@dr11.myco.com:/usr/src/redhat/BUILD/gm-2.0.24_Linux Tue Jan 30
23:07:45 EST 2007."


Board number 0:
 lanai_cpu_version = 0x0a00 (LANai10.0)
 lanai_sram_size   = 0x001fe000 (2040K bytes)
ROM settings:
 MAC=00:60:dd:49:1e:bf
 SN=187449
 PC=M3F-PCIXD-2
 PN=09-02666
LANai time is 0x209b211b12 ticks, or about 1043 minutes since reset.
Mapper is 00:60:dd:49:99:96.
Map version is 1965903.
2 hosts.
Network is fully configured.
This node is "dr11.myco.com"
Board has room for 16 ports,  1559 nodes/routes,  16384 cache entries
 Port token cnt: send=61, recv=253
Port: Status  PID
  0:   BUSY  7489  (this process [gm_board_info])
  1:   BUSY 25113
Route table for this node follows:
gmID MAC Address gmName Route
 -  -
  1 00:60:dd:49:1e:bfdr11.myco.com (this node)
  2 00:60:dd:49:99:96dr05.myco.com 81 (mapper)


 A mismatch between the list
of nodes actually configured onto the Myrinet fabric and the machine file is
a common source of errors like this.  The mismatch could be caused by cable
failure or other mapping issues.

Could you elaborate on the mapping issues you mentioned? What are they?


Why GM instead of MX, by the way?

We have a few MX cards in-house, but no MX switch due to its current
market price. So we're only able to perform MX testing using
direct-connection cables, which is not very exciting :) On the
contrary, we've already had GM boards and a switch and found it
sufficient for OpenMPI testing purposes. Would be great to upgrade to
MX in the near future.

Thank you very much for your help.

Sincerely,
Alex.


[OMPI users] Problems with MPI_Init

2007-02-06 Thread Pablo Hernán Rodríguez Zivic

Hello everyone,

I'm using MPI (ParMetis) on a 64-bit machine. When I tried to test it
using the example programs, it hangs with an error message which says that I
must change the device to ch_p4mpd. So, once I change it in the file
mpirun.ch4, the application starts and hangs (never returns) in the
MPI_Init call. Using gdb to debug it, I found the function that
hangs: BNR_Fence. The call stack is the following:


0x004af87e in BNR_Fence ()
(gdb) up
#1  0x004a56e2 in bm_start ()
(gdb) up
#2  0x004a4053 in p4_initenv ()
(gdb) up
#3  0x004b3bc4 in MPID_P4_Init ()
(gdb) up
#4  0x004b3806 in MPID_CH_InitMsgPass ()
(gdb) up
#5  0x004b1125 in MPID_Init ()
(gdb) up
#6  0x0048606d in MPIR_Init ()
(gdb) up
#7  0x00485e6d in PMPI_Init ()
(gdb) up
#8  0x00406066 in main ()



Does anyone have a clue of what's going on?

Anyway, thank you all.

Pablo








Re: [OMPI users] Problems with MPI_Init

2007-02-06 Thread Jeff Squyres

Greetings Pablo.

Please note that this list is for support of the Open MPI software  
package.


From the output you included, it looks like you are not using Open  
MPI, but are rather using one of the MPICH variants (i.e., a  
different software package).  You might want to send your question to  
the MPICH mailing list -- we won't really be able to help you here.


Good luck!





--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems




Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect

2007-02-06 Thread Reese Faucette

What version of GM are you running?

# rpm -qa |egrep "^gm-[0-9]+|^gm-devel"
gm-2.0.24-1
gm-devel-2.0.24-1
Is this too old?


Nope, that's just fine.


A mismatch between the list
of nodes actually configured onto the Myrinet fabric and the machine file
is a common source of errors like this.  The mismatch could be caused by
cable failure or other mapping issues.

Could you elaborate on the mapping issues you mentioned? What are they?


If you have 3 nodes, A,B,C and the mapper on node C dies for some reason 
(very unusual, but maybe killed by mistake, say), then node B gets rebooted, 
then when node B comes back up, it will have routes to only node A and 
itself, though A and C will still have routes everywhere.  The map versions 
on A and B will match, but C will have an old map version.  Thus, an MPI job 
spanning A,B,C would fail, even though all 3 nodes show up in gm_board_info 
from node A.



Why GM instead of MX, by the way?

We have a few MX cards in-house, but no MX switch due to its current
market price. So we're only able to perform MX testing using
direct-connection cables, which is not very exciting :) On the
contrary, we've already had GM boards and a switch and found it
sufficient for OpenMPI testing purposes. Would be great to upgrade to
MX in the near future.


MX is just a different software stack, the hardware is the same.  MX works 
with both 2G and 10G, but GM does not work with the 10G cards.  I see from 
your gm_board_info output that you are using D-cards, which MX supports 
(anything D or later is supported by MX, but not B or C cards).  Switches 
don't care about MX vs. GM.  MX will give better performance for most MPI 
applications than GM, and hardware too old for MX is fairly uncommon.


-reese




Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails

2007-02-06 Thread Alex Tumanov

Thanks for your reply, Jeff.


> It never occurred to me that the headnode would try to communicate
> with the slave using infiniband interfaces... Orthogonally, what are

The problem here is that since your IB IP addresses are
"public" (meaning that they're not in the IETF defined ranges for
private IP addresses), Open MPI assumes that they can be used to
communicate with your back-end nodes on the IPoIB network.  See this
FAQ entry for details:
http://www.open-mpi.org/faq/?category=tcp#tcp-routability


The pointer was rather informative. We do have to use non-standard
ranges for IB interfaces, because we're performing automatic IP over
IB configuration based on the eth0 IP and netmask. Given 10.x.y.z/8
configuration for eth0, the IPs assigned to infiniband interfaces will
not only end up on the same subnet ID, but may even conflict with
existing ethernet interface IP addresses. Hence the use of 20.x.y.z
and 30.x.y.z for ib0 & ib1 respectively.


> the industry standard OpenMPI benchmark tests I could run to perform a
> real test?

Just about anything will work -- NetPIPE, the Intel Benchmarks, ...etc.

I actually tried benchmarking with HPLinpack. Specifically, I'm
interested in measuring performance improvements when running OpenMPI
jobs over several available interconnects. However, I have difficulty
interpreting the cryptic HPL output. I've seen members of the list
using xhpl benchmark. Perhaps someone could shed some light on how to
read its output? Also, my understanding is that the only advantage of
multiple interconnect availability is the increased bandwidth for
OpenMPI message striping - correct? In that case, I would probably
benefit from a more bandwidth intensive benchmark. If the OpenMPI
community could point me in the right direction for that, it would be
greatly appreciated. I have a feeling that this is not one of HPL's
strongest points.

Thanks again for your willingness to help and share your expertise.

Sincerely,
Alex.


Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails

2007-02-06 Thread Jeff Squyres

On Feb 6, 2007, at 12:38 PM, Alex Tumanov wrote:


http://www.open-mpi.org/faq/?category=tcp#tcp-routability


The pointer was rather informative. We do have to use non-standard
ranges for IB interfaces, because we're performing automatic IP over
IB configuration based on the eth0 IP and netmask. Given 10.x.y.z/8
configuration for eth0, the IPs assigned to infiniband interfaces will
not only end up on the same subnet ID, but may even conflict with
existing ethernet interface IP addresses. Hence the use of 20.x.y.z
and 30.x.y.z for ib0 & ib1 respectively.


I'm not sure I'm parsing your explanation properly.  Are you saying  
that your cluster's ethernet addresses are dispersed across all of  
10.x.y.z, and therefore you don't want the IPoIB addresses to  
conflict?  Even being conservative, that's 250^3 IP addresses (over  
15 million).  There should be plenty of space in there for your  
cluster's ethernet and IPoIB addresses to share (and any other  
machines that also share your 10.x.y.z address space).


But it doesn't really matter -- this is a minor point.  :-)


I actually tried benchmarking with HPLinpack. Specifically, I'm
interested in measuring performance improvements when running OpenMPI
jobs over several available interconnects. However, I have difficulty
interpreting the cryptic HPL output. I've seen members of the list
using xhpl benchmark. Perhaps someone could shed some light on how to
read its output? Also, my understanding is that the only advantage of


I'll defer to others on this one...


multiple interconnect availability is the increased bandwidth for
OpenMPI message striping - correct? In that case, I would probably


That's a big reason, yes.


benefit from a more bandwidth intensive benchmark. If the OpenMPI
community could point me in the right direction for that, it would be
greatly appreciated. I have a feeling that this is not one of HPL's
strongest points.


Actually, it depends on how big your HPL problem size is.  HPL can
send very large messages if you set the size high enough.  For  
example, when we were running HPL at Sandia for its Top500 run, we  
were seeing 800MB messages (for 4000+ nodes, lotsa memory -- very  
large HPL problem size).


A simple ping-pong benchmark can also be useful to ballpark what  
you're seeing for your network performance.  My personal favorite is  
NetPIPE, but there's others as well.
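
A minimal sketch of such a ping-pong, assuming exactly two ranks and using an
arbitrary 1 MiB message repeated 100 times (a rough illustration only, not a
replacement for NetPIPE or the other benchmarks mentioned above):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int reps  = 100;
        const int bytes = 1 << 20;            /* 1 MiB per message */
        char *buf = malloc(bytes);
        int rank, i;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);          /* assumes exactly 2 ranks */
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("avg round trip %g s, ~%.1f MB/s per direction\n",
                   (t1 - t0) / reps, (2.0 * bytes * reps) / (t1 - t0) / 1e6);
        free(buf);
        MPI_Finalize();
        return 0;
    }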


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] MPI_Type_create_subarray fails!

2007-02-06 Thread Jeff Squyres
FWIW, this has been committed on the 1.1 branch.  So if we ever do a  
1.1.5 release, it will be included.






--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Heywood, Todd
Hi Ralph,

It looks that way. I created a user local to each node, with local 
authentication via /etc/passwd and /etc/shadow, and OpenMPI scales up just fine 
for that.

I know this is an OpenMPI list, but does anyone know how common or uncommon 
LDAP-based clusters are? I would have thought this issue would have arisen 
elsewhere, but Googling MPI+LDAP (and similar) doesn't turn up much.

I'd certainly be willing to test any patch. Thanks.

Todd


Re: [OMPI users] large jobs hang on startup (deadlock?)

2007-02-06 Thread Ralph Castain
Hi Todd

Just as a thought - you could try not using --debug-daemons or -d and
instead setting "-mca pls_rsh_num_concurrent 50" or some such small number.
This will tell the system to launch 50 ssh calls at a time, waiting for each
group to complete before launching the next. You can't use it with
--debug-daemons as that option prevents the ssh calls from "closing" so that
you can get the output from the daemons. You can still launch as big a job
as you like - we'll just do it 50 ssh calls at a time.

If we are truly overwhelming the slapd, then this should alleviate the
problem.

Let me know if you get to try it...
Ralph

