Re: [OMPI users] large jobs hang on startup (deadlock?)
Well, I can't say for sure about LDAP. I did a quick search and found two things: 1. there are limits imposed in LDAP that may apply to your situation, and 2. that statement varies tremendously depending upon the specific LDAP implementation you are using. I would suggest you see which LDAP you are using and contact the respective organization to ask if they do have such a limit, and if so, how to adjust it.

It sounds like maybe we are hitting the LDAP server with too many requests too rapidly. Usually, the issue is not starting fast enough, so this is a new one! We don't currently check to see if everything started up okay, so that is why the processes might hang - we hope to fix that soon. I'll have to see if there is something we can do to help alleviate such problems - it might not be in time for the 1.2 release, but perhaps it will make a subsequent "fix", or, if you are willing/interested, I could provide it to you as a "patch" you could use until a later official release.

Meantime, you might try upgrading to 1.2b3 or even a nightly release from the trunk. There are known problems with 1.2b2 (which is why there is a b3 and soon to be an rc1), though I don't think that will be the problem here. At the least, the nightly trunk has a much better response to ctrl-c in it.

Ralph

On 2/5/07 9:50 AM, "Heywood, Todd" wrote:

> Hi Ralph,
>
> Thanks for the reply. The OpenMPI version is 1.2b2 (because I would like to
> integrate it with SGE).
>
> Here is what is happening:
>
> (1) When I run with --debug-daemons (but WITHOUT -d), I get "Daemon
> [0,0,27] checking in as pid 7620 on host blade28" (for example) messages for
> most but not all of the daemons that should be started up, and then it hangs.
> I also notice "reconnecting to LDAP server" messages in various
> /var/log/secure files, and cannot login while things are hung (with "su:
> pam_ldap: ldap_result Can't contact LDAP server" in /var/log/messages). So
> apparently LDAP hits some limit to opening ssh sessions, and I'm not sure how
> to address this.
> (2) When I run with --debug-daemons AND the debug option -d, all daemons
> start up and check in, albeit slowly (debug must slow things down so
> LDAP can handle all the requests??). Then apparently, the cpi process is
> started for each task but it then hangs:
>
> [blade1:23816] spawn: in job_state_callback(jobid = 1, state = 0x4)
> [blade1:23816] Info: Setting up debugger process table for applications
>   MPIR_being_debugged = 0
>   MPIR_debug_gate = 0
>   MPIR_debug_state = 1
>   MPIR_acquired_pre_main = 0
>   MPIR_i_am_starter = 0
>   MPIR_proctable_size = 800
>   MPIR_proctable:
>     (i, host, exe, pid) = (0, blade1, /home4/itstaff/heywood/ompi/cpi, 24193)
>     ...
>     (i, host, exe, pid) = (799, blade213, /home4/itstaff/heywood/ompi/cpi, 4762)
>
> A "ps" on the head node shows 200 open ssh sessions, and 4 cpi processes doing
> nothing. A ^C gives this:
>
> mpirun: killing job...
>
> --
> WARNING: A process refused to die!
>
> Host: blade1
> PID: 24193
>
> This process may still be running and/or consuming resources.
>
> Still got a ways to go, but any ideas/suggestions are welcome!
>
> Thanks,
>
> Todd
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf
> Of Ralph Castain
> Sent: Friday, February 02, 2007 5:20 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
>
> Hi Todd
>
> To help us provide advice, could you tell us what version of OpenMPI you are
> using?
>
> Meantime, try adding "-mca pls_rsh_num_concurrent 200" to your mpirun command
> line. You can up the number of concurrent daemons we launch to anything your
> system will support - basically, we limit the number only because some systems
> have limits on the number of ssh calls we can have active at any one time.
> Because we hold stdio open when running with --debug-daemons, the number of
> concurrent daemons must match or exceed the number of nodes you are trying to
> launch on.
>
> I have a "fix" in the works that will help relieve some of that restriction,
> but that won't come out until a later release.
>
> Hopefully, that will allow you to obtain more debug info about why/where
> things are hanging.
>
> Ralph
>
> On 2/2/07 11:41 AM, "Heywood, Todd" wrote:
>
> I have OpenMPI running fine for a small/medium number of tasks (simple hello
> or cpi program). But when I try 700 or 800 tasks, it hangs, apparently on
> startup. I think this might be related to LDAP, since if I try to log into my
> account while the job is hung, I get told my username doesn't exist. However,
> I tried adding debug to the mpirun, and got the same sequence of output as
> for successful smaller runs, until it hung again. So I added -debug-daemons
> and got this (with an exit, i.e. no hanging):
>
> [blade1:31733] [0,0,0] wrote setup file
> -
Re: [OMPI users] large jobs hang on startup (deadlock?)
Hi Ralph,

Thanks for the reply. This is a tough one. It is OpenLDAP. I had thought that I might be hitting a file descriptor limit for slapd (the LDAP daemon), which ulimit -n does not affect (you have to rebuild LDAP with a different FD_SETSIZE variable). However, I simply turned on more expressive logging to /var/log/slapd, and that resulted in smaller jobs (which successfully ran before) hanging. Go figure.

It appears that the daemons are up and running (from ps), and everything hangs in MPI_Init. Ctrl-C gives:

[blade1:04524] ERROR: A daemon on node blade26 failed to start as expected.
[blade1:04524] ERROR: There may be more information available from
[blade1:04524] ERROR: the remote shell (see above).
[blade1:04524] ERROR: The daemon exited unexpectedly with status 255.

I'm interested in any suggestions, semi-fixes, etc. which might help get to the bottom of this - right now, in determining whether the daemons are indeed up and running, or whether some of them are not (causing MPI_Init to hang).

Thanks,
Todd

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
Sent: Tuesday, February 06, 2007 8:52 AM
To: Open MPI Users
Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
Re: [OMPI users] MPI_Type_create_subarray fails!
Surely there is a better way to get this code running without disabling checks. Any suggestions?

Thanks,
Avishay

On Mon, 2007-02-05 at 15:36 -0500, Ivan de Jesus Deras Tabora wrote:
> I managed to make it run by disabling the parameter checking.
> I added --mca mpi_param_check 0 to mpirun and it works ok, so maybe
> the problem is with the parameter checking code.
>
> On 2/2/07, Ivan de Jesus Deras Tabora wrote:
> > I've been checking the OpenMPI code, trying to find something, but
> > still no luck. I'll continue checking the code.
> >
> > On 2/2/07, Robert Latham wrote:
> > > On Tue, Jan 30, 2007 at 04:55:09PM -0500, Ivan de Jesus Deras Tabora wrote:
> > > > Then I find all the references to the MPI_Type_create_subarray and
> > > > create a little program just to test that part of the code, the code I
> > > > created is:
> > > ...
> > > > After running this little program using mpirun, it raises the same
> > > > error.
> > >
> > > This small program runs fine under MPICH2. Either you have found a
> > > bug in OpenMPI (passing it a datatype it should be able to handle), or
> > > a bug in MPICH2 (passing it a datatype it handled, but should have
> > > complained about).
> > >
> > > ==rob
> > >
> > > --
> > > Rob Latham
> > > Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
> > > Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
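The test program referred to above is elided in the archive; for readers who want to exercise the same call, a minimal standalone sketch along these lines works (this is not Ivan's original code - the array sizes and offsets are arbitrary):

/* Minimal MPI_Type_create_subarray exercise (illustrative only, not the
 * program from this thread). Build with mpicc, run with mpirun -np 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* A 4x6 sub-block starting at offset (2,3) of a 10x12 array of ints. */
    int sizes[2]    = {10, 12};
    int subsizes[2] = { 4,  6};
    int starts[2]   = { 2,  3};
    MPI_Datatype subarray;
    int rc;

    MPI_Init(&argc, &argv);

    rc = MPI_Type_create_subarray(2, sizes, subsizes, starts,
                                  MPI_ORDER_C, MPI_INT, &subarray);
    if (rc == MPI_SUCCESS) {
        MPI_Type_commit(&subarray);
        printf("MPI_Type_create_subarray succeeded\n");
        MPI_Type_free(&subarray);
    } else {
        /* With the default MPI_ERRORS_ARE_FATAL handler an error aborts
         * before reaching this branch; shown for completeness. */
        printf("MPI_Type_create_subarray returned %d\n", rc);
    }

    MPI_Finalize();
    return 0;
}

Running such a test with and without "--mca mpi_param_check 0", as Ivan did, separates a failure in the parameter-checking layer from one in the datatype engine itself.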
Re: [OMPI users] large jobs hang on startup (deadlock?)
It sounds to me like we are probably overwhelming your slapd - your test would seem to indicate that slowing down the slapd makes us fail even with smaller jobs, which tends to support that idea. We frankly haven't encountered that before, since our rsh tests have all been done using non-LDAP authentication (basically, we ask that you set up rsh to auto-authenticate on each node).

It sounds like we need to add an ability to slow down so that the daemon doesn't "fail" due to authentication timeout and/or slapd rejection due to the queue being full. This may take a little time to fix due to other priorities, and will almost certainly have to be released in a subsequent 1.2.x version.

Meantime, I'll let you know when I get something to test - would you be willing to give it a shot if I provide a patch? I don't have access to an LDAP-based system.

Ralph

On 2/6/07 7:44 AM, "Heywood, Todd" wrote:
Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails
On Feb 2, 2007, at 11:22 AM, Alex Tumanov wrote:

> That really did fix it, George:
>
> # mpirun --prefix $MPIHOME -hostfile ~/testdir/hosts --mca btl tcp,self --mca btl_tcp_if_exclude ib0,ib1 ~/testdir/hello
> Hello from Alex' MPI test program Process 0 on dr11.lsf.platform.com out of 2
> Hello from Alex' MPI test program Process 1 on compute-0-0.local out of 2
>
> It never occurred to me that the headnode would try to communicate
> with the slave using infiniband interfaces... Orthogonally, what are

The problem here is that since your IB IP addresses are "public" (meaning that they're not in the IETF defined ranges for private IP addresses), Open MPI assumes that they can be used to communicate with your back-end nodes on the IPoIB network. See this FAQ entry for details:

http://www.open-mpi.org/faq/?category=tcp#tcp-routability

If you update your IP addresses to be in the private range, Open MPI should do the Right routability computations and you shouldn't need to exclude anything.

> the industry standard OpenMPI benchmark tests I could run to perform a
> real test?

Just about anything will work -- NetPIPE, the Intel Benchmarks, ...etc.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
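Open MPI's actual routability check is not shown in this thread; purely as an illustration of the "IETF defined ranges for private IP addresses" (RFC 1918) that the FAQ entry refers to, a check like the following would classify Alex's 20.x.y.z and 30.x.y.z IPoIB addresses as public:

/* Illustrative sketch only - not Open MPI's code. Lists the RFC 1918
 * private IPv4 ranges; anything outside them is treated as routable. */
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>

static int is_rfc1918_private(const char *dotted)
{
    struct in_addr a;
    unsigned long ip;

    if (inet_pton(AF_INET, dotted, &a) != 1)
        return 0;                                     /* not a valid IPv4 address */
    ip = ntohl(a.s_addr);
    return ((ip & 0xff000000UL) == 0x0a000000UL) ||   /* 10.0.0.0/8     */
           ((ip & 0xfff00000UL) == 0xac100000UL) ||   /* 172.16.0.0/12  */
           ((ip & 0xffff0000UL) == 0xc0a80000UL);     /* 192.168.0.0/16 */
}

int main(void)
{
    const char *samples[] = { "10.1.2.3", "20.1.2.3", "192.168.0.7" };
    int i;

    for (i = 0; i < 3; i++)
        printf("%-12s -> %s\n", samples[i],
               is_rfc1918_private(samples[i]) ? "private" : "public");
    return 0;
}

Addresses outside those three blocks are assumed to be reachable from the other nodes, which is why the head node tried to reach the compute node over its IPoIB interfaces in the first place.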
Re: [OMPI users] MPI_Type_create_subarray fails!
A correction has been made to the MPI_Type_create_subarray. The particular test that was failing for you has been replaced with a better one. You can grab it either from the nightly build or in a few days from the next (1.2) release.

Thanks,
george.

On Feb 6, 2007, at 9:52 AM, Avishay Traeger wrote:

> Surely there is a better way to get this code running without disabling
> checks. Any suggestions?

"Half of what I say is meaningless; but I say it so that the other half may reach you"
Kahlil Gibran
Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect
Thank you for your reply, Reese!

> What version of GM are you running?

# rpm -qa |egrep "^gm-[0-9]+|^gm-devel"
gm-2.0.24-1
gm-devel-2.0.24-1

Is this too old?

> And are you sure that gm_board_info shows all the nodes that are listed
> in your machine file?

Yes, that was the issue - bad cable connection to my compute node prevented it from being seen on the fabric :( Thanks for pointing this out for me.

> Could you send a copy of your gm_board_info output, please?

Sure:

# ./gm_board_info
GM build ID is "2.0.24_Linux_rc20051223164441PST @dr11.myco.com:/usr/src/redhat/BUILD/gm-2.0.24_Linux Tue Jan 30 23:07:45 EST 2007."
Board number 0:
  lanai_cpu_version = 0x0a00 (LANai10.0)
  lanai_sram_size = 0x001fe000 (2040K bytes)
ROM settings: MAC=00:60:dd:49:1e:bf SN=187449 PC=M3F-PCIXD-2 PN=09-02666
LANai time is 0x209b211b12 ticks, or about 1043 minutes since reset.
Mapper is 00:60:dd:49:99:96. Map version is 1965903. 2 hosts.
Network is fully configured.
This node is "dr11.myco.com"
Board has room for 16 ports, 1559 nodes/routes, 16384 cache entries
Port token cnt: send=61, recv=253
Port: Status PID
   0: BUSY 7489 (this process [gm_board_info])
   1: BUSY 25113
Route table for this node follows:
gmID MAC Address        gmName           Route
---- ------------------ ---------------- -----------
   1 00:60:dd:49:1e:bf  dr11.myco.com    (this node)
   2 00:60:dd:49:99:96  dr05.myco.com    81 (mapper)

> A mismatch between the list of nodes actually configured onto the Myrinet
> fabric and the machine file is a common source of errors like this. The
> mismatch could be caused by cable failure or other mapping issues.

Could you elaborate on the mapping issues you mentioned? What are they?

> Why GM instead of MX, by the way?

We have a few MX cards in-house, but no MX switch due to its current market price. So we're only able to perform MX testing using direct-connection cables, which is not very exciting :) On the contrary, we've already had GM boards and a switch and found it sufficient for OpenMPI testing purposes. Would be great to upgrade to MX in the near future.

Thank you very much for your help.

Sincerely,
Alex.
[OMPI users] Problems with MPI_Init
Hello everyone,

I'm using MPI (ParMetis) on a 64-bit machine. When I tried to test it using the example programs, it hangs with an error message which says that I must change the device to ch_p4mpd. So, once I change it in the file mpirun.ch4, the application starts and hangs (never returns) in the MPI_Init instruction. Using gdb to debug it, I found the function that hangs: BNR_Fence. The call stack is the following:

0x004af87e in BNR_Fence ()
(gdb) up
#1 0x004a56e2 in bm_start ()
(gdb) up
#2 0x004a4053 in p4_initenv ()
(gdb) up
#3 0x004b3bc4 in MPID_P4_Init ()
(gdb) up
#4 0x004b3806 in MPID_CH_InitMsgPass ()
(gdb) up
#5 0x004b1125 in MPID_Init ()
(gdb) up
#6 0x0048606d in MPIR_Init ()
(gdb) up
#7 0x00485e6d in PMPI_Init ()
(gdb) up
#8 0x00406066 in main ()

Does anyone have a clue of what's going on?

Anyway, thank you all.

Pablo
Re: [OMPI users] Problems with MPI_Init
Greetings Pablo.

Please note that this list is for support of the Open MPI software package. From the output you included, it looks like you are not using Open MPI, but are rather using one of the MPICH variants (i.e., a different software package). You might want to send your question to the MPICH mailing list -- we won't really be able to help you here.

Good luck!

On Feb 6, 2007, at 11:55 AM, Pablo Hernán Rodríguez Zivic wrote:

> Hello everyone, I'm using MPI (ParMetis) on a 64-bit machine. When I tried
> to test it using the example programs it hangs with an error message which
> says that I must change the device to ch_p4mpd. So, once I change it in the
> file mpirun.ch4 the application starts and hangs (never returns) in the
> MPI_Init instruction.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] running OpenMPI jobs over Myrinet gm interconnect
> > What version of GM are you running?
>
> # rpm -qa |egrep "^gm-[0-9]+|^gm-devel"
> gm-2.0.24-1
> gm-devel-2.0.24-1
>
> Is this too old?

Nope, that's just fine.

> > A mismatch between the list of nodes actually configured onto the Myrinet
> > fabric and the machine file is a common source of errors like this. The
> > mismatch could be caused by cable failure or other mapping issues.
>
> Could you elaborate on the mapping issues you mentioned? What are they?

If you have 3 nodes, A, B, C and the mapper on node C dies for some reason (very unusual, but maybe killed by mistake, say), then node B gets rebooted, then when node B comes back up, it will have routes to only node A and itself, though A and C will still have routes everywhere. The map versions on A and B will match, but C will have an old map version. Thus, an MPI job spanning A, B, C would fail, even though all 3 nodes show up in gm_board_info from node A.

> > Why GM instead of MX, by the way?
>
> We have a few MX cards in-house, but no MX switch due to its current market
> price. So we're only able to perform MX testing using direct-connection
> cables, which is not very exciting :) On the contrary, we've already had GM
> boards and a switch and found it sufficient for OpenMPI testing purposes.
> Would be great to upgrade to MX in the near future.

MX is just a different software stack, the hardware is the same. MX works with both 2G and 10G, but GM does not work with the 10G cards. I see from your gm_board_info output that you are using D-cards, which MX supports (anything D or later is supported by MX, but not B or C cards). Switches don't care about MX vs. GM. MX will give better performance for most MPI applications than GM, and hardware too old for MX is fairly uncommon.

-reese
Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails
Thanks for your reply, Jeff.

> > It never occurred to me that the headnode would try to communicate
> > with the slave using infiniband interfaces... Orthogonally, what are
>
> The problem here is that since your IB IP addresses are "public" (meaning
> that they're not in the IETF defined ranges for private IP addresses),
> Open MPI assumes that they can be used to communicate with your back-end
> nodes on the IPoIB network. See this FAQ entry for details:
>
> http://www.open-mpi.org/faq/?category=tcp#tcp-routability

The pointer was rather informative. We do have to use non-standard ranges for IB interfaces, because we're performing automatic IP over IB configuration based on the eth0 IP and netmask. Given a 10.x.y.z/8 configuration for eth0, the IPs assigned to infiniband interfaces will not only end up on the same subnet ID, but may even conflict with existing ethernet interface IP addresses. Hence the use of 20.x.y.z and 30.x.y.z for ib0 & ib1 respectively.

> > the industry standard OpenMPI benchmark tests I could run to perform a
> > real test?
>
> Just about anything will work -- NetPIPE, the Intel Benchmarks, ...etc.

I actually tried benchmarking with HPLinpack. Specifically, I'm interested in measuring performance improvements when running OpenMPI jobs over several available interconnects. However, I have difficulty interpreting the cryptic HPL output. I've seen members of the list using the xhpl benchmark. Perhaps someone could shed some light on how to read its output?

Also, my understanding is that the only advantage of multiple interconnect availability is the increased bandwidth for OpenMPI message striping - correct? In that case, I would probably benefit from a more bandwidth-intensive benchmark. If the OpenMPI community could point me in the right direction for that, it would be greatly appreciated. I have a feeling that this is not one of HPL's strongest points.

Thanks again for your willingness to help and share your expertise.

Sincerely,
Alex.
Re: [OMPI users] [OMPI Users] OpenMPI 1.1.4 over ethernet fails
On Feb 6, 2007, at 12:38 PM, Alex Tumanov wrote:

> > http://www.open-mpi.org/faq/?category=tcp#tcp-routability
>
> The pointer was rather informative. We do have to use non-standard ranges
> for IB interfaces, because we're performing automatic IP over IB
> configuration based on the eth0 IP and netmask. Given a 10.x.y.z/8
> configuration for eth0, the IPs assigned to infiniband interfaces will not
> only end up on the same subnet ID, but may even conflict with existing
> ethernet interface IP addresses. Hence the use of 20.x.y.z and 30.x.y.z
> for ib0 & ib1 respectively.

I'm not sure I'm parsing your explanation properly. Are you saying that your cluster's ethernet addresses are dispersed across all of 10.x.y.z, and therefore you don't want the IPoIB addresses to conflict? Even being conservative, that's 250^3 IP addresses (over 15 million). There should be plenty of space in there for your cluster's ethernet and IPoIB addresses to share (and any other machines that also share your 10.x.y.z address space). But it doesn't really matter -- this is a minor point. :-)

> I actually tried benchmarking with HPLinpack. Specifically, I'm interested
> in measuring performance improvements when running OpenMPI jobs over
> several available interconnects. However, I have difficulty interpreting
> the cryptic HPL output. I've seen members of the list using the xhpl
> benchmark. Perhaps someone could shed some light on how to read its output?

I'll defer to others on this one...

> Also, my understanding is that the only advantage of multiple interconnect
> availability is the increased bandwidth for OpenMPI message striping -
> correct?

That's a big reason, yes.

> In that case, I would probably benefit from a more bandwidth-intensive
> benchmark. If the OpenMPI community could point me in the right direction
> for that, it would be greatly appreciated. I have a feeling that this is
> not one of HPL's strongest points.

Actually, it depends on how big your HPL problem size is. HPL can send very large messages if you set the size high enough. For example, when we were running HPL at Sandia for its Top500 run, we were seeing 800MB messages (for 4000+ nodes, lotsa memory -- very large HPL problem size). A simple ping-pong benchmark can also be useful to ballpark what you're seeing for your network performance. My personal favorite is NetPIPE, but there are others as well.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
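NetPIPE and the Intel benchmarks are the full-featured options mentioned above; for a quick, self-contained ballpark figure, a bare-bones ping-pong along these lines (the 1 MB message size and 100 round trips are arbitrary choices, not taken from this thread) is enough to compare interconnects, e.g. by running it once with "--mca btl tcp,self" and once with the default BTL selection:

/* Bare-bones two-rank ping-pong bandwidth sketch (illustrative only).
 * Build with mpicc; run with something like: mpirun -np 2 -hostfile hosts ./pingpong
 * so that the two ranks land on different nodes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps  = 100;
    const int bytes = 1 << 20;            /* 1 MB messages */
    char *buf;
    double t0, elapsed;
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    elapsed = MPI_Wtime() - t0;

    if (rank == 0)
        printf("%d x %d-byte round trips in %.3f s => ~%.1f MB/s one-way\n",
               reps, bytes, elapsed, (2.0 * reps * bytes) / elapsed / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

Unlike HPL, whose message sizes depend on the problem size, this isolates raw point-to-point bandwidth, which is where any benefit from multi-interconnect message striping would show up.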
Re: [OMPI users] MPI_Type_create_subarray fails!
FWIW, this has been committed on the 1.1 branch. So if we ever do a 1.1.5 release, it will be included.

On Feb 6, 2007, at 10:46 AM, George Bosilca wrote:

> A correction has been made to the MPI_Type_create_subarray. The particular
> test that was failing for you has been replaced with a better one. You can
> grab it either from the nightly build or in a few days from the next (1.2)
> release.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI users] large jobs hang on startup (deadlock?)
Hi Ralph,

It looks that way. I created a user local to each node, with local authentication via /etc/passwd and /etc/shadow, and OpenMPI scales up just fine for that.

I know this is an OpenMPI list, but does anyone know how common or uncommon LDAP-based clusters are? I would have thought this issue would have arisen elsewhere, but Googling MPI+LDAP (and similar) doesn't turn up much.

I'd certainly be willing to test any patch. Thanks.

Todd

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
Sent: Tuesday, February 06, 2007 9:54 AM
To: Open MPI Users
Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
Re: [OMPI users] large jobs hang on startup (deadlock?)
Hi Todd

Just as a thought - you could try not using --debug-daemons or -d, and instead setting "-mca pls_rsh_num_concurrent 50" or some such small number. This will tell the system to launch 50 ssh calls at a time, waiting for each group to complete before launching the next. You can't use it with --debug-daemons, as that option prevents the ssh calls from "closing" so that you can get the output from the daemons.

You can still launch as big a job as you like - we'll just do it 50 ssh calls at a time. If we are truly overwhelming the slapd, then this should alleviate the problem.

Let me know if you get to try it...

Ralph

On 2/6/07 4:05 PM, "Heywood, Todd" wrote: