Re: [OMPI users] Is Iprobe fast when there is no message to receive
Keep in mind that MPI says you do have to eventually receive the message -- so just checking whether it's there is not enough (eventually). Iprobe is definitely one way. You could also post a non-blocking receive (persistent or not) and MPI_TEST to see if it has completed. However, if the message is long, MPI implementations like Open MPI *may* require multiple invocations of the progression engine to actually receive the entire message (e.g., it may get fragmented by the sender and use a rendezvous protocol, and therefore have multiple states in the progression logic, with each call to MPI_TEST only advancing one or two of those states).

That being said, if you just want to send a quick "notify" that an event has occurred, you might want to use a specific tag and/or communicator for these extraordinary messages. Then, when the event occurs, send a very short message on this special tag/communicator (potentially even a 0-byte message). Open MPI will send short messages eagerly and not require multiple passes through the progression machine (heck, just about all MPIs do this). You can MPI_TEST for the completion of this short/0-byte receive very quickly. You can then send the actual data of the event in a different non-blocking receive that is only checked if the short "alert" message is received.

There are a small number of cases (e.g., resource exhaustion) where Open MPI will have to fall back out of the eager send mode for short messages, but in general, sending a short message as an alert and a larger message with the actual data to be processed might be a good choice.

On Oct 1, 2009, at 10:43 PM, Peter Lonjers wrote:

> I am not sure if this is the right place to ask this question, but here it goes.
>
> A simplified, abstract version of the question: I have 2 MPI processes and I want one to send an occasional signal to the other process. These signals will not happen at predictable times. I want the other process, sitting in some kind of work loop, to be able to make a very fast check to see if a signal has been sent to it. What is the best way to do this?
>
> The actual problem: I am working on a realistic neural net simulator. The neurons are split into groups, with one group per processor to simulate them. Occasionally a neuron will spike and have to send that message to neurons on a different processor. This is a relatively rare event. The receiving neurons need to be able to make a very fast check to see if there is a message from neurons on another processor.
>
> The way I am doing it now is to use simple send and receive commands. The receiving cell does an Iprobe check on every loop through the simulation, for every cell that connects to it, to see if there is a message (spike) from that cell. If the Iprobe says there is a message, it does a receive on that message.
>
> This seems convoluted, though. I do not actually need to receive the message, just know that a message is there, and it seems like, depending on how Iprobe works, there might be a faster method. Is Iprobe fast if there is no message to receive? Would persistent connections work better?
>
> Anyway, any help would be greatly appreciated.

--
Jeff Squyres
jsquy...@cisco.com
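As an illustration of the pattern described above, here is a minimal C sketch, assuming a two-rank job in which rank 0 signals rank 1; the tag values, the trivial work loop, and the single double of event data are arbitrary choices for this example, not anything from the original thread:

    /* 0-byte "alert" on a dedicated tag, followed by the actual event data.
     * Rank 1 pre-posts the alert receive and polls it cheaply with MPI_Test
     * inside its work loop; the data message is only received once the
     * alert has arrived.
     */
    #include <mpi.h>
    #include <stdio.h>

    #define ALERT_TAG 100   /* arbitrary tag reserved for 0-byte alerts */
    #define DATA_TAG  101   /* arbitrary tag for the event payload      */

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            char dummy = 0;
            double event = 42.0;
            /* ... work happens, then an "event" occurs ... */
            MPI_Send(&dummy, 0, MPI_BYTE, 1, ALERT_TAG, MPI_COMM_WORLD);  /* short, sent eagerly */
            MPI_Send(&event, 1, MPI_DOUBLE, 1, DATA_TAG, MPI_COMM_WORLD); /* the actual data     */
        } else if (rank == 1) {
            MPI_Request alert_req;
            char dummy;
            double event;
            int flag = 0;

            /* Pre-post the 0-byte alert receive before entering the work loop. */
            MPI_Irecv(&dummy, 0, MPI_BYTE, 0, ALERT_TAG, MPI_COMM_WORLD, &alert_req);

            while (!flag) {
                /* ... one iteration of the simulation would go here ... */

                /* Very cheap check: has the alert arrived yet? */
                MPI_Test(&alert_req, &flag, MPI_STATUS_IGNORE);
            }

            /* Only now receive the (possibly larger) event data. */
            MPI_Recv(&event, 1, MPI_DOUBLE, 0, DATA_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received event %g\n", event);
        }

        MPI_Finalize();
        return 0;
    }

In a real work loop the alert receive would be re-posted after each event so that the next one can be detected the same way.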
Re: [OMPI users] Are there ways to reduce the memory used by OpenMPI?
On Oct 1, 2009, at 2:56 PM, Blosch, Edwin L wrote:

> Are there any tuning parameters that I can use to reduce the amount of memory used by OpenMPI? I would very much like to use OpenMPI instead of MVAPICH, but I'm on a cluster where memory usage is the most important consideration. Here are three results which capture the problem.
>
> With the "leave_pinned" behavior turned on, I get good performance (19.528, lower is better):
>
> mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile /var/spool/torque/aux/7972.fwnaeglingio -np 28 --mca btl ^tcp --mca mpi_leave_pinned 1 --mca mpool_base_use_mem_hooks 1 -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 /tmp/7972.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/7972.fwnaeglingio/restart.0
>
> Compute rate (processor-microseconds/cell/cycle): 19.528
> Total memory usage: 38155.3477 MB (38.1553 GB)

FWIW, there have been a lot of improvements in Open MPI since the 1.2 series (including some memory reduction work) -- is it possible for you to upgrade to the latest 1.3 release?

> Turning off the leave_pinned behavior, I get considerably slower performance (28.788), but the memory usage is unchanged (still 38 GB):
>
> mpirun --prefix /usr/mpi/intel/openmpi-1.2.8 --machinefile /var/spool/torque/aux/7972.fwnaeglingio -np 28 -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 /tmp/7972.fwnaeglingio/falconv4_ibm_openmpi -cycles 100 -ri restart.0 -ro /tmp/7972.fwnaeglingio/restart.0
>
> Compute rate (processor-microseconds/cell/cycle): 28.788
> Total memory usage: 38335.7656 MB (38.3358 GB)

I would guess that you are continually re-using the same communication buffers -- doing so will definitely be better with mpi_leave_pinned=1. Note, too, that mpi_leave_pinned is on by default for OpenFabrics networks in the Open MPI 1.3 series.

> Using MVAPICH, the performance is in the middle (23.6), but the memory usage is reduced by 5 to 6 GB out of 38 GB, a significant decrease to me:
>
> /usr/mpi/intel/mvapich-1.1.0/bin/mpirun_rsh -ssh -np 28 -hostfile /var/spool/torque/aux/7972.fwnaeglingio LD_LIBRARY_PATH="/usr/mpi/intel/mvapich-1.1.0/lib/shared:/usr/mpi/intel/openmpi-1.2.8/lib64:/appserv/intel/fce/10.1.008/lib:/appserv/intel/cce/10.1.008/lib" MPI_ENVIRONMENT=1 /tmp/7972.fwnaeglingio/falconv4_ibm_mvapich -cycles 100 -ri restart.0 -ro /tmp/7972.fwnaeglingio/restart.0
>
> Compute rate (processor-microseconds/cell/cycle): 23.608
> Total memory usage: 32753.0586 MB (32.7531 GB)
>
> I didn't see anything in the FAQ that discusses memory usage other than the impact of the "leave_pinned" option, which apparently does not affect the memory usage in my case. But I figure there must be a justification why OpenMPI would use 6 GB more than MVAPICH on the same case.

Try the 1.3 series; we do have a bunch of knobs in there for memory usage -- there were significant changes/advancements in the 1.3 series with regard to how OpenFabrics buffers are registered. Get a baseline on that memory usage, and then let's see what you want to do from there.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] MPI_Comm_accept()/connect() errors
On Oct 1, 2009, at 7:00 AM, Blesson Varghese wrote:

> The following is the information regarding the error. I am running Open MPI 1.2.5 on Ubuntu 4.2.4, kernel version 2.6.24.

Is there any chance that you can upgrade to the Open MPI v1.3 series?

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] job fails with "Signal: Bus error (7)"
Bus error usually means that there was an invalid address passed as a pointer somewhere in the code -- it's not usually a communications error. Without more information, it's rather difficult to speculate on what happened here. Did you get corefiles? If so, are there useful backtraces available?

On Oct 1, 2009, at 6:01 AM, Sangamesh B wrote:

> Hi,
>
> A Fortran application, compiled with ifort-10.1 and Open MPI 1.3.1 on CentOS 5.2, fails after running for 4 days with the following error message:
>
> [compute-0-7:25430] *** Process received signal ***
> [compute-0-7:25433] *** Process received signal ***
> [compute-0-7:25433] Signal: Bus error (7)
> [compute-0-7:25433] Signal code: (2)
> [compute-0-7:25433] Failing at address: 0x4217b8
> [compute-0-7:25431] *** Process received signal ***
> [compute-0-7:25431] Signal: Bus error (7)
> [compute-0-7:25431] Signal code: (2)
> [compute-0-7:25431] Failing at address: 0x4217b8
> [compute-0-7:25432] *** Process received signal ***
> [compute-0-7:25432] Signal: Bus error (7)
> [compute-0-7:25432] Signal code: (2)
> [compute-0-7:25432] Failing at address: 0x4217b8
> [compute-0-7:25430] Signal: Bus error (7)
> [compute-0-7:25430] Signal code: (2)
> [compute-0-7:25430] Failing at address: 0x4217b8
> [compute-0-7:25431] *** Process received signal ***
> [compute-0-7:25431] Signal: Segmentation fault (11)
> [compute-0-7:25431] Signal code: (128)
> [compute-0-7:25431] Failing at address: (nil)
> [compute-0-7:25430] *** Process received signal ***
> [compute-0-7:25433] *** Process received signal ***
> [compute-0-7:25433] Signal: Segmentation fault (11)
> [compute-0-7:25433] Signal code: (128)
> [compute-0-7:25433] Failing at address: (nil)
> [compute-0-7:25432] *** Process received signal ***
> [compute-0-7:25432] Signal: Segmentation fault (11)
> [compute-0-7:25432] Signal code: (128)
> [compute-0-7:25432] Failing at address: (nil)
> [compute-0-7:25430] Signal: Segmentation fault (11)
> [compute-0-7:25430] Signal code: (128)
> [compute-0-7:25430] Failing at address: (nil)
> --
> mpirun noticed that process rank 3 with PID 25433 on node compute-0-7.local exited on signal 11 (Segmentation fault).
> --
>
> This job runs with 4 Open MPI processes on nodes that are interconnected with Gigabit Ethernet. The same job runs well on the nodes with InfiniBand connectivity. What could be the reason for this? Is this due to loose physical connectivity, since it's giving a bus error?

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] use additional interface for openmpi
On Sep 29, 2009, at 9:58 AM, wrote:

>> Open MPI should just "figure it out" and do the Right Thing at run-time -- is that not happening?
>
> You are right, it should. But I want to exclude all other traffic from the Open MPI communications -- NFS, traffic from other jobs, and so on -- and use only a dedicated Ethernet interface for this purpose.
>
> I have Open MPI 1.3.3 installed in the same directory on all nodes and on the head node. The OS is the same across the whole cluster: Debian 5.0. On the nodes I have two interfaces: eth0 for NFS and so on, and eth1 for Open MPI. On the head node I have 5 interfaces: eth0 for NFS and eth4 for Open MPI.
>
> The network is laid out as follows:
> 1) Head node eth0 + node eth0: 192.168.0.0/24
> 2) Head node eth4 + node eth1: 192.168.1.0/24
>
> So how can I configure Open MPI to use only network 2) for this purpose?

Try using "--mca btl_tcp_if_exclude eth0 --mca oob_tcp_if_exclude eth0". This will tell all machines not to use eth0. The only other network available is eth4 or eth1, so it should do the Right Thing.

Note that Open MPI has *two* TCP subsystems: the one used for MPI communications and the one used for "out of band" communications. BTL is the MPI communication subsystem; "oob" is the out-of-band communication subsystem.

> The other problem is this: when I try to run some examples, unfortunately they do not work -- maybe the network is not configured correctly. I can submit jobs only from a host to that same host. When I submit from the head node to other nodes, for example, it hangs without any messages, and on the node where I want to run the computation I see that an orted daemon has been started. (I use the default config files.) Below are examples:
>
> mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_include eth0 -np 2 -host n10,n11 cpi
>
> No output, no calculations, only orted daemons on the nodes.
>
> mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 2 -host n10,n11 cpi
>
> The same as above.
>
> mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 2 -host n00,n00 cpi
>
> n00 is the head node -- this one works and produces output.

It sounds like OMPI is getting confused by the non-uniform networks. I have heard reports of OMPI not liking networks with different interface names, but it's not immediately obvious why the interface names are relevant to OMPI's selection criteria (and not enough details are available in the reports I heard before). Try the *_if_exclude methods above and see if that works for you. If not, let us know.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] Is Iprobe fast when there is no message to receive
On Sat, 2009-10-03 at 07:05 -0400, Jeff Squyres wrote:

> That being said, if you just want to send a quick "notify" that an event has occurred, you might want to use a specific tag and/or communicator for these extraordinary messages. Then, when the event occurs, send a very short message on this special tag/communicator (potentially even a 0-byte message).
>
> You can MPI_TEST for the completion of this short/0-byte receive very quickly. You can then send the actual data of the event in a different non-blocking receive that is only checked if the short "alert" message is received.

In general I would say that Iprobe is a bad thing to use; as Jeff says, post a receive in advance and then call test on that receive rather than using Iprobe.

From your description it sounds like a zero-byte send is all you need, which should be fast in all cases.

Ashley.

--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
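A minimal C sketch of that suggestion, using a persistent request so the zero-byte receive is set up once and merely re-armed after each event; the tag, the peer rank, and the fixed iteration count are assumptions made for the example, not anything from the thread:

    /* Persistent zero-byte receive, polled with MPI_Test inside the work loop. */
    #include <mpi.h>

    #define SPIKE_TAG 200   /* arbitrary tag for spike notifications */

    void poll_spikes(int peer, MPI_Comm comm, int iterations)
    {
        MPI_Request req;
        char dummy;
        int flag;

        /* Set up the persistent zero-byte receive once... */
        MPI_Recv_init(&dummy, 0, MPI_BYTE, peer, SPIKE_TAG, comm, &req);
        /* ...and activate it before the loop starts. */
        MPI_Start(&req);

        for (int i = 0; i < iterations; i++) {
            /* ... advance the simulation for this time step ... */

            /* Cheap check: has a spike notification arrived? */
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
            if (flag) {
                /* A spike arrived from 'peer': handle it here, then
                 * re-arm the same request for the next one. */
                MPI_Start(&req);
            }
        }

        /* Clean up: cancel any still-pending receive, complete it, free the request. */
        MPI_Cancel(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    }

The sending side would then simply call MPI_Send with a count of 0 on SPIKE_TAG whenever a spike needs to be delivered to that peer.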