[OMPI users] Shared Memory (SM) module and shared cache implications

2009-06-25 Thread Simone Pellegrini
Hello, I have a simple question for the shared memory (sm) module developers of Open MPI. In the current implementation, is there any advantage to having a shared cache among communicating processes? For example, let's say we have P1 and P2 placed on the same CPU on 2 different physical cores with
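A minimal sketch (not from the original post) of the kind of on-node ping-pong one could use to probe this path: two ranks on the same node exchange a buffer repeatedly, with core binding left to the launcher and the 1 MiB message size and iteration count chosen arbitrarily.

/* pingpong.c -- sketch only: assumes exactly 2 ranks on one node; binding
 * the ranks to cores that do or do not share a cache is left to the launcher.
 * The 1 MiB message size and 1000 iterations are arbitrary choices. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, iters = 1000, len = 1 << 20;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(len);
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("average round trip: %g us\n", 1e6 * (t1 - t0) / iters);
    free(buf);
    MPI_Finalize();
    return 0;
}

Comparing the timing with the two ranks bound to cores that share a cache versus cores that do not is one way to see whether the sm path benefits from the shared cache.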

Re: [OMPI users] Shared Memory (SM) module and shared cache implications

2009-06-25 Thread Ralph Castain
At the moment, I believe the answer is the main memory route. We have a project just starting here (LANL) to implement the cache-level exchange, but it won't be ready for release for a while. On Jun 25, 2009, at 2:39 AM, Simone Pellegrini wrote: Hello, I have a simple question for the share

Re: [OMPI users] Shared Memory (SM) module and shared cache implications

2009-06-25 Thread Simone Pellegrini
Ralph Castain wrote: At the moment, I believe the answer is the main memory route. We have a project just starting here (LANL) to implement the cache-level exchange, but it won't be ready for release for a while. Interesting; actually, I am a PhD student and my topic is the optimization of MPI applic

Re: [OMPI users] Shared Memory (SM) module and shared cache implications

2009-06-25 Thread Jeff Squyres
FWIW: there's also work going on to use direct process-to-process copies (vs. using shared memory bounce buffers). Various MPI implementations have had this technology for a while (e.g., QLogic's PSM-based MPI); the Open-MX guys are publishing the knem open source kernel module for this pu

Re: [OMPI users] Shared Memory (SM) module and shared cache implications

2009-06-25 Thread Ralph Castain
Doesn't that still pull the message off-socket? I thought it went through the kernel for that method, which means moving it to main memory. On Jun 25, 2009, at 6:49 AM, Jeff Squyres wrote: FWIW: there's also work going on to use direct process-to-process copies (vs. using shared memory bo

Re: [OMPI users] Shared Memory (SM) module and shared cache implications

2009-06-25 Thread Jeff Squyres
On Jun 25, 2009, at 9:12 AM, Ralph Castain wrote: Doesn't that still pull the message off-socket? I thought it went through the kernel for that method, which means moving it to main memory. It may or may not. Sorry -- let me clarify: I was just pointing out other on-node/memory-based work

[OMPI users] PBSPro/OpenMPI Errors

2009-06-25 Thread Robert Jackson
When using OpenMPI and nwchem standalone (mpirun --byslot --mca btl self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_exclude lo,eth1 $NWCHEM h2o.nw > & h2o.nwo.$$) the job runs fine. When running the same job via the PBSPro scheduler I get errors. The PBS script is called nwrun and is run

[OMPI users] Infiniband requirements

2009-06-25 Thread Jim Kress
Is it correct to assume that, when one is configuring openmpi v1.3.2 and if one leaves out the --with-openib=/dir from the ./configure command line, that InfiniBand support will NOT be built into openmpi v1.3.2? Then, if an Ethernet network is present that connects all the nodes, openmpi will us

[OMPI users] MX questions

2009-06-25 Thread Dave Love
It's not reproducible, but I sometimes see messages like [node01:29645] MX BTL delete procs running 1.3.1 with Open-MX and the MX BTL. Looking at the code, it's a dummy routine, but I didn't get as far as figuring out why it's (sometimes) called and what its significance is. Can someone expl

Re: [OMPI users] OpenMPI and SGE

2009-06-25 Thread Ray Muno
As a follow up, the problem was with host name resolution. The error was introduced, with a change to the Rocks environment, which broke reverse lookups for host names. -- Ray Muno

Re: [OMPI users] MX questions

2009-06-25 Thread Scott Atchley
On Jun 25, 2009, at 1:02 PM, Dave Love wrote: Also, Brice Goglin, the Open-MX author had a couple of questions concerning multi-rail MX while I'm on: 1. Does the MX MTL work with multi-rail? I believe the answer is yes as long as all NICs are in the same fabric (they usually are). 2. "Yo

[OMPI users] Problem with qlogic cards InfiniPath_QLE7240 and AlltoAll call

2009-06-25 Thread D'Auria, Raffaella
Dear All, I have been encountering a fatal error of the type "error polling LP CQ with status RETRY EXCEEDED ERROR status number 12" whenever I try to run a simple MPI code (see below) that performs an AlltoAll call. We are running the OpenMPI 1.3.2 stack on top of the OFED 1.4.1 stack. Our cluster is comp
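The poster's code is cut off in the preview above; purely as an illustration of the pattern being described (and not the original program), a minimal Alltoall test with an arbitrary per-rank count of 4096 ints could look like:

/* alltoall.c -- illustrative sketch, not the original poster's program.
 * Every rank sends 'count' ints to every other rank. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, count = 4096;     /* arbitrary per-rank element count */
    int *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    sendbuf = malloc((size_t)count * size * sizeof(int));
    recvbuf = malloc((size_t)count * size * sizeof(int));
    for (i = 0; i < count * size; i++)
        sendbuf[i] = rank;               /* mark every element with the sender's rank */
    MPI_Alltoall(sendbuf, count, MPI_INT, recvbuf, count, MPI_INT, MPI_COMM_WORLD);
    if (rank == 0)
        printf("MPI_Alltoall completed on %d ranks\n", size);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}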

[OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Mostyn Lewis
While using the BLACS test programs, I've seen that with recent SVN checkouts (including today's) the MPI_Abort test left procs running. The last SVN I have where it worked was 1.4a1r20936. By 1.4a1r21246 it fails. Works O.K. in the standard 1.3.2 release. A test program is below. GCC was used.
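The test program is truncated in the preview; a sketch of this kind of MPI_Abort test (not Mostyn's actual program) has one rank abort while the others sit in a barrier, so any processes that survive the abort are easy to spot:

/* mpiabort.c -- illustrative sketch, not the original test program.
 * Needs at least 2 ranks: rank 1 aborts, the rest wait in a barrier and
 * should all be killed by the abort. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 1) {
        sleep(2);                        /* give the other ranks time to reach the barrier */
        MPI_Abort(MPI_COMM_WORLD, 1);    /* the launcher should now kill every process */
    }
    MPI_Barrier(MPI_COMM_WORLD);         /* ranks other than 1 block here and must not survive */
    MPI_Finalize();
    printf("rank %d got past the abort -- leftover process\n", rank);
    return 0;
}

After mpirun returns, any a.out processes still visible in ps would indicate the leftover-process behavior reported above.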

Re: [OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Ralph Castain
Using what launch environment? On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote: While using the BLACS test programs, I've seen that with recent SVN checkouts (including today's) the MPI_Abort test left procs running. The last SVN I have where it worked was 1.4a1r20936. By 1.4a1r21246 it fail

Re: [OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Mostyn Lewis
Something like: #!/bin/ksh set -x PREFIX=$OPENMPI_GCC_SVN export PATH=$OPENMPI_GCC_SVN/bin:$PATH MCA="--mca btl tcp,self" mpicc -g -O6 mpiabort.c NPROCS=4 mpirun --prefix $PREFIX -x LD_LIBRARY_PATH $MCA -np $NPROCS -machinefile fred ./a.out DM On Thu, 25 Jun 2009, Ralph Castain wrote: Using

Re: [OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Ralph Castain
Sorry - should have been more clear. Are you using rsh, qrsh (i.e., SGE), SLURM, Torque, ? On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote: Something like: #!/bin/ksh set -x PREFIX=$OPENMPI_GCC_SVN export PATH=$OPENMPI_GCC_SVN/bin:$PATH MCA="--mca btl tcp,self" mpicc -g -O6 mpiabort.c N

Re: [OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Mostyn Lewis
Just local machine - direct from the command line with a script like the one below. So, no launch mechanism. Fails on SUSE Linux Enterprise Server 10 (x86_64) - SP2 and Fedora release 10 (Cambridge), for example. DM On Thu, 25 Jun 2009, Ralph Castain wrote: Sorry - should have been more clear.

Re: [OMPI users] MX questions

2009-06-25 Thread George Bosilca
On Jun 25, 2009, at 13:17 , Scott Atchley wrote: On Jun 25, 2009, at 1:02 PM, Dave Love wrote: Also, Brice Goglin, the Open-MX author had a couple of questions concerning multi-rail MX while I'm on: 1. Does the MX MTL work with multi-rail? I believe the answer is yes as long as all NICs ar

Re: [OMPI users] Infiniband requirements

2009-06-25 Thread Gus Correa
Hi Jim, list 1) Your first question: I opened a thread on this list two months or so ago about a similar situation: when OpenMPI would or would not use libnuma. I asked a question very similar to your question about IB support, and how the configure script would provide it or not. Jeff answered it, a

Re: [OMPI users] Infiniband requirements

2009-06-25 Thread Jeff Squyres
On Jun 25, 2009, at 12:53 PM, Jim Kress wrote: Is it correct to assume that, when one is configuring openmpi v1.3.2 and if one leaves out the --with-openib=/dir from the ./configure command line, that InfiniBand support will NOT be built into openmpi v1.3.2? Then, if an Ethernet network i

Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2 forcing all MPI traffic over Ethernet instead of using Infiniband

2009-06-25 Thread Jeff Squyres
This thread diverged quite a bit into Open MPI configuration and build issues -- did the original question get answered? On Jun 24, 2009, at 8:18 PM, Jim Kress ORG wrote: > Have you investigated Jeff's question on whether the code was > compiled/linked with the same OpenMPI version (1.3.2)?