Re: [OMPI users] Interaction between Intel and OpenMPI floating point exceptions

2009-04-07 Thread Steve Lowder
Iain, Thanks for the reply, yours sounds like a good suggestion to try to work around this. Steve Iain Bason wrote: On Apr 6, 2009, at 7:22 PM, Steve Lowder wrote: Recently I've been running an MPI code that uses the LAPACK slamch routine to determine machine precision parameters. This software ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Steve Kargl
On Tue, Apr 07, 2009 at 02:23:45PM -0600, Ralph Castain wrote: > It isn't in a file - unless you specify it, OMPI will set it > automatically based on the number of procs on the node vs. what OMPI > thinks are the number of available processors. The question is: why > does OMPI not correctly know ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Mostyn Lewis
Does OpenMPI know about the number of CPUs per node for FreeBSD? DM On Tue, 7 Apr 2009, Ralph Castain wrote: I would really suggest looking at George's note first as I think you are chasing your tail here. It sounds like the most likely problem is that OMPI thinks you are oversubscribed and is ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Ralph Castain
It isn't in a file - unless you specify it, OMPI will set it automatically based on the number of procs on the node vs. what OMPI thinks are the number of available processors. The question is: why does OMPI not correctly know the number of processors on your machine? I don't remember now, ...
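One way to take the guesswork out of this (a sketch, not from the thread itself; the node name and slot count are placeholders) is to declare the slot count per node explicitly in a hostfile, so Open MPI never has to infer the processor count:
    # myhosts: 4 slots declared, so 4 local procs are not treated as oversubscription
    node20 slots=4
and then launch with:
    mpirun --hostfile myhosts -np 4 ./a.out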

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Steve Kargl
On Tue, Apr 07, 2009 at 01:40:13PM -0600, Ralph Castain wrote: > I would really suggest looking at George's note first as I think you > are chasing your tail here. It sounds like the most likely problem is > that OMPI thinks you are oversubscribed and is setting sched_yield > accordingly, which ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Ralph Castain
I would really suggest looking at George's note first as I think you are chasing your tail here. It sounds like the most likely problem is that OMPI thinks you are oversubscribed and is setting sched_yield accordingly, which would fully account for these diffs. Note that the methods for ...
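If oversubscription detection is indeed the culprit, the yield behaviour can also be forced off at launch time; a minimal sketch, with the executable name as a placeholder:
    mpirun --mca mpi_yield_when_idle 0 -np 2 ./latency_test
Setting the parameter explicitly on the command line should override whatever value OMPI would otherwise pick from its own processor count.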

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Steve Kargl
On Tue, Apr 07, 2009 at 03:18:31PM -0400, George Bosilca wrote: > Steve, > > I spotted a strange value for the mpi_yield_when_idle MCA parameter. 1 > means your processor is oversubscribed, and this triggers a call to > sched_yield after each check on the SM. Are you running the job > oversubscribed? ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Steve Kargl
On Tue, Apr 07, 2009 at 12:00:55PM -0700, Mostyn Lewis wrote: > Steve, > > Did you rebuild 1.2.9? As I see you have static libraries, maybe there's > a lurking pthread or something else that may have changed over time? > > DM Yes. I downloaded 1.2.9, 1.3, and 1.3.1, all within minutes of each other ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Ethan Mallove
Hi Steve, I see improvements in 1.3.1 as compared to 1.2.9 in Netpipe results. The Open MPI installations below were compiled with the same compiler and configure options, run on the same cluster, and run with the same MCA parameters. (Note, ClusterTools 8.2 is essentially 1.3.1r20828.) http://www ...
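For anyone wanting to reproduce a comparison like this, a NetPIPE run under two installations might look like the sketch below (paths are placeholders; NPmpi is NetPIPE's MPI test program):
    /opt/openmpi-1.2.9/bin/mpirun -np 2 ./NPmpi 2>&1 | tee np2_129.out
    /opt/openmpi-1.3.1/bin/mpirun -np 2 ./NPmpi 2>&1 | tee np2_131.out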

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread George Bosilca
Steve, I spotted a strange value for the mpi_yield_when_idle MCA parameter. 1 means your processor is oversubscribed, and this triggers a call to sched_yield after each check on the SM. Are you running the job oversubscribed? If not, it looks like somehow we don't correctly identify that ...
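The configured default for this parameter can be inspected directly; a sketch using ompi_info:
    ompi_info --param mpi all | grep yield_when_idle
Note that the value a job actually uses is decided at mpirun time from the oversubscription check, so pinning it explicitly at launch (as in the earlier sketch) is the more direct test.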

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Mostyn Lewis
Steve, Did you rebuild 1.2.9? As I see you have static libraries, maybe there's a lurking pthread or something else that may have changed over time? DM On Tue, 7 Apr 2009, Steve Kargl wrote: On Tue, Apr 07, 2009 at 09:10:21AM -0700, Eugene Loh wrote: Steve Kargl wrote: I can rebuild 1.2.9 ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Ralph Castain
[node20.cimu.org:90002] btl_sm_bandwidth=900 (default value) [node20.cimu.org:90002] btl_sm_latency=100 (default value) All these params do is influence the selection logic for deciding which BTL to use to send the data. Since you directed OMPI to only use sm, they are irrelevant. ...
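For reference, restricting a run to shared memory in this way uses the btl selection parameter (self is needed alongside sm so a process can send to itself); a sketch:
    mpirun --mca btl self,sm -np 2 ./NPmpi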

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Steve Kargl
On Tue, Apr 07, 2009 at 09:10:21AM -0700, Eugene Loh wrote: > Steve Kargl wrote: > > >I can rebuild 1.2.9 and 1.3.1. Are there any particular configure > >options that I should enable/disable? > > I hope someone else will chime in here, because I'm somewhat out of > ideas. All I'm saying is that ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Steve Kargl
On Tue, Apr 07, 2009 at 08:39:20AM -0700, Eugene Loh wrote: > Iain Bason wrote: > > >But maybe Steve should try 1.3.2 instead? Does that have your > >improvements in it? > > 1.3.2 has the single-queue implementation and automatic sizing of the sm > mmap file, both intended to fix problems at large np. ...

Re: [OMPI users] Fwd: ssh MPi and program tests

2009-04-07 Thread Gus Correa
Hi Francesco Sorry, I was out of the loop, doing some real work ... :) Jody and Terry already gave you great advice (as they always do), and got you moving in the right direction, which is great news! More comments below. I think we need to cut this message short, for good mailing list etiquette ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Steve Kargl
On Tue, Apr 07, 2009 at 08:00:39AM -0700, Eugene Loh wrote: > Iain Bason wrote: > > >There are a bunch of changes in the shared memory module between 1.2.9 > >and 1.3.1. One significant change is the introduction of the "sendi" > >internal interface. I believe George Bosilca did the initial implementation. ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Peter Kjellstrom
On Tuesday 07 April 2009, Eugene Loh wrote: > Iain Bason wrote: > > But maybe Steve should try 1.3.2 instead? Does that have your > > improvements in it? > > 1.3.2 has the single-queue implementation and automatic sizing of the sm > mmap file, both intended to fix problems at large np. At np=2, you ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Eugene Loh
Steve Kargl wrote: I can rebuild 1.2.9 and 1.3.1. Are there any particular configure options that I should enable/disable? I hope someone else will chime in here, because I'm somewhat out of ideas. All I'm saying is that 10-usec latencies on sm with 1.3.0 or 1.3.1 are out of line with what ...
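If a rebuild is attempted, a plain optimized build with debugging disabled is probably the fairest basis for comparing the two series; a sketch, with the install prefix as a placeholder:
    ./configure --prefix=/usr/local/openmpi-1.3.1 --disable-debug
    make -j4 && make install
The same configure line (with only the prefix changed) would then be used for 1.2.9 so the comparison isolates the library version.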

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Eugene Loh
Iain Bason wrote: But maybe Steve should try 1.3.2 instead? Does that have your improvements in it? 1.3.2 has the single-queue implementation and automatic sizing of the sm mmap file, both intended to fix problems at large np. At np=2, you shouldn't expect to see much difference. And the ...

Re: [OMPI users] MPI can not open file?

2009-04-07 Thread Peter Kjellstrom
On Tuesday 07 April 2009, Bernhard Knapp wrote: > Hi > > I am trying to get a parallel job of the gromacs software started. MPI > seems to boot fine but unfortunately it seems not to be able to open a > specified file although it is definitely in the directory where the job > is started. Do all the ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Iain Bason
On Apr 7, 2009, at 11:00 AM, Eugene Loh wrote: Iain Bason wrote: There are a bunch of changes in the shared memory module between 1.2.9 and 1.3.1. One significant change is the introduction of the "sendi" internal interface. I believe George Bosilca did the initial implementation. This ...

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Eugene Loh
Iain Bason wrote: There are a bunch of changes in the shared memory module between 1.2.9 and 1.3.1. One significant change is the introduction of the "sendi" internal interface. I believe George Bosilca did the initial implementation. This is just a wild guess, but maybe there is something ...

Re: [OMPI users] Interaction between Intel and OpenMPI floating point exceptions

2009-04-07 Thread Iain Bason
On Apr 6, 2009, at 7:22 PM, Steve Lowder wrote: Recently I've been running an MPI code that uses the LAPACK slamch routine to determine machine precision parameters. This software is compiled using the latest Intel Fortran compiler and setting the -fpe0 argument to watch for certain floating point ...
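For context, slamch deliberately probes the machine's underflow and overflow limits, exactly the kind of floating-point events whose handling -fpe0 changes; a hedged illustration of the two compile modes, assuming the Open MPI wrappers sit on top of ifort and with file names as placeholders:
    mpif90 -fpe0 -o prog prog.f90   # traps invalid/divide-by-zero/overflow, flushes underflow to zero; slamch's probing can trip this
    mpif90 -fpe3 -o prog prog.f90   # ifort's default: exceptions are masked and execution continues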

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Iain Bason
There are a bunch of changes in the shared memory module between 1.2.9 and 1.3.1. One significant change is the introduction of the "sendi" internal interface. I believe George Bosilca did the initial implementation. This is just a wild guess, but maybe there is something about sendi that ...

[OMPI users] Fwd: ssh MPi and program tests

2009-04-07 Thread Francesco Pietra
Hi Gustavo: "I feel myself stupid enough in this circumstance." That was the case. Adjusted as indicated by Jody, the connectivity test passed, and so did the hello test: Hello, world, I am 0 of 4, 1 of 4, 2 of 4, 3 of 4. ...

Re: [OMPI users] ssh MPi and program tests

2009-04-07 Thread Francesco Pietra
Hi Jody: I should only blame myself. Gustavo's indications were clear. Still, I misunderstood them. Since I am testing on one node (where everything is there) mpirun -host deb64 -n 4 connectivity_c Connectivity test on 4 processes PASSED thanks francesco On Tue, Apr 7, 2009 at 12:27 PM, jody

Re: [OMPI users] " MPI can not open file?"

2009-04-07 Thread Ralph Castain
OMPI doesn't do anything wrt your file, so it can only be a question of (a) is your file on the remote machine, and (b) what directory it is in relative to where your process starts. Try just running pwd with mpirun and see what directory you are in. Then you can ssh to that node and do an ...
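A concrete form of that check (hostnames are placeholders) is simply to launch pwd itself under mpirun:
    mpirun -np 2 --host node01,node02 pwd
which prints the working directory each remote process actually starts in.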

Re: [OMPI users] " MPI can not open file?"

2009-04-07 Thread Bernhard Knapp
Dear Ralph and other users I tried both versions with the relative path and with the -wdir option but in both cases the error is still the same. Additionally I tried to simply start the job in my home directory but it does not help either ... any other ideas? thx Bernhard [bknapp@quoVadis0 ...

Re: [OMPI users] MPI can not open file?

2009-04-07 Thread Ralph Castain
I assume you are running in a non-managed environment and so are using ssh for your launcher? Could you tell us what version of OMPI you are using? The problem is that ssh drops you in your home directory, not your current working directory. Thus, the path to any file you specify must be ...
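Two usual workarounds, sketched with placeholder paths and with gromacs' mdrun as the example binary: either hand mpirun the working directory explicitly, or give the input file an absolute path:
    mpirun -wdir /path/to/rundir -np 4 mdrun_mpi -s topol.tpr
    mpirun -np 4 mdrun_mpi -s /path/to/rundir/topol.tpr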

[OMPI users] MPI can not open file?

2009-04-07 Thread Bernhard Knapp
Hi I am trying to get a parallel job of the gromacs software started. MPI seems to boot fine but unfortunately it seems not to be able to open a specified file although it is definitely in the directory where the job is started. I also changed the file permissions to 777 but it does not affect ...

Re: [OMPI users] ssh MPi and program tests

2009-04-07 Thread jody
Hi What are the options "-deb64" and "-1" you are passing to mpirun: > /usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee n=1.connectivity.out I don't think these are legal options for mpirun (at least they don't show up in `man mpirun`). And I think you should add a "-n 4" (for 4 processors) ...

Re: [OMPI users] ssh MPi and program tests

2009-04-07 Thread Terry Frankcombe
On Tue, 2009-04-07 at 11:39 +0200, Francesco Pietra wrote: > Hi Gus: > I should have made clear at the beginning that on the Zyxel router > (connected to Internet by dynamic IP afforded by the provider) there > are three computers. Their host names: > > deb32 (desktop debian i386) > > deb64 (multisocket ...

Re: [OMPI users] ssh MPi and program tests

2009-04-07 Thread Francesco Pietra
Hi Gus: I should have made clear at the beginning that on the Zyxel router (connected to Internet by dynamic IP afforded by the provider) there are three computers. Their host names: deb32 (desktop debian i386) deb64 (multisocket debian amd 64 lenny) tya64 (multisocket debian amd 64 lenny) The ...

Re: [OMPI users] Problem with running openMPI program

2009-04-07 Thread Ankush Kaul
Thank you sir, thanks a lot. The information you provided helped us a lot. I am currently going through the OpenMPI FAQ and will contact you in case of any doubts. Regards, Ankush Kaul