Re: [OMPI users] LAMA of openmpi-1.7.3 is unstable

2013-11-07 Thread Ralph Castain
I suspect something else is going on there - I can't imagine how the LAMA mapper could be interacting with the Torque launcher. The check for adequate resources (per the error message) is done long before we get to the launcher. I'll have to let the LAMA supporters chase it down. Thanks Ralph

Re: [OMPI users] LAMA of openmpi-1.7.3 is unstable

2013-11-07 Thread tmishima
Thanks, Ralph. Here is some additional information. Running directly on the node without Torque: mpirun -np 8 -report-bindings -mca rmaps lama -mca rmaps_lama_bind 1c Myprog also works, which means the combination of LAMA and Torque would cause the problem. Tetsuya Mishima > Okay,

Re: [OMPI users] LAMA of openmpi-1.7.3 is unstable

2013-11-07 Thread Ralph Castain
Okay, so the problem is a bug in LAMA itself. I'll file a ticket and let the LAMA folks look into it. On Nov 7, 2013, at 4:18 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, > > I quickly tried 2 runs: > > mpirun -report-bindings -bind-to core Myprog > mpirun -machinefile pbs_hosts -np

Re: [OMPI users] LAMA of openmpi-1.7.3 is unstable

2013-11-07 Thread tmishima
Hi Ralph, I quickly tried 2 runs: mpirun -report-bindings -bind-to core Myprog mpirun -machinefile pbs_hosts -np ${NPROCS} -report-bindings -bind-to core Myprog It works fine in both cases on node03 and node08. Regards, Tetsuya Mishima > What happens if you drop the LAMA request and instead

Re: [OMPI users] LAMA of openmpi-1.7.3 is unstable

2013-11-07 Thread Ralph Castain
What happens if you drop the LAMA request and instead run mpirun -report-bindings -bind-to core Myprog This would do the same thing - does it work? If so, then we know it is a problem in the LAMA mapper. If not, then it is likely a problem in a different section of the code. On Nov 7, 2013,

[OMPI users] LAMA of openmpi-1.7.3 is unstable

2013-11-07 Thread tmishima
Dear openmpi developers, I tried the new LAMA function of openmpi-1.7.3 and unfortunately it is not stable in my environment, which is built with Torque. (1) I used 4 scripts as shown below to clarify the problem: (COMMON PART) #!/bin/sh #PBS -l nodes=node03:ppn=8 / nodes=node08:ppn=8 expor

Re: [OMPI users] proper use of MPI_Abort

2013-11-07 Thread Andrus, Brian Contractor
Jeff, Good to know. Thanks! It really seems like MPI_ABORT should only be used within error traps after MPI functions have been started. Code-wise, the sample I got was not the best. Usage should be checked before MPI_Init, I think :) It seems the expectation is that MPI_ABORT is only call

Re: [OMPI users] MPI_File_write hangs on NFS-mounted filesystem

2013-11-07 Thread Gus Correa
Hi Steven, Dmitry. Not sure if this web page is still valid or totally out of date, but here it is anyway, in the hope that it may help: http://www.mcs.anl.gov/research/projects/mpi/mpich1-old/docs/install/node38.htm On the other hand, one expert seems to dismiss NFS for parallel IO: http:

Re: [OMPI users] MPI_File_write hangs on NFS-mounted filesystem

2013-11-07 Thread Jeff Hammond
That's a relatively old version of OMPI. Maybe try the latest release? That's always the safe bet, since the issue might have been fixed already. I recall that OMPI uses ROMIO, so you might try to reproduce with MPICH so you can report it to the people who wrote the MPI-IO code. Of course, this mi

Re: [OMPI users] MPI_File_write hangs on NFS-mounted filesystem

2013-11-07 Thread Dmitry N. Mikushin
Not sure if this is related, but: I've seen a case of performance degradation on NFS and Lustre when writing NetCDF files. The reason was that the file was filled by a loop writing one 4-byte record at a time. Performance became close to that of a local hard drive when I simply introduced buffering of reco
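The buffering idea in the message above (collect many tiny records in memory, then issue one large write) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: it uses plain stdio fwrite() rather than NetCDF or MPI-IO, and the names record_buffer, rb_put, rb_flush and the 4096-byte capacity are hypothetical, not taken from the original post, whose details are cut off.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer capacity in bytes; an arbitrary illustrative value. */
#define BUF_CAP 4096

/* Hypothetical record buffer: accumulates small fixed-size records and
   hands them to fwrite() in large chunks instead of one write per record. */
typedef struct {
    FILE *fp;
    unsigned char data[BUF_CAP];
    size_t used;        /* bytes currently buffered */
    size_t flush_calls; /* number of fwrite() calls issued so far */
} record_buffer;

static void rb_init(record_buffer *rb, FILE *fp) {
    rb->fp = fp;
    rb->used = 0;
    rb->flush_calls = 0;
}

/* Write all buffered bytes with a single fwrite() call. */
static int rb_flush(record_buffer *rb) {
    if (rb->used == 0)
        return 0;
    size_t n = fwrite(rb->data, 1, rb->used, rb->fp);
    rb->flush_calls++;
    if (n != rb->used)
        return -1;
    rb->used = 0;
    return 0;
}

/* Append one record, flushing first if it would not fit in the buffer. */
static int rb_put(record_buffer *rb, const void *rec, size_t size) {
    assert(size <= BUF_CAP);
    if (rb->used + size > BUF_CAP && rb_flush(rb) != 0)
        return -1;
    memcpy(rb->data + rb->used, rec, size);
    rb->used += size;
    return 0;
}
```

With 4-byte records, this turns 10000 individual writes into 10 fwrite() calls of up to 4096 bytes each. Remember to call rb_flush() once more after the last record so the tail of the buffer reaches the file.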

[OMPI users] MPI_File_write hangs on NFS-mounted filesystem

2013-11-07 Thread Steven G Johnson
The simple C program attached below hangs on MPI_File_write when I am using an NFS-mounted filesystem. Is MPI-IO supported in OpenMPI for NFS filesystems? I'm using OpenMPI 1.4.5 on Debian stable (wheezy), 64-bit Opteron CPU, Linux 3.2.51. I was surprised by this because the problems only st