Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-04 Thread Laurence Marks
> > Also, if you know that #procs <= #cores on your nodes, you can greatly > improve performance by adding "--bind-to-core". > > > > On Apr 3, 2011, at 5:28 PM, Laurence Marks wrote: > >> And, before someone wonders, while Wien2k is a co
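(As a point of reference, and not taken from the thread itself: binding is requested on the mpirun command line, so with the option named above a launch would look something like

mpirun -np 8 --bind-to-core ./pp143

where the process count is only illustrative and pp143 is the binary mentioned later in the thread; in later Open MPI releases the same request is spelled "--bind-to core".)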

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
, Laurence Marks wrote: > Thanks. I will test this tomorrow. > > Many people run Wien2k with openmpi as you say; I only became aware of > the issue of Wien2k (and perhaps other codes) leaving orphaned > processes still running a few days ago. I also know someone who wants >

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
banned. Personally, as I don't want to be banned from the supercomputers I use, I want to find an adequate patch for myself --- and then try to persuade the developers to adopt it. On Sun, Apr 3, 2011 at 6:13 PM, Ralph Castain wrote: > > On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote: >

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
:07 100 ./pp143
>> 10159 R 551436 839184  2.2 00:00:06 98.3 ./pp143
>> 10160 R 551760 839692  2.2 00:00:07 100  ./pp143
>> 10161 R 551788 839824  2.2 00:00:07 97.3 ./pp143
>> 10162 R 552256 840332  2.2 00:00:07 100  ./pp143
>> 10163 R 552216 840340  2.2 00:00:07 99.3 ./pp143

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 11:41 AM, Ralph Castain wrote: > > On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote: > >> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: >>> >>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: >>> >>>> Let me e

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: > > On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: > >> Let me expand on this slightly (in response to Ralph Castain's posting >> -- I had digest mode set). As currently constructed, a shell script in >> W

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
>> $PBS_O_WORKDIR/.tmp_$1
# Now the command we want to run
echo $2 >> $PBS_O_WORKDIR/.tmp_$1
# Make it executable
chmod a+x $PBS_O_WORKDIR/.tmp_$1
pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1 "
# Cleanup if needed (commented out for debugging)
#rm $PBS_O_WORKD
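(From the fragment one can infer that the wrapper receives the target host as $1, passed to pbsdsh -h, and the command to run as $2, so an invocation would look roughly like "wrapper.sh <node> 'some command'"; the wrapper's file name is not preserved in the archive and is only a placeholder here.)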

[OMPI users] openmpi/pbsdsh/Torque problem

2011-04-02 Thread Laurence Marks
communicate. It does not have to be this; it could be
Task1: NodeA NodeB
Task2: NodeC NodeD
Here NodeC will start, and it looks as if NodeD never starts anything. I've also run it with 4 Tasks (1, 3, 4 work), and if Task2 only uses one Node (the number of cores does not matter) it is fine.

Re: [OMPI users] openmpi fails to terminate for errors/signals on some but not all processes

2010-02-08 Thread Laurence Marks
Correction on a correction: I did not goof; however, zombies remaining is not a reproducible problem, but it can occur. On Mon, Feb 8, 2010 at 2:34 PM, Laurence Marks wrote: > I goofed, openmpi does trap these errors but the system I tested them > on had a very sluggish response. However,

Re: [OMPI users] openmpi fails to terminate for errors/signals on some but not all processes

2010-02-08 Thread Laurence Marks
I goofed: openmpi does trap these errors, but the system I tested them on had a very sluggish response. However, an end-of-file is NOT trapped. On Mon, Feb 8, 2010 at 1:29 PM, Laurence Marks wrote: > This was "Re: [OMPI users] Trapping fortran I/O errors leaving zombie > mpi process

[OMPI users] openmpi fails to terminate for errors/signals on some but not all processes

2010-02-08 Thread Laurence Marks
as well as to send fortran or C signals to the process. Please note that the results can be dependent upon the level of optimization, and with other compilers there could be problems where the compiler complains about SIGSEGV or other errors since the code deliberately tries to create these. --
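Below is a minimal C sketch of this kind of test (it is not the poster's code, which is not preserved in the archive): one rank raises a fatal signal while the others sit in a barrier, so you can see whether the runtime tears the whole job down or leaves the surviving ranks running.

/* signal_test.c -- hypothetical reconstruction, not the code from the thread.
 * Rank 1 raises SIGSEGV after a short delay; all other ranks block in
 * MPI_Barrier.  If the runtime cleans up correctly, every rank is killed;
 * otherwise the waiting ranks are left behind.
 * Build: mpicc signal_test.c -o signal_test ; run: mpirun -np 4 ./signal_test
 */
#include <mpi.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        sleep(2);              /* let the other ranks reach the barrier */
        raise(SIGSEGV);        /* deliberate fault in one rank only */
    }

    MPI_Barrier(MPI_COMM_WORLD);   /* never completes on the surviving ranks
                                      unless the runtime aborts them */
    MPI_Finalize();
    if (rank == 0) printf("all ranks finished normally\n");
    return 0;
}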

Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-02-07 Thread Laurence Marks
"CNTRL/C" at the input leaving zombies! On Sat, Feb 6, 2010 at 9:24 PM, Laurence Marks wrote: > The following code reproduces the problem for mpif90/ifort > 11.1/openmpi-1.4.1. With an empty test.input (touch test.input) some > not reproducible number of zombies pr

Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-02-06 Thread Laurence Marks
e a format error for the read, no zombies remain.

Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-02-06 Thread Laurence Marks
wrote: > On Jan 29, 2010, at 9:13 AM, Laurence Marks wrote: > >> OK, but trivial codes don't always reproduce problems. > > Yes, but if the problem is a file reading beyond the end, that should be > fairly isolated behavior. > >> Is strace useful? > > Sure.

Re: [OMPI users] Parallel file write in fortran (+mpi)

2010-02-02 Thread Laurence Marks
of disallowed behaviour.  Not a good practice > to adopt in general. > > David > > On 02/03/2010 10:40 AM, Laurence Marks wrote: >> >> I know it's wrong, but I don't think it is forbidden (which I >> guess is what you are saying). >> >> On

Re: [OMPI users] Parallel file write in fortran (+mpi)

2010-02-02 Thread Laurence Marks
cations, 21, 132 (2007). >> >> They describe an implementation of a "mutex" like object in MPI. If you >> protect writes to the file with an exclusive lock you can serialize the >> writes and make use of NFS's close-to-open cache coherence. >> >> nick
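The mutex object from the cited paper is not reproduced in the thread. As a much simpler illustration of the same idea (serialize the writes, and rely on NFS close-to-open coherence by opening and closing the file inside the critical section), here is a hedged token-passing sketch; it is not the paper's algorithm, and the file name shared.out is invented.

/* token_write.c -- hedged sketch, not the mutex from the paper cited above:
 * serialize writes to one NFS file by passing a token from rank to rank+1,
 * and open/close the file inside the critical section so NFS close-to-open
 * coherence lets each writer see the previous writer's data.
 * The file name shared.out is invented for this example.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank > 0)   /* wait for the previous rank to finish writing */
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    FILE *fp = fopen("shared.out", "a");   /* open inside the "lock" */
    if (fp) {
        fprintf(fp, "rank %d writing\n", rank);
        fclose(fp);                        /* close before releasing */
    }

    if (rank < size - 1)   /* hand the token to the next rank */
        MPI_Send(&token, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}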

[OMPI users] Parallel file write in fortran (+mpi)

2010-02-02 Thread Laurence Marks
ut in some "official" document or similar.

Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-01-29 Thread Laurence Marks
OK, but trivial codes don't always reproduce problems. Is strace useful? On Fri, Jan 29, 2010 at 7:32 AM, Jeff Squyres wrote: > On Jan 29, 2010, at 8:23 AM, Laurence Marks wrote: > >> I'll try, but sometimes these things are hard to reproduce and I have >> to wait f

Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-01-29 Thread Laurence Marks
On Fri, Jan 29, 2010 at 6:59 AM, Jeff Squyres wrote: > On Jan 28, 2010, at 2:23 PM, Laurence Marks wrote: > >> > If one process dies prematurely in Open MPI (i.e., before MPI_Finalize), >> > all the others > should be automatically killed. >> >> This doe

Re: [OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-01-28 Thread Laurence Marks
>On Jan 28, 2010, at 10:57 AM, Laurence Marks wrote: >> I am trying to find out if there is any way to create an error-handler >> or something else that will trap an error exit from the run-time >> library due to a fortran I/O error, or possibly some openmpi calls or

[OMPI users] Trapping fortran I/O errors leaving zombie mpi processes

2010-01-28 Thread Laurence Marks
, and the other ones can hang because no termination/abort signal is sent to them – this seems to be implementation dependent. I have added some c/icc signal handlers and, while these work and can be used to send an mpi_abort signal, none of them catch a fortran I/O error. -- Laurence Marks
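For reference, a minimal sketch of the handler approach described above (the handler name and the signal list are assumptions, not the poster's code): fatal signals are converted into MPI_Abort so that every rank is terminated, but a Fortran runtime I/O error that simply exits the process, rather than raising a signal, still bypasses it.

/* abort_on_signal.c -- hedged sketch of the handler approach described above;
 * the handler name and signal list are assumptions, not the poster's code.
 * Fatal signals are converted into MPI_Abort so every rank is terminated,
 * but a Fortran I/O error that exits the process normally is not caught.
 */
#include <mpi.h>
#include <signal.h>
#include <stdio.h>

static void die_with_abort(int sig)
{
    /* keep the handler simple: report and take the whole MPI job down */
    fprintf(stderr, "caught signal %d, calling MPI_Abort\n", sig);
    MPI_Abort(MPI_COMM_WORLD, sig);
}

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    signal(SIGSEGV, die_with_abort);   /* segmentation fault */
    signal(SIGFPE,  die_with_abort);   /* floating point exception */
    signal(SIGTERM, die_with_abort);   /* external termination request */

    /* ... application code; a fault in any rank now aborts all ranks ... */

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}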