^^^
>
> Also, if you know that #procs <= #cores on your nodes, you can greatly
> improve performance by adding "--bind-to-core".
>
>
>
> On Apr 3, 2011, at 5:28 PM, Laurence Marks wrote:
>
>> And, before someone wonders, while Wien2k is a co
, Laurence Marks wrote:
> Thanks. I will test this tomorrow.
>
> Many people run Wien2k with openmpi, as you say; I only became aware a
> few days ago of the issue of Wien2k (and perhaps other codes) leaving
> orphaned processes still running. I also know someone who wants
>
banned.
Personally, as I don't want to be banned from the supercomputers I use,
I want to find an adequate patch for myself --- and then try to
persuade the developers to adopt it.
On Sun, Apr 3, 2011 at 6:13 PM, Ralph Castain wrote:
>
> On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
>
:07 100 ./pp143
>> 10159 R 551436 839184 2.2 00:00:06 98.3 ./pp143
>> 10160 R 551760 839692 2.2 00:00:07 100 ./pp143
>> 10161 R 551788 839824 2.2 00:00:07 97.3 ./pp143
>> 10162 R 552256 840332 2.2 00:00:07 100 ./pp143
>> 10163 R 552216 840340 2.2 00:00:07 99.3 ./pp143
>>
mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
em
On Sun, Apr 3, 2011 at 11:41 AM, Ralph Castain wrote:
>
> On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote:
>
>> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>>>
>>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>>
>>>> Let me e
On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>
> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>
>> Let me expand on this slightly (in response to Ralph Castain's posting
>> -- I had digest mode set). As currently constructed a shellscript in
>> W
>> $PBS_O_WORKDIR/.tmp_$1
# Now the command we want to run
echo $2 >> $PBS_O_WORKDIR/.tmp_$1
# Make it executable
chmod a+x $PBS_O_WORKDIR/.tmp_$1
pbsdsh -h $1 /bin/bash -lc " $PBS_O_WORKDIR/.tmp_$1 "
#Cleanup if needed (commented out for debugging)
#rm $PBS_O_WORKD
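
The fragment above follows a recognisable pattern: append the command to a per-task temporary script, make it executable, run it through a shell on the target node, then clean up. A minimal sketch of that pattern, runnable without PBS, might look like the following. The function name run_on_node, the WORKDIR variable (standing in for $PBS_O_WORKDIR) and the shebang line written at the top of the temporary script are assumptions for illustration, not taken from the original script, and a local /bin/bash call replaces pbsdsh -h.

```shell
#!/bin/bash
# Sketch only: run_on_node and WORKDIR are hypothetical names.
# In a real PBS job WORKDIR would be $PBS_O_WORKDIR, and the launcher
# line would be: pbsdsh -h "$node" /bin/bash -lc "$script"

run_on_node() {
    local node="$1"                      # used here only to name the temporary script
    local cmd="$2"                       # the command we want to run
    local workdir="${WORKDIR:-$(pwd)}"
    local script="$workdir/.tmp_$node"

    # Start a fresh temporary script (the shebang is an assumption of this sketch)
    printf '#!/bin/bash\n' > "$script"
    # Now the command we want to run
    printf '%s\n' "$cmd" >> "$script"
    # Make it executable
    chmod a+x "$script"

    # Local stand-in for the pbsdsh launch
    /bin/bash "$script"
    local status=$?

    # Cleanup; comment this out when debugging
    rm -f "$script"
    return "$status"
}
```

Called as run_on_node 1 'hostname', this writes .tmp_1 in the working directory, runs it, removes it, and returns the exit status of the command, so a failing task can be detected by the caller.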
communicate. It does not have to be this; it could be
Task1: NodeA NodeB
Task2: NodeC NodeD
Here NodeC will start, and it looks as if NodeD never starts anything.
I've also run it with 4 Tasks (1, 3 and 4 work), and if Task2 only uses
one Node (the number of cores does not matter) it is fine.
--
Laurence
Correction on a correction: I did not goof. However, zombies remaining
is not a reproducible problem, but it can occur.
On Mon, Feb 8, 2010 at 2:34 PM, Laurence Marks wrote:
> I goofed; openmpi does trap these errors, but the system I tested them
> on had a very sluggish response. However,
I goofed; openmpi does trap these errors, but the system I tested them
on had a very sluggish response. However, an end-of-file is NOT
trapped.
On Mon, Feb 8, 2010 at 1:29 PM, Laurence Marks wrote:
> This was "Re: [OMPI users] Trapping fortran I/O errors leaving zombie
> mpi process
as well as to send fortran or C
signals to the process. Please note that the results can depend on the
level of optimization, and with other compilers there could be
problems where the compiler complains about SIGSEGV or other errors,
since the code deliberately tries to create these.
--
"CNTRL/C" at the input leaving zombies!
On Sat, Feb 6, 2010 at 9:24 PM, Laurence Marks wrote:
> The following code reproduces the problem for mpif90/ifort
> 11.1/openmpi-1.4.1. With an empty test.input (touch test.input) some
> not reproducible number of zombies pr
e a format error for the read no zombies remain.
--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.nu
wrote:
> On Jan 29, 2010, at 9:13 AM, Laurence Marks wrote:
>
>> OK, but trivial codes don't always reproduce problems.
>
> Yes, but if the problem is a file reading beyond the end, that should be
> fairly isolated behavior.
>
>> Is strace useful?
>
> Sure.
of disallowed behaviour. Not a good practice
> to adopt in general.
>
> David
>
> On 02/03/2010 10:40 AM, Laurence Marks wrote:
>>
>> I know it's wrong, but I don't think it is forbidden (which I
>> guess is what you are saying).
>>
>> On
cations, 21, 132 (2007).
>>
>> They describe an implementation of a "mutex"-like object in MPI. If you
>> protect writes to the file with an exclusive lock you can serialize the
>> writes and make use of NFS's close-to-open cache coherence.
>>
>> nick
ut in some "official"
document or similar.
OK, but trivial codes don't always reproduce problems.
Is strace useful?
On Fri, Jan 29, 2010 at 7:32 AM, Jeff Squyres wrote:
> On Jan 29, 2010, at 8:23 AM, Laurence Marks wrote:
>
>> I'll try, but sometimes these things are hard to reproduce and I have
>> to wait f
On Fri, Jan 29, 2010 at 6:59 AM, Jeff Squyres wrote:
> On Jan 28, 2010, at 2:23 PM, Laurence Marks wrote:
>
>> > If one process dies prematurely in Open MPI (i.e., before MPI_Finalize),
>> > all the others should be automatically killed.
>>
>> This doe
>On Jan 28, 2010, at 10:57 AM, Laurence Marks wrote:
>> I am trying to find out if there is any way to create an error-handler
>> or something else that will trap an error exit from the run-time
>> library due to a fortran I/O error, or possibly some openmpi calls or
>>
, and the other ones can hang because no
termination/abort signal is sent to them – this seems to be
implementation dependent.
I have added some C/icc signal handlers, and while these work and can
be used to call mpi_abort, none of them catches a Fortran I/O
error.
--
Laurence Marks