[OMPI users] strange problem with OpenMPI + rankfile + Intel compiler 11.0.074 + centos/fedora-12

2010-03-24 Thread Anton Starikov
Intel compiler 11.0.074, OpenMPI 1.4.1. Two different OSes: CentOS 5.4 (2.6.18 kernel) and Fedora 12 (2.6.32 kernel). Two different CPUs: Opteron 248 and Opteron 8356. Same binary for OpenMPI. Same binary for user code (VASP compiled for older arch). When I supply rankfile, then depending on combo

[OMPI users] Torque+BCLR+OpenMPI

2010-02-11 Thread Anton Starikov
Hi! I'm trying to implement checkpointing on our cluster, and I have an obvious question. I guess this has been implemented many times by other users, so I would like someone to share their experience with me. With serial/multithreaded jobs it is kind of clear. But for parallel? We have "fat" 16-core nodes,

Re: [OMPI users] strange bug

2009-05-12 Thread Anton Starikov
I will try to prepare a test case. -- Anton Starikov. On May 12, 2009, at 6:57 PM, Edgar Gabriel wrote: hm, so I am out of ideas. I created multiple variants of test programs which did what you basically described, and they all passed and did not generate problems. I compiled the MUMPS

Re: [OMPI users] strange bug

2009-05-12 Thread Anton Starikov
fo.txt.gz Description: GNU Zip compressed data -- Anton Starikov. Computational Material Science, Faculty of Science and Technology, University of Twente. Phone: +31 (0)53 489 2986 Fax: +31 (0)53 489 2910 On May 12, 2009, at 12:35 PM, Jeff Squyres wrote: Can you send all the information listed her

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-12 Thread Anton Starikov
" and "mpirun -np 5" both works, but in both cases there are only 4 tasks. It isn't crucial, because there is nor real oversubscription, but there is still some bug which can affect something in future. -- Anton Starikov. On May 12, 2009, at 1:45 AM, Ralph Castain wro

Re: [OMPI users] strange bug

2009-05-11 Thread Anton Starikov
By the way, this is Fortran code, which uses F77 bindings. -- Anton Starikov. On May 12, 2009, at 3:06 AM, Anton Starikov wrote: Due to rankfile fixes I switched to SVN r21208, now my code dies with error [node037:20519] *** An error occurred in MPI_Comm_dup [node037:20519] *** on

[OMPI users] strange bug

2009-05-11 Thread Anton Starikov
(your MPI job will now abort) -- Anton Starikov.

[OMPI users] strabge bug

2009-05-11 Thread Anton Starikov
(your MPI job will now abort) -- Anton Starikov.

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-05-11 Thread Anton Starikov
Although removing this check solves the problem of having more slots in the rankfile than necessary, there is another problem. If I set rmaps_base_no_oversubscribe=1 then if, for example: hostfile: node01 node01 node02 node02 rankfile: rank 0=node01 slot=1 rank 1=node01 slot=0 rank 2=node02 slot=1

[OMPI users] running problem on Dell blade server, confirm 2d21ce3ce8be64d8104b3ad71b8c59e2514a72eb

2009-04-25 Thread Anton Starikov
I can confirm that I have exactly the same problem, also on a Dell system, even with the latest OpenMPI. Our system is: Dell M905, OpenSUSE 11.1, kernel: 2.6.27.21-0.1-default, ofed-1.4-21.12 from SUSE repositories, OpenMPI-1.3.2. But what I can also add is that it does not only affect OpenMPI, if this messages