I can't tell if these problems are related to trac ticket 2043 or not.

Compiler: In my experience, trac 2043 shows up only with GCC 4.4.x. That doesn't necessarily make it a GCC bug; perhaps GCC 4.4.x is just exposing an OMPI problem. I'm unsure which compiler Jiaye is using, and Vasilis is apparently seeing a problem with the PGI compiler. But maybe compilers other than GCC 4.4.x also expose the problem.

Severity: In my experience, trac 2043 shows up rather dramatically: within dozens to hundreds of iterations of simple message patterns. So a problem that appears only after hours of execution feels like something different to me. But maybe I misunderstand Jiaye's and Vasilis's cases: do the programs run well for several hours before the hang occurs?

Shared memory: Trac 2043 appears to be related to shared memory. Jiaye seems to run on a single node. Vasilis mentions running on a "cluster", so I don't know whether that means communicating over an interconnect or still using sm.

Anyhow, it's hard to know which problems are the same or different when we don't yet really understand what's going on.

vasilis gkanis wrote:

I am also seeing a similar problem with the MUMPS solver when I run it on a cluster. After several hours of running, the code stops producing any results, although top shows that the program is using 100% of the CPU.

The difference here, however, is that the same program runs fine on my PC. The differences between my PC and the cluster are:
1) 32-bit vs. 64-bit (cluster)
2) Intel compiler vs. Portland Group (PGI) compiler (cluster)

On Friday 20 November 2009 03:50:17 am Jiaye Li wrote:
I installed openmpi-1.3.3 on my single-node (one CPU) Intel 64-bit quad-core
machine. The compiler info is:

intel-icc101018-10.1.018-1.i386
libgcc-4.4.0-4.i586
gcc-4.4.0-4.i586
gcc-gfortran-4.4.0-4.i586
gcc-c++-4.4.0-4.i586
intel-ifort101018-10.1.018-1.i386

I compiled the PWscf program with Open MPI and tested it. At first, PW
ran well, but after about 10 hours, just as the program was about to
finish, it hung. The CPU is still busy, though (100% taken up by the
program). Something seems to be wrong somewhere. Any ideas? Thank you in
advance.
