After trying several kernel versions, I have narrowed the problem down to the change from kernel 2.6.22 to 2.6.23. The big change in 2.6.23 turns out to be the new process scheduler: the Completely Fair Scheduler (CFS).
http://kernelnewbies.org/Linux_2_6_23#head-f3a847a5aace97932f838027c93121321a6499e7

It says:

  Applications that depend *heavily* on sched_yield()'s behaviour (like,
  f.e., many benchmarks) can suffer from huge performance gains/losses due
  to the very very subtle semantics of what sched_yield() should do and
  how CFS changes them. There's a sysctl at
  /proc/sys/kernel/sched_compat_yield that you can set to "1" to change
  the sched_yield() behaviour that you should try in those cases.

After setting /proc/sys/kernel/sched_compat_yield to "1", my hybrid application performs well again.
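For anyone who hits the same issue, a minimal sketch of the check and the fix (assuming a 2.6.23-or-later kernel where this sysctl exists; writing it requires root):

  # Show the current sched_yield() compatibility setting (default is 0).
  cat /proc/sys/kernel/sched_compat_yield
  # Restore the pre-CFS sched_yield() behaviour.
  echo 1 > /proc/sys/kernel/sched_compat_yield
  # Equivalent, via the sysctl command:
  sysctl -w kernel.sched_compat_yield=1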
--
Huiwei Lv
http://asg.ict.ac.cn/lhw/

On Tue, Oct 25, 2011 at 10:26 PM, Ralph Castain <r...@open-mpi.org> wrote:

> My best guess is that you are seeing differences in scheduling behavior
> with respect to memory locale. I notice that you are not binding your
> processes, so they are free to move around the various processors on the
> node. I would guess that your thread is winding up on a processor that
> is non-local to your memory in one case, but local to your memory in the
> other. This is an OS scheduler decision.
>
> You might try binding your processes to see if it helps. With threads,
> you don't really want to bind to a core, but binding to a socket should
> help. Try adding --bind-to-socket to your mpirun cmd line, as shown
> below (you can't do this if you run it as a singleton - you have to use
> mpirun).
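> For example, for the two-process run (a hypothetical invocation based
> on the command line quoted below; --bind-to-socket is the Open MPI
> 1.4.x option name):
>
>   mpirun -np 2 --bind-to-socket ./my_hybrid_app <number of threads>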
>
> On Oct 25, 2011, at 2:45 AM, 吕慧伟 wrote:
>
> Thanks, Ralph. Yes, I have taken that into account. The point is not to
> compare two procs with one proc, but the "multi-threading effect":
> multi-threading helps on the first machine for both one and two procs,
> but on the second machine the benefit disappears with two procs.
>
> To narrow down the problem, I reinstalled the operating system on the
> second machine, going from SUSE 11 (kernel 2.6.32.12, gcc 4.3.4) to Red
> Hat 5.4 (kernel 2.6.18, gcc 4.1.2), which is similar to the first
> machine (CentOS 5.3, kernel 2.6.18, gcc 4.1.2). Then the problem
> disappeared, so it must lie somewhere in the OS kernel or the GCC
> version. Any suggestions? Thanks.
>
> --
> Huiwei Lv
>
> On Tue, Oct 25, 2011 at 3:11 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Okay - thanks for testing it.
>>
>> Of course, one obvious difference is that there isn't any communication
>> when you run only one proc, but there is when you run two or more,
>> assuming your application has MPI send/recv calls (or calls collectives
>> and other functions that communicate). Communication to yourself is
>> very fast as no bits actually move - sending messages to another proc
>> is considerably slower.
>>
>> Are you taking that into account?
>>
>> On Oct 24, 2011, at 8:47 PM, 吕慧伟 wrote:
>>
>> No. There's a difference between "mpirun -np 1 ./my_hybrid_app..." and
>> "mpirun -np 2 ./...".
>>
>> Running "mpirun -np 1 ./my_hybrid_app..." improves performance with
>> more threads, but running "mpirun -np 2 ./..." degrades it.
>>
>> --
>> Huiwei Lv
>>
>> On Tue, Oct 25, 2011 at 12:00 AM, <users-requ...@open-mpi.org> wrote:
>>
>>> Date: Mon, 24 Oct 2011 07:14:21 -0600
>>> From: Ralph Castain <r...@open-mpi.org>
>>> Subject: Re: [OMPI users] Hybrid MPI/Pthreads program behaves
>>>         differently on two different machines with same hardware
>>>
>>> Does the difference persist if you run the single process using
>>> mpirun? In other words, does "mpirun -np 1 ./my_hybrid_app..." behave
>>> the same as "mpirun -np 2 ./..."?
>>>
>>> There is a slight difference in the way procs start when run as
>>> singletons. It shouldn't make a difference here, but it is worth
>>> testing.
>>>
>>> On Oct 24, 2011, at 12:37 AM, 吕慧伟 wrote:
>>>
>>> > Dear List,
>>> >
>>> > I have a hybrid MPI/Pthreads program named "my_hybrid_app". This
>>> > program is memory-intensive and takes advantage of multi-threading
>>> > to improve memory throughput. I run "my_hybrid_app" on two machines,
>>> > which have the same hardware configuration but different OS and GCC
>>> > versions. The problem is: when I run "my_hybrid_app" with one
>>> > process, the two machines behave the same - the more threads, the
>>> > better the performance. However, when I run "my_hybrid_app" with two
>>> > or more processes, the first machine still gains performance with
>>> > more threads, while the second machine loses performance with more
>>> > threads.
>>> >
>>> > Since running "my_hybrid_app" with one process behaves correctly, I
>>> > suspect my linking to the MPI library has some problem. Would
>>> > somebody point me in the right direction? Thanks in advance.
>>> >
>>> > Attached are the command lines used, my machine information and the
>>> > link information.
>>> >
>>> > p.s. 1: Command lines
>>> > single process: ./my_hybrid_app <number of threads>
>>> > multiple processes: mpirun -np 2 ./my_hybrid_app <number of threads>
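>>> >
>>> > For reference, a minimal sketch of this structure (hypothetical, not
>>> > the actual my_hybrid_app source; worker() is a made-up placeholder
>>> > for the memory-bound kernel):
>>> >
>>> > #include <mpi.h>
>>> > #include <pthread.h>
>>> > #include <stdlib.h>
>>> >
>>> > /* Placeholder for the memory-intensive per-thread kernel. */
>>> > static void *worker(void *arg) {
>>> >     (void)arg;
>>> >     return NULL;
>>> > }
>>> >
>>> > int main(int argc, char **argv) {
>>> >     int provided, i, nthreads;
>>> >     pthread_t *t;
>>> >     if (argc < 2) return 1;   /* usage: my_hybrid_app <nthreads> */
>>> >     nthreads = atoi(argv[1]);
>>> >     /* Only the main thread calls MPI, so FUNNELED is sufficient. */
>>> >     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>>> >     t = malloc(nthreads * sizeof *t);
>>> >     for (i = 0; i < nthreads; i++)
>>> >         pthread_create(&t[i], NULL, worker, NULL);
>>> >     for (i = 0; i < nthreads; i++)
>>> >         pthread_join(t[i], NULL);
>>> >     free(t);
>>> >     MPI_Finalize();
>>> >     return 0;
>>> > }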
>>> >
>>> > p.s. 2: Machine information
>>> > The first machine is CentOS 5.3 with GCC 4.1.2:
>>> > Target: x86_64-redhat-linux
>>> > Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
>>> > Thread model: posix
>>> > gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
>>> >
>>> > The second machine is SUSE Enterprise Server 11 with GCC 4.3.4:
>>> > Target: x86_64-suse-linux
>>> > Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.3 --enable-linux-futex --without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
>>> > Thread model: posix
>>> > gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux)
>>> >
>>> > p.s. 3: ldd information
>>> > The first machine:
>>> > $ ldd my_hybrid_app
>>> >     libm.so.6 => /lib64/libm.so.6 (0x000000358d400000)
>>> >     libmpi.so.0 => /usr/local/openmpi/lib/libmpi.so.0 (0x00002af0d53a7000)
>>> >     libopen-rte.so.0 => /usr/local/openmpi/lib/libopen-rte.so.0 (0x00002af0d564a000)
>>> >     libopen-pal.so.0 => /usr/local/openmpi/lib/libopen-pal.so.0 (0x00002af0d5895000)
>>> >     libdl.so.2 => /lib64/libdl.so.2 (0x000000358d000000)
>>> >     libnsl.so.1 => /lib64/libnsl.so.1 (0x000000358f000000)
>>> >     libutil.so.1 => /lib64/libutil.so.1 (0x000000359a600000)
>>> >     libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00002af0d5b07000)
>>> >     libpthread.so.0 => /lib64/libpthread.so.0 (0x000000358d800000)
>>> >     libc.so.6 => /lib64/libc.so.6 (0x000000358cc00000)
>>> >     /lib64/ld-linux-x86-64.so.2 (0x000000358c800000)
>>> >     librt.so.1 => /lib64/librt.so.1 (0x000000358dc00000)
>>> > The second machine:
>>> > $ ldd my_hybrid_app
>>> >     linux-vdso.so.1 => (0x00007fff3eb5f000)
>>> >     libmpi.so.0 => /root/opt/openmpi/lib/libmpi.so.0 (0x00007f68627a1000)
>>> >     libm.so.6 => /lib64/libm.so.6 (0x00007f686254b000)
>>> >     libopen-rte.so.0 => /root/opt/openmpi/lib/libopen-rte.so.0 (0x00007f68622fc000)
>>> >     libopen-pal.so.0 => /root/opt/openmpi/lib/libopen-pal.so.0 (0x00007f68620a5000)
>>> >     libdl.so.2 => /lib64/libdl.so.2 (0x00007f6861ea1000)
>>> >     libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f6861c89000)
>>> >     libutil.so.1 => /lib64/libutil.so.1 (0x00007f6861a86000)
>>> >     libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f686187d000)
>>> >     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6861660000)
>>> >     libc.so.6 => /lib64/libc.so.6 (0x00007f6861302000)
>>> >     /lib64/ld-linux-x86-64.so.2 (0x00007f6862a58000)
>>> >     librt.so.1 => /lib64/librt.so.1 (0x00007f68610f9000)
>>> >
>>> > I installed openmpi-1.4.2 to a user directory /root/opt/openmpi and
>>> > used "-L/root/opt/openmpi -Wl,-rpath,/root/opt/openmpi" when linking.
>>> >
>>> > --
>>> > Huiwei Lv
>>> > PhD student at Institute of Computing Technology,
>>> > Beijing, China
>>> > http://asg.ict.ac.cn/lhw