[OMPI users] question running on heterogeneous systems
Dear OpenMPI Users,

I have two systems, one with an Intel64 processor and one with IA32. The OS on the first is CentOS x86_64 and on the other CentOS i386. I installed Intel Fortran compiler 10.1 on both. On the first I use the "fce" directory, and on the second the "fc" directory (ifortvars.sh/csh). I have compiled Open MPI separately on each machine. Now I cannot run my application, which is compiled on the IA32 machine. Should I use "fc" instead of "fce" on the Intel64 box and then compile Open MPI with that?

Best regards,
Mahmoud Payami

PS: I have read the following FAQ, but I need a specific answer.

"As of v1.1, Open MPI requires that the size of C, C++, and Fortran datatypes be the same on all platforms within a single parallel application, with the exception of types represented by MPI_BOOL and MPI_LOGICAL -- size differences in these types between processes are properly handled. Endian differences between processes in a single MPI job are properly and automatically handled. Prior to v1.1, Open MPI did not include any support for data size or endian heterogeneity."
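A side note on the FAQ excerpt above (not from the original message): whether a given installation was even built with heterogeneous support can be checked with ompi_info, and the feature can be enabled at configure time. The grep pattern and install prefix below are only illustrative; the exact ompi_info wording may differ between versions.

    # Was this Open MPI build configured with heterogeneous support?
    # (1.x ompi_info output includes a "Heterogeneous support" line)
    ompi_info | grep -i hetero

    # If not, the library can be rebuilt with it enabled (hypothetical prefix):
    ./configure --enable-heterogeneous --prefix=/opt/openmpi-hetero
    make all install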
Re: [OMPI users] question running on heterogeneous systems
On Fri, Jan 2, 2009 at 9:08 AM, doriankrause wrote:
> Mahmoud Payami wrote:
>> Dear OpenMPI Users,
>>
>> I have two systems, one with an Intel64 processor and one with IA32. The OS on the first is CentOS x86_64 and on the other CentOS i386. I installed Intel Fortran compiler 10.1 on both. On the first I use the fce directory, and on the second the fc directory (ifortvars.sh/csh). I have compiled Open MPI separately on each machine. Now I cannot run my application, which is compiled on the IA32 machine. Should I use "fc" instead of "fce" on Intel64 and then compile Open MPI with that?
>
> Could you give us some more information? What is the error message?
> You said that the application is compiled for the 32-bit architecture. I'm not used to mixing 32/64-bit architectures. Does the application run on each host separately?
>
> Dorian

Dear Dorian,

Thank you for your contribution. The application, compiled on each box separately, works fine with MPI and there is no problem. Recently I checked that a binary created on IA32 also works on x86_64, but the reverse is not true. So why not a parallel program compiled on the IA32 box? I think that if I configure and install Open MPI using the IA32 Intel compiler on the x86_64 box, it will be resolved. I have to check it and will report the result. In the present case, it is searching for a shared lib.so.0 which has some extension "..ELF...64". I have already added "/usr/local/lib", which contains the MPI libs, to LD_LIBRARY_PATH; otherwise they would not work on each box even separately.

Bests, Happy 2009
mahmoud
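For readers who hit the same kind of "wrong ELF class" loader error: a minimal sketch of how to confirm which libraries the 32-bit executable actually resolves on the x86_64 box, and how to point it at a 32-bit Open MPI instead. The application name and the /opt/openmpi-ia32 prefix are hypothetical, not paths from this thread.

    # List the shared libraries the 32-bit binary resolves at run time:
    ldd ./my_app

    # Check the ELF class of the libmpi it found ("ELF 32-bit" vs "ELF 64-bit"):
    file /usr/local/lib/libmpi.so.0

    # Put a 32-bit Open MPI install first in the search paths (hypothetical prefix):
    export PATH=/opt/openmpi-ia32/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi-ia32/lib:$LD_LIBRARY_PATH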
Re: [OMPI users] question running on heterogeneous systems
Dear Gus,

Thank you for the detailed explanation. It is quite helpful. I think I now understand how to manage the problem.

Best regards,
Mahmoud Payami
Theoretical Physics Group, Atomic Energy Organization of Iran
Tehran-Iran
mpay...@aeoi.org.ir

On Mon, Jan 5, 2009 at 12:21 PM, Gus Correa wrote:
> Mahmoud Payami wrote:
>> On Fri, Jan 2, 2009 at 9:08 AM, doriankrause <doriankra...@web.de> wrote:
>>> Mahmoud Payami wrote:
>>>> Dear OpenMPI Users,
>>>>
>>>> I have two systems, one with an Intel64 processor and one with IA32. The OS on the first is CentOS x86_64 and on the other CentOS i386. I installed Intel Fortran compiler 10.1 on both. On the first I use the fce directory, and on the second the fc directory (ifortvars.sh/csh). I have compiled Open MPI separately on each machine. Now I cannot run my application, which is compiled on the IA32 machine. Should I use "fc" instead of "fce" on Intel64 and then compile Open MPI with that?
>>>
>>> Could you give us some more information? What is the error message?
>>> You said that the application is compiled for the 32-bit architecture. I'm not used to mixing 32/64-bit architectures. Does the application run on each host separately?
>>>
>>> Dorian
>
> Hi Mahmoud, list,
>
>> Dear Dorian,
>> Thank you for your contribution. The application, compiled on each box separately, works fine with MPI and there is no problem. Recently I checked that a binary created on IA32 also works on x86_64, but the reverse is not true.
>
> That is correct.
> The x86-64 architecture can run 32-bit binaries, but 64-bit binaries don't work on x86 machines.
>
>> So why not a parallel program compiled on the IA32 box? I think that if I configure and install Open MPI using the IA32 Intel compiler on the x86_64 box, it will be resolved.
>
> 1. You need to compile OpenMPI separately on each architecture.
> Use the "--prefix=/path/to/my/openmpi/32bit/" configure option (32-bit example/suggestion) to install the two libraries in different locations, if you want. This makes clear which architecture each library was built for.
>
> 2. You need to compile your application separately on each architecture, and link to the OpenMPI libraries built for that specific architecture according to item 1 above. (I.e., don't mix apples and oranges.)
>
> 3. You need to have the correct environment variables set on each machine architecture. They are *different* on each architecture.
>
> I.e., if you use Intel Fortran, source the fc script on the 32-bit machine and the fce script on the 64-bit machine.
>
> This can be done in the .bashrc or .tcshrc file. If you have a different home directory on each machine, you can write a .bashrc or .tcshrc file for each architecture. If you have a single NFS-mounted home directory, use a trick like this (tcsh example):
>
> if ( $HOST == "my_32bit_hostname" ) then
>     source /path/to/intel/fc/bin/ifortvars.csh    # Note "fc" here.
> else if ( $HOST == "my_64bit_hostname" ) then
>     source /path/to/intel/fce/bin/ifortvars.csh   # Note "fce" here.
> endif
>
> whatever your "my_32bit_hostname", "my_64bit_hostname", /path/to/intel/fc/, and /path/to/intel/fce/ are. (Do "hostname" on each machine to find out the right name to use.)
>
> Likewise for the OpenMPI binaries (mpicc, mpif90, mpirun, etc.):
>
> if ( $HOST == "my_32bit_hostname" ) then
>     setenv PATH /path/to/my/openmpi/32bit/bin:$PATH   # Note "32bit" here.
> else if ( $HOST == "my_64bit_hostname" ) then
>     setenv PATH /path/to/my/openmpi/64bit/bin:$PATH   # Note "64bit" here.
> endif
>
> This approach also works for separate home directories "per machine" (not NFS-mounted), and is probably the simplest way to solve the problem.
>
> There are more elegant ways to set up the environment of choice, other than changing the user startup files. For instance, you can write intel.csh and intel.sh in the /etc/profile.d directory, to set up the appropriate environment as the user logs in. See also the "environment modules" package: http://modules.sourceforge.net/
>
> 4. If you run MPI programs across the two machines/architectures, make sure to use the MPI types on MPI function calls correctly, and to match them
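Since Gus mentions .bashrc as well but only shows the tcsh form, here is a bash sketch of the same per-host switch; the hostnames and paths are placeholders, exactly as in his example.

    if [ "$(hostname)" = "my_32bit_hostname" ]; then
        source /path/to/intel/fc/bin/ifortvars.sh               # "fc" on the 32-bit box
        export PATH=/path/to/my/openmpi/32bit/bin:$PATH
        export LD_LIBRARY_PATH=/path/to/my/openmpi/32bit/lib:$LD_LIBRARY_PATH
    elif [ "$(hostname)" = "my_64bit_hostname" ]; then
        source /path/to/intel/fce/bin/ifortvars.sh              # "fce" on the 64-bit box
        export PATH=/path/to/my/openmpi/64bit/bin:$PATH
        export LD_LIBRARY_PATH=/path/to/my/openmpi/64bit/lib:$LD_LIBRARY_PATH
    fi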
[OMPI users] Threading fault(?)
Dear All,

I have installed openmpi-1.3.1 (with the defaults) and built my application. The Linux box is a 2 x AMD64 quad-core. In the middle of running my application, I receive the message below and it stops. I tried to configure Open MPI using "--disable-mpi-threads", but it automatically assumes "posix". This problem does not happen with openmpi-1.2.9. Any comment is highly appreciated.

Best regards,
mahmoud payami

[hpc1:25353] *** Process received signal ***
[hpc1:25353] Signal: Segmentation fault (11)
[hpc1:25353] Signal code: Address not mapped (1)
[hpc1:25353] Failing at address: 0x51
[hpc1:25353] [ 0] /lib64/libpthread.so.0 [0x303be0dd40]
[hpc1:25353] [ 1] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so [0x2e350d96]
[hpc1:25353] [ 2] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so [0x2e3514a8]
[hpc1:25353] [ 3] /opt/openmpi131_cc/lib/openmpi/mca_btl_sm.so [0x2eb7c72a]
[hpc1:25353] [ 4] /opt/openmpi131_cc/lib/libopen-pal.so.0(opal_progress+0x89) [0x2b42b7d9]
[hpc1:25353] [ 5] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so [0x2e34d27c]
[hpc1:25353] [ 6] /opt/openmpi131_cc/lib/libmpi.so.0(PMPI_Recv+0x210) [0x2af46010]
[hpc1:25353] [ 7] /opt/openmpi131_cc/lib/libmpi_f77.so.0(mpi_recv+0xa4) [0x2acd6af4]
[hpc1:25353] [ 8] /opt/QE131_cc/bin/pw.x(parallel_toolkit_mp_zsqmred_+0x13da) [0x513d8a]
[hpc1:25353] [ 9] /opt/QE131_cc/bin/pw.x(pcegterg_+0x6c3f) [0x6667ff]
[hpc1:25353] [10] /opt/QE131_cc/bin/pw.x(diag_bands_+0xb9e) [0x65654e]
[hpc1:25353] [11] /opt/QE131_cc/bin/pw.x(c_bands_+0x277) [0x6575a7]
[hpc1:25353] [12] /opt/QE131_cc/bin/pw.x(electrons_+0x53f) [0x58a54f]
[hpc1:25353] [13] /opt/QE131_cc/bin/pw.x(MAIN__+0x1fb) [0x458acb]
[hpc1:25353] [14] /opt/QE131_cc/bin/pw.x(main+0x3c) [0x4588bc]
[hpc1:25353] [15] /lib64/libc.so.6(__libc_start_main+0xf4) [0x303b21d8a4]
[hpc1:25353] [16] /opt/QE131_cc/bin/pw.x(realloc+0x1b9) [0x4587e9]
[hpc1:25353] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 25353 on node hpc1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[OMPI users] Threading fault
Dear All,

I am using Intel lc_prof-11 (and its own MKL) and have built openmpi-1.3.1 with the configure options "FC=ifort F77=ifort CC=icc CXX=icpc". Then I built my application. The Linux box is a 2 x AMD64 quad-core. In the middle of running my application (after some 15 iterations), I receive the message below and it stops. I tried to configure Open MPI using "--disable-mpi-threads", but it automatically assumes "posix". This problem does not happen with openmpi-1.2.9. Any comment is highly appreciated.

Best regards,
mahmoud payami

[hpc1:25353] *** Process received signal ***
[hpc1:25353] Signal: Segmentation fault (11)
[hpc1:25353] Signal code: Address not mapped (1)
[hpc1:25353] Failing at address: 0x51
[hpc1:25353] [ 0] /lib64/libpthread.so.0 [0x303be0dd40]
[hpc1:25353] [ 1] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so [0x2e350d96]
[hpc1:25353] [ 2] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so [0x2e3514a8]
[hpc1:25353] [ 3] /opt/openmpi131_cc/lib/openmpi/mca_btl_sm.so [0x2eb7c72a]
[hpc1:25353] [ 4] /opt/openmpi131_cc/lib/libopen-pal.so.0(opal_progress+0x89) [0x2b42b7d9]
[hpc1:25353] [ 5] /opt/openmpi131_cc/lib/openmpi/mca_pml_ob1.so [0x2e34d27c]
[hpc1:25353] [ 6] /opt/openmpi131_cc/lib/libmpi.so.0(PMPI_Recv+0x210) [0x2af46010]
[hpc1:25353] [ 7] /opt/openmpi131_cc/lib/libmpi_f77.so.0(mpi_recv+0xa4) [0x2acd6af4]
[hpc1:25353] [ 8] /opt/QE131_cc/bin/pw.x(parallel_toolkit_mp_zsqmred_+0x13da) [0x513d8a]
[hpc1:25353] [ 9] /opt/QE131_cc/bin/pw.x(pcegterg_+0x6c3f) [0x6667ff]
[hpc1:25353] [10] /opt/QE131_cc/bin/pw.x(diag_bands_+0xb9e) [0x65654e]
[hpc1:25353] [11] /opt/QE131_cc/bin/pw.x(c_bands_+0x277) [0x6575a7]
[hpc1:25353] [12] /opt/QE131_cc/bin/pw.x(electrons_+0x53f) [0x58a54f]
[hpc1:25353] [13] /opt/QE131_cc/bin/pw.x(MAIN__+0x1fb) [0x458acb]
[hpc1:25353] [14] /opt/QE131_cc/bin/pw.x(main+0x3c) [0x4588bc]
[hpc1:25353] [15] /lib64/libc.so.6(__libc_start_main+0xf4) [0x303b21d8a4]
[hpc1:25353] [16] /opt/QE131_cc/bin/pw.x(realloc+0x1b9) [0x4587e9]
[hpc1:25353] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 25353 on node hpc1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
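Not an answer given in the thread, but a common first diagnostic when the backtrace points into mca_btl_sm.so: check how the build reports its thread support, and re-run the job with the shared-memory BTL excluded to see whether the sm transport is implicated. The process count and pw.x input/output redirection below are only an example.

    # How was threading configured in this Open MPI build?
    ompi_info | grep -i thread

    # Re-run the same job with the shared-memory BTL disabled; if the
    # segfault goes away, the problem is narrowed to the sm transport:
    mpirun -np 8 --mca btl ^sm /opt/QE131_cc/bin/pw.x < my_input > my_output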