Hi Brian, Thank you for your help. I have attached all the files you have asked for in a tar file.
Please find attached the 'config.log' and 'libmpi.la' for my Solaris
installation. The output from 'mpicc -showme' is:

sunos$ mpicc -showme
gcc -I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include
-I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include/openmpi/ompi
-L/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib
-lmpi -lorte -lopal -lnsl -lsocket -lthread -laio -lm -lnsl -lsocket
-lthread -ldl

There are serious issues when running on just the Solaris machines. I am
using the host file and app file shown below. Both machines run SunOS
and are similar.

hosts.txt
---------
csultra01 slots=1
csultra02 slots=1

mpiinit_appfile
---------------
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos

Running mpirun without the -d option hangs:

csultra01$ mpirun --hostfile hosts.txt --app mpiinit_appfile
(hangs)

Running mpirun with the -d option dumps core; its output is in the
attached file "mpirun_output_d_option.txt". The core file is also
attached.

Running on just one host does not work either. The output from mpirun
with the -d option for this scenario is attached in the file
"mpirun_output_d_option_one_host.txt".

I have also attached the list of packages installed on my Solaris
machine in "pkginfo.txt".

I hope these will help you to resolve the issue.

Regards,
Ravi.

> ----- Original Message -----
> From: Brian Barrett <brbar...@open-mpi.org>
> Date: Friday, March 10, 2006 7:09 pm
> Subject: Re: [OMPI users] problems with OpenMPI-1.0.1 on SunOS 5.9;
> problems on heterogeneous cluster
> To: Open MPI Users <us...@open-mpi.org>
>
> > On Mar 10, 2006, at 12:09 AM, Ravi Manumachu wrote:
> >
> > > I am facing problems running OpenMPI-1.0.1 on a heterogeneous
> > > cluster. I have a Linux machine and a SunOS machine in this cluster.
> > >
> > > linux$ uname -a
> > > Linux pg1cluster01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004
> > > i686 i686 i386 GNU/Linux
> > >
> > > sunos$ uname -a
> > > SunOS csultra01 5.9 Generic_112233-10 sun4u sparc SUNW,Ultra-5_10
> >
> > Unfortunately, this will not work with Open MPI at present. Open MPI
> > 1.0.x does not have any support for running across platforms with
> > different endianness. Open MPI 1.1.x has much better support for such
> > situations, but is far from complete, as the MPI datatype engine does
> > not properly fix up endian issues. We're working on the issue, but
> > cannot give a timetable for completion.
> >
> > Also note that (while not a problem here) Open MPI also does not
> > support running in a mixed 32-bit / 64-bit environment. All processes
> > must be 32-bit or 64-bit, but not a mix.
> >
> > > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > > fatal: relocation error: file
> > > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > > symbol nanosleep: referenced symbol not found
> > > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > > fatal: relocation error: file
> > > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > > symbol nanosleep: referenced symbol not found
> > >
> > > I have fixed this by compiling with the "-lrt" option to the linker.
> >
> > You shouldn't have to do this... Could you send me the config.log
> > file from configure for Open MPI, the installed $prefix/lib/libmpi.la
> > file, and the output of mpicc -showme?
> >
> > > sunos$ mpicc -o mpiinit_sunos mpiinit.c -lrt
> > >
> > > However, when I run this again, I get the error:
> > >
> > > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > > [pg1cluster01:19858] ERROR: A daemon on node csultra01 failed to
> > > start as expected.
> > > [pg1cluster01:19858] ERROR: There may be more information available
> > > from
> > > [pg1cluster01:19858] ERROR: the remote shell (see above).
> > > [pg1cluster01:19858] ERROR: The daemon exited unexpectedly with
> > > status 255.
> > > 2 processes killed (possibly by Open MPI)
> >
> > Both of these are quite unexpected. It looks like there is something
> > wrong with your Solaris build. Can you run on *just* the Solaris
> > machine? We only have limited resources for testing on Solaris, but
> > have not run into this issue before. What happens if you run mpirun
> > on just the Solaris machine with the -d option to mpirun?
> >
> > > Sometimes I get the error:
> > >
> > > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > > [csultra01:06256] mca_common_sm_mmap_init: ftruncate failed with
> > > errno=28
> > > [csultra01:06256] mca_mpool_sm_init: unable to create shared memory
> > > mapping
> > > ----------------------------------------------------------------------
> > > It looks like MPI_INIT failed for some reason; your parallel
> > > process is likely to abort. There are many reasons that a parallel
> > > process can fail during MPI_INIT; some of which are due to
> > > configuration or environment problems. This failure appears to be
> > > an internal failure; here's some additional information (which may
> > > only be relevant to an Open MPI developer):
> > >
> > > PML add procs failed
> > > --> Returned value -2 instead of OMPI_SUCCESS
> > > ----------------------------------------------------------------------
> > > *** An error occurred in MPI_Init
> > > *** before MPI was initialized
> > > *** MPI_ERRORS_ARE_FATAL (goodbye)
> >
> > This looks like you got far enough along that you ran into our
> > endianness issues, so this is about the best case you can hope for in
> > your configuration. The ftruncate error worries me, however.
> > But I think this is another symptom of something wrong with your Sun
> > Sparc build.
> >
> > Brian
> >
> > --
> > Brian Barrett
> > Open MPI developer
> > http://www.open-mpi.org/
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
OpenMPI-1.0.1-SunOS-5.9.tar.gz
Description: GNU Zip compressed data