Just to help separate out the issues, you might try running the hello_c program from the OMPI examples directory. That will show whether the problem is in the mpirun command or in your program.
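A minimal sketch of such a test job, assuming hello_c has already been built with mpicc from examples/hello_c.c in the Open MPI source tree and copied into the submission directory. The PE name, slot count, and Open MPI prefix are taken from the job script quoted in this thread; the hello.out/hello.err file names and the NSLOTS guard are illustrative additions, not something from the original script.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -pe orte 64
#$ -cwd
#$ -o ./hello.out
#$ -e ./hello.err

# Open MPI install prefix, as used in the job script from this thread.
MPI_HOME=/home/SWcbbc/openmpi-1.6.5
export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH

MPIRUN=$MPI_HOME/bin/mpirun

# ASSUMPTION: ./hello_c was built beforehand, e.g.
#   $MPI_HOME/bin/mpicc hello_c.c -o hello_c
# and sits in the directory the job is submitted from.
HELLO=./hello_c

# No -np and no -machinefile: with the gridengine component that
# ompi_info reports, mpirun is expected to pick up the slots granted
# by the "orte" PE on its own.
# Grid Engine sets NSLOTS inside a parallel job, so the guard keeps the
# script harmless when it is run by hand outside the queue.
if [ -n "$NSLOTS" ]; then
    "$MPIRUN" "$HELLO"
else
    echo "Not inside a Grid Engine job; would run: $MPIRUN $HELLO"
fi
```

If every rank prints its greeting to hello.out and hello.err stays empty, the launch path through mpirun and Grid Engine is probably fine, and the place to look next is the cpmd.x run itself.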
On Sep 4, 2014, at 6:26 AM, Donato Pera <donato.p...@dm.univaq.it> wrote:

> Hi,
>
> The text was in the file.err file; in the file.out file I get only the name
> of the node where the program ran.
>
> Thanks Donato.
>
> On 04/09/2014 15:14, Reuti wrote:
>> Hi,
>>
>> On 04.09.2014 at 14:43, Donato Pera wrote:
>>
>>> using this script:
>>>
>>> #!/bin/bash
>>> #$ -S /bin/bash
>>> #$ -pe orte 64
>>> #$ -cwd
>>> #$ -o ./file.out
>>> #$ -e ./file.err
>>>
>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>> export OMP_NUM_THREADS=1
>>>
>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>> PP_PATH=/home/tanzi
>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun ${CPMD_PATH}cpmd.x input
>>> ${PP_PATH}/PP/ > out
>> Is this text below in out, file.out or file.err - any hint in the other
>> files?
>>
>> -- Reuti
>>
>>> The program ran for about 2 minutes, and then I got this error:
>>>
>>> WARNING: A process refused to die!
>>>
>>> Host: compute-2-2.local
>>> PID: 24897
>>>
>>> This process may still be running and/or consuming resources.
>>>
>>> --------------------------------------------------------------------------
>>> [compute-2-2.local:24889] 25 more processes have sent help message
>>> help-odls-default.txt / odls-default:could-not-kill
>>> [compute-2-2.local:24889] Set MCA parameter "orte_base_help_aggregate"
>>> to 0 to see all help / error messages
>>> [compute-2-2.local:24889] 27 more processes have sent help message
>>> help-odls-default.txt / odls-default:could-not-kill
>>> --------------------------------------------------------------------------
>>> mpirun has exited due to process rank 0 with PID 24896 on
>>> node compute-2-2.local exiting improperly. There are two reasons this
>>> could occur:
>>>
>>> 1. this process did not call "init" before exiting, but others in
>>> the job did. This can cause a job to hang indefinitely while it waits
>>> for all processes to call "init".
>>> By rule, if one process calls "init",
>>> then ALL processes must call "init" prior to termination.
>>>
>>> 2. this process called "init", but exited without calling "finalize".
>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>> exiting or it will be considered an "abnormal termination"
>>>
>>> This may have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> --------------------------------------------------------------------------
>>> [compute-2-2.local:24889] 1 more process has sent help message
>>> help-odls-default.txt / odls-default:could-not-kill
>>>
>>> Thanks and Regards Donato
>>>
>>> On 03/09/2014 13:19, Reuti wrote:
>>>> On 03.09.2014 at 13:11, Donato Pera wrote:
>>>>
>>>>> I get
>>>>>
>>>>> ompi_info | grep grid
>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)
>>>> Good.
>>>>
>>>>> and using this script
>>>>>
>>>>> #!/bin/bash
>>>>> #$ -S /bin/bash
>>>>> #$ -pe orte 64
>>>>> #$ -cwd
>>>>> #$ -o ./file.out
>>>>> #$ -e ./file.err
>>>>>
>>>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>>>> export OMP_NUM_THREADS=1
>>>>>
>>>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>>>> PP_PATH=/home/tanzi
>>>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>>>>> -machinefile $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/
>>>> In the PE "orte" there is no "start_proc_args" defined which could
>>>> generate the machinefile. Please try to start the application with:
>>>>
>>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib ${CPMD_PATH}cpmd.x
>>>> input ${PP_PATH}/PP/
>>>>
>>>> -- Reuti
>>>>
>>>>> > out
>>>>> I get this error
>>>>>
>>>>> Open RTE was unable to open the hostfile:
>>>>> /tmp/21213.1.debug.q/machines
>>>>> Check to make sure the path and filename are correct.
>>>>> --------------------------------------------------------------------------
>>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>>> base/rmaps_base_support_fns.c at line 207
>>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>>> rmaps_rr.c at line 82
>>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>>> base/rmaps_base_map_job.c at line 88
>>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>>> base/plm_base_launch_support.c at line 105
>>>>> [compute-2-6.local:22452] [[5218,0],0] ORTE_ERROR_LOG: Not found in file
>>>>> plm_rsh_module.c at line 1173
>>>>>
>>>>> Instead using this script
>>>>>
>>>>> #!/bin/bash
>>>>> #$ -S /bin/bash
>>>>> #$ -pe orte 64
>>>>> #$ -cwd
>>>>> #$ -o ./file.out
>>>>> #$ -e ./file.err
>>>>>
>>>>> export LD_LIBRARY_PATH=/home/SWcbbc/openmpi-1.6.5/lib:$LD_LIBRARY_PATH
>>>>> export OMP_NUM_THREADS=1
>>>>>
>>>>> CPMD_PATH=/home/tanzi/myroot/X86_66intel-mpi/
>>>>> PP_PATH=/home/tanzi
>>>>> /home/SWcbbc/openmpi-1.6.5/bin/mpirun -mca btl openib -np 64
>>>>> $TMPDIR/machines ${CPMD_PATH}cpmd.x input ${PP_PATH}/PP/ > out
>>>>>
>>>>> I get
>>>>> Executable: /tmp/21214.1.debug.q/machines
>>>>> Node: compute-2-0.local
>>>>>
>>>>> while attempting to start process rank 0.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> Can you help me?
>>>>>
>>>>> Thanks and Regards Donato
>>>>>
>>>>> On 03/09/2014 12:28, Reuti wrote:
>>>>>> ompi_info | grep grid
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2014/09/25240.php