Hi Team,

I am compiled the OpenFoam-1.7.1,openFoam-2.2.1,OpenFoam-2.2.2 versions.
All the versions same problem that some times I am able to run simpleFoam
8,16,32,64,80 cores but some times it will get hang no messages will come.
My observervation is that when I am running successfully the orted process
in the node will start as single.(it means 8nodes means 8 orted process it
will show)
When I am not able to run,hangup the i verified that the orted processes in
the node are more than one in few of the nodes(total it will be >8 for 8
nodes)

compute-0-6: tel      12279     1  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-6: tel      12280 12279  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-6: tel      12281 12279  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-6: tel      12282 12279  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-6: tel      12283 12279  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 2
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-4: tel      12073     1  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-4: tel      12074 12073  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-4: tel      12075 12073  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-4: tel      12076 12073  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
compute-0-4: tel      12077 12073  0 18:54 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 744292352 -mca orte_ess_vpid 1
-mca orte_ess_num_procs 3 --hnp-uri 744292352.0;tcp://10.0.10.1:39880
;tcp://162.0.50.111:39880;tcp://192.168.1.125:39880 --mca btl openib,self,sm
-bash-4.1# pdsh -w compute-0-[0-19] ps -ef |grep orte
compute-0-8: tel       6839     1  0 18:57 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 322371584 -mca orte_ess_vpid 2
-mca orte_ess_num_procs 3 --hnp-uri 322371584.0;tcp://10.0.10.1:43142
;tcp://162.0.50.111:43142;tcp://192.168.1.125:43142 --mca btl openib,self,sm
compute-0-7: tel       7451     1  0 18:57 ?        00:00:00 orted
--daemonize -mca ess env -mca orte_ess_jobid 322371584 -mca orte_ess_vpid 1
-mca orte_ess_num_procs 3 --hnp-uri 322371584.0;tcp://10.0.10.1:43142
;tcp://162.0.50.111:43142;tcp://192.168.1.125:43142 --mca btl openib,self,sm
-bash-4.1#

nodes which are showing more orted process, I am restarted.  But it is not
sure after restart it may take or it may not take.


Please advoice/help.

Thanks.

Venkat

Reply via email to