Hi Ralph, Reuti,

I've just observed the same issue without specifying -np: the allocation below reads 6 slots on charlie and 3 on carl, yet the resulting map places 5 processes on charlie and 4 on carl, oversubscribing carl.fft.

Please find attached the ps -elfax output from the compute nodes and some SGE-related information.
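For reference, I collected the attached files roughly as follows (illustrative commands; the job id and host names are the ones from this run):

  # SGE queue/slot state and job details (sketch)
  qstat -g t    > qstat-gt
  qstat -j 1882 > qstat-j1882
  # per-node process listings
  ssh charlie ps -elfax > pselfax.charlie
  ssh carl    ps -elfax > pselfax.carl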

 
Regards,

Eloi
-----Original message-----
From: Ralph Castain <r...@open-mpi.org>
Sent: Wed 04-11-2012 02:25 pm
Subject: Re: [OMPI users] sge tight integration leads to bad allocation
To: Open MPI Users <us...@open-mpi.org>

On Apr 11, 2012, at 6:20 AM, Reuti wrote:

> On 11.04.2012, at 04:26, Ralph Castain wrote:
> 
>> Hi Reuti
>> 
>> Can you replicate this problem on your machine? Can you try it with 1.5?
> 
> No. It's also working fine in 1.5.5 in some tests. I even forced an uneven 
> distribution by limiting the slots setting for some machines in the queue 
> configuration.

Thanks - that confirms what I've been able to test. It sounds like it is 
something in Eloi's setup, but I can't fathom what it would be - the 
allocations all look acceptable.

I'm stumped. :-(

Here is the verbose output from the failing 1.4.4 run:

[charlie:23181] Warning: could not find environment variable "ACTRAN_DEBUG"
[charlie:23181] ras:gridengine: JOB_ID: 1882
[charlie:23181] ras:gridengine: PE_HOSTFILE: /opt/sge/default/spool/charlie/active_jobs/1882.1/pe_hostfile
[charlie:23181] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=6
[charlie:23181] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=3
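For completeness, a pe_hostfile consistent with the values parsed above would look roughly like this (SGE's format is host, slot count, queue instance, processor range; the queue name all.q and the <NULL> processor range are assumptions):

  charlie.fft 6 all.q@charlie.fft <NULL>
  carl.fft 3 all.q@carl.fft <NULL>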

======================   ALLOCATED NODES   ======================

 Data for node: Name: charlie	 	Launch id: -1	Arch: ffc91200	State: 2
 	Num boards: 1	Num sockets/board: 2	Num cores/socket: 4
 	Daemon: [[39528,0],0]	Daemon launched: True
 	Num slots: 6	Slots in use: 0
 	Num slots allocated: 6	Max slots: 0
 	Username on node: NULL
 	Num procs: 0	Next node_rank: 0
 Data for node: Name: carl.fft	 	Launch id: -1	Arch: 0	State: 2
 	Num boards: 1	Num sockets/board: 2	Num cores/socket: 4
 	Daemon: Not defined	Daemon launched: False
 	Num slots: 3	Slots in use: 0
 	Num slots allocated: 3	Max slots: 0
 	Username on node: NULL
 	Num procs: 0	Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0200
 	Npernode: 0	Oversubscribe allowed: TRUE	CPU Lists: FALSE
 	Num new daemons: 1	New daemon starting vpid 1
 	Num nodes: 2

 Data for node: Name: charlie	 	Launch id: -1	Arch: ffc91200	State: 2
 	Num boards: 1	Num sockets/board: 2	Num cores/socket: 4
 	Daemon: [[39528,0],0]	Daemon launched: True
 	Num slots: 6	Slots in use: 5
 	Num slots allocated: 6	Max slots: 0
 	Username on node: NULL
 	Num procs: 5	Next node_rank: 5
 	Data for proc: [[39528,1],0]
 		Pid: 0	Local rank: 0	Node rank: 0
 		State: 0	App_context: 0	Slot list: NULL
 	Data for proc: [[39528,1],2]
 		Pid: 0	Local rank: 1	Node rank: 1
 		State: 0	App_context: 0	Slot list: NULL
 	Data for proc: [[39528,1],4]
 		Pid: 0	Local rank: 2	Node rank: 2
 		State: 0	App_context: 0	Slot list: NULL
 	Data for proc: [[39528,1],6]
 		Pid: 0	Local rank: 3	Node rank: 3
 		State: 0	App_context: 0	Slot list: NULL
 	Data for proc: [[39528,1],8]
 		Pid: 0	Local rank: 4	Node rank: 4
 		State: 0	App_context: 0	Slot list: NULL

 Data for node: Name: carl.fft	 	Launch id: -1	Arch: 0	State: 2
 	Num boards: 1	Num sockets/board: 2	Num cores/socket: 4
 	Daemon: [[39528,0],1]	Daemon launched: False
 	Num slots: 3	Slots in use: 4
 	Num slots allocated: 3	Max slots: 0
 	Username on node: NULL
 	Num procs: 4	Next node_rank: 4
 	Data for proc: [[39528,1],1]
 		Pid: 0	Local rank: 0	Node rank: 0
 		State: 0	App_context: 0	Slot list: NULL
 	Data for proc: [[39528,1],3]
 		Pid: 0	Local rank: 1	Node rank: 1
 		State: 0	App_context: 0	Slot list: NULL
 	Data for proc: [[39528,1],5]
 		Pid: 0	Local rank: 2	Node rank: 2
 		State: 0	App_context: 0	Slot list: NULL
 	Data for proc: [[39528,1],7]
 		Pid: 0	Local rank: 3	Node rank: 3
 		State: 0	App_context: 0	Slot list: NULL
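Note the mismatch above: carl.fft is allocated 3 slots but the map puts 4 processes on it (Slots in use: 4), while charlie uses only 5 of its 6. A quick, application-independent way to double-check where ranks actually land (a sketch, run from inside the same SGE job):

  # count the processes started on each node; with a correct mapping
  # this should report 6 on charlie and 3 on carl
  mpirun hostname | sort | uniq -c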
[charlie:23187] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[charlie:23188] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[charlie:23185] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[charlie:23186] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[charlie:23189] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
[charlie:23189] mca_btl_mx_init: mx_open_endpoint() failed with status 20 (Busy)
warning:regcache incompatible with malloc
[charlie:23187] mca_btl_mx_init: mx_open_endpoint() failed with status 20 (Busy)
[carl:00563] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[carl:00562] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[carl:00561] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[carl:00564] mca: base: component_find: unable to open /opt/openmpi-1.4.4/lib/openmpi/mca_btl_gm: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
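As a side note, the repeated "regcache incompatible with malloc" warnings come from the MX library's registration cache; as far as I know they can be silenced by disabling the cache. The mx_open_endpoint() "Busy" errors presumably mean more local processes tried to open MX endpoints than the NIC provides:

  # disable the MX registration cache (assumed to silence the warning);
  # -x exports the variable to every rank, ./your_app is a placeholder
  mpirun -x MX_RCACHE=0 ./your_app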

Attachment: job1882.sh
Description: application/shellscript

Attachment: pselfax.carl
Description: Binary data

Attachment: pselfax.charlie
Description: Binary data

Attachment: qstat-gt
Description: Binary data

Attachment: qstat-j1882
Description: Binary data
