In the "old" 1.4.x and 1.5.x series, I achieved this by using rankfiles (see the FAQ), and it worked very well. With those versions, --byslot etc. didn't work for me; I always needed rankfiles. I haven't yet tried the overhauled "convenience wrappers" in 1.6 that you are using for this feature, but I see no reason why the "old" way should not work, although it requires some shell magic if the rankfiles are to be generated automatically from, e.g., PBS or SLURM node lists.
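
A minimal sketch of that shell magic, under stated assumptions: the node file lists one host name per line (repeats allowed, as in $PBS_NODEFILE; under SLURM a similar list can be produced with 'scontrol show hostnames "$SLURM_JOB_NODELIST"'), every node has the same layout, and the ranks should fill the cores of a single socket on each node. The function name make_rankfile and its three arguments are made up for this illustration; the "rank N=host slot=socket:core" line format is the one described in the mpirun man page.

```shell
# make_rankfile NODEFILE SOCKET CORES_PER_SOCKET   (hypothetical helper)
# Prints one rankfile line per core of the chosen socket, for every
# distinct host name found in NODEFILE. Ranks are numbered consecutively
# in the order the hosts first appear.
make_rankfile() {
    nodefile=$1
    socket=$2
    cores=$3
    awk -v socket="$socket" -v cores="$cores" '
        # skip empty lines and hosts that were already handled
        NF && !seen[$1]++ {
            for (c = 0; c < cores; c++)
                printf "rank %d=%s slot=%d:%d\n", rank++, $1, socket, c
        }
    ' "$nodefile"
}
```

With the output saved as, say, my_rankfile (for example via 'make_rankfile "$PBS_NODEFILE" 0 4 > my_rankfile'), the job would be started with something like "mpirun -np 4 -rf my_rankfile ./app". Whether the socket and core numbers are interpreted as physical or logical indexes depends on the Open MPI series, so check the man page of the installed version.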

Dominik

On 07/17/2012 12:13 AM, Anne M. Hammond wrote:
There are 2 physical processors, each with 4 cores (no hyperthreading).

I want to instruct openmpi to run only on the first processor, using 4 cores.


[hammond@node48 ~]$ cat /proc/cpuinfo
processor: 0
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 0
siblings: 4
core id: 0
cpu cores: 4
apicid: 0
initial apicid: 0
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.38
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor: 1
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 0
siblings: 4
core id: 1
cpu cores: 4
apicid: 1
initial apicid: 1
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.17
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor: 2
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 0
siblings: 4
core id: 2
cpu cores: 4
apicid: 2
initial apicid: 2
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.19
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor: 3
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 0
siblings: 4
core id: 3
cpu cores: 4
apicid: 3
initial apicid: 3
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.16
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor: 4
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 1
siblings: 4
core id: 0
cpu cores: 4
apicid: 4
initial apicid: 4
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.16
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor: 5
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 1
siblings: 4
core id: 1
cpu cores: 4
apicid: 5
initial apicid: 5
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.16
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor: 6
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 1
siblings: 4
core id: 2
cpu cores: 4
apicid: 6
initial apicid: 6
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.17
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor: 7
vendor_id: AuthenticAMD
cpu family: 16
model: 4
model name: Quad-Core AMD Opteron(tm) Processor 2376
stepping: 2
cpu MHz: 2311.694
cache size: 512 KB
physical id: 1
siblings: 4
core id: 3
cpu cores: 4
apicid: 7
initial apicid: 7
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips: 4623.18
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
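
For reference, the socket layout in the listing above can be condensed with a few lines of awk. This is only a sketch: the function name socket_map is hypothetical, and it assumes that "physical id" appears before "core id" within each processor block, as it does in this output.

```shell
# socket_map [FILE]   (hypothetical helper)
# Prints "socket S core C cpu N" for every logical CPU listed in FILE,
# which defaults to /proc/cpuinfo. The field separator accepts both
# "key: value" and the tab-padded "key<TAB>: value" form.
socket_map() {
    awk -F '[ \t]*:[ \t]*' '
        $1 == "processor"   { cpu = $2 }
        $1 == "physical id" { socket = $2 }
        $1 == "core id"     { print "socket " socket " core " $2 " cpu " cpu }
    ' "${1:-/proc/cpuinfo}"
}
```

Applied to the listing above, this shows logical CPUs 0-3 on physical id 0 and CPUs 4-7 on physical id 1, so the first processor corresponds to physical id 0 with core ids 0-3.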


On Jul 16, 2012, at 4:09 PM, Elken, Tom wrote:

Anne,

Output from "cat /proc/cpuinfo" on your node "hostname" may help those trying to answer.

-Tom

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Monday, July 16, 2012 2:47 PM
To: Open MPI Users
Subject: Re: [OMPI users] openmpi tar.gz for 1.6.1 or 1.6.2

I gather there are two sockets on this node? So the second cmd line is equivalent
to leaving "num-sockets" off of the cmd line?

I haven't tried what you are doing, so it is quite possible this is a bug.


On Jul 16, 2012, at 1:49 PM, Anne M. Hammond wrote:

Thanks!

Built the latest snapshot. Still getting an error when trying to run
on only one socket (see below). Is there a workaround?

[hammond@node65 bin]$ ./mpirun -np 4 --num-sockets 1 --npersocket 4 hostname
--------------------------------------------------------------------------
An invalid physical processor ID was returned when attempting to
bind an MPI process to a unique processor.

This usually means that you requested binding to more processors than
exist (e.g., trying to bind N MPI processes to M processors, where N >
M).  Double check that you have enough unique processors for all the
MPI processes that you are launching on this host.

You job will now abort.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it
encountered an error:

Error name: Fatal
Node: node65.cl.corp.com

when attempting to start process rank 0.
--------------------------------------------------------------------------
4 total processes failed to start


[hammond@node65 bin]$ ./mpirun -np 4 --num-sockets 2 --npersocket 4 hostname
node65.cl.corp.com
node65.cl.corp.com
node65.cl.corp.com
node65.cl.corp.com
[hammond@node65 bin]$




On Jul 16, 2012, at 12:56 PM, Ralph Castain wrote:

Jeff is at the MPI Forum this week, so his answers will be delayed. Last I
heard, it was close, but no specific date has been set.


On Jul 16, 2012, at 11:49 AM, Michael E. Thomadakis wrote:

What is the expected release date for the official 1.6.1 (or 1.6.2)?

mike

On 07/16/2012 01:44 PM, Ralph Castain wrote:
You can get it here:

http://www.open-mpi.org/nightly/v1.6/

On Jul 16, 2012, at 10:22 AM, Anne M. Hammond wrote:

Hi,

For benchmarking, we would like to use openmpi with
--num-sockets 1

This fails in 1.6, but Bug Report #3119 indicates it is changed in
1.6.1.

Is 1.6.1 or 1.6.2 available in tar.gz form?

Thanks!
Anne



_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Anne M. Hammond - Systems / Network Administration - Tech-X Corp
                hammond_at_txcorp.com 720-974-1840







--
Jun.-Prof. Dr. Dominik Göddeke
Hardware-orientierte Numerik für große Systeme
Institut für Angewandte Mathematik (LS III)
Fakultät für Mathematik, Technische Universität Dortmund
http://www.mathematik.tu-dortmund.de/~goeddeke
Tel. +49-(0)231-755-7218  Fax +49-(0)231-755-5933
--
Sent from my old-fashioned computer and not from a mobile device.
I proudly boycott 24/7 availability.
