[OMPI users] Oversubscribing a subset of a machine's cores
Hi, I have a slightly odd problem that you might not think is important at all. Anyways, here it goes: I'm using a single eight-core machine. I want to oversubscribe four of the cores and leave the other four idle. My approach is to make a hostfile: localhost slot=4 # shouldn't this limit the core count to 4? and run the command: $mpirun -np 8 --hostfile my_hostfile ./my_mpiprog or the command: $mpirun -np 8 --host localhost,localhost,localhost,localhost ./my_mpiprog Still, all eight cores are being used. I can see why you would want to use all cores, and I can see that oversubscribing a sub-set of the cores might seem silly. My question is, is it possible to do what I want to do without hacking the Open MPI code? Guess I just wanted to know if there is a solution I overlooked before I start hacking like a madman :) Thanks Torje Henriksen
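A minimal sketch of the hostfile form being discussed, assuming the documented Open MPI hostfile keyword "slots" (a slot count only limits how many processes are scheduled on the host before it counts as oversubscribed; it does not bind the resulting processes to particular cores):

  # my_hostfile
  localhost slots=4

  $mpirun -np 8 --hostfile my_hostfile ./my_mpiprog

Without any processor affinity, the operating system is still free to spread the eight ranks over all eight cores, which matches what Torje observes; the numactl suggestion later in the thread addresses exactly that.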
Re: [OMPI users] MPI_CART_CREATE and Fortran 90 Interface
Michal - You are absolutely right; sorry about that. I have fixed the bug in the OMPI development trunk which means that it will be incorporated in the upcoming v1.3 series (see https://svn.open-mpi.org/trac/ompi/changeset/17395) . I also filed a change request for the v1.2 branch; if we ever do a v1.2.6 release, this fix will be included in it. Thanks for reporting this problem! On Feb 6, 2008, at 8:49 AM, Michal Charemza wrote: Hi, I'm having a bit of trouble getting MPI_CART_CREATE to work with the Fortran 90 Interface, i.e. if I "use mpi", I get an error at compilation time (of my program) of: There is no specific subroutine for the generic 'mpi_cart_create' However, if I include mpif.h this error does not occur. After looking into the source, I see that in my mpi-f90-interfaces.h, there is a part: interface MPI_Cart_create subroutine MPI_Cart_create(old_comm, ndims, dims, periods, reorder, & comm_cart, ierr) integer, intent(in) :: old_comm integer, intent(in) :: ndims integer, dimension(*), intent(in) :: dims integer, dimension(*), intent(in) :: periods integer, intent(in) :: reorder integer, intent(out) :: comm_cart integer, intent(out) :: ierr end subroutine MPI_Cart_create end interface MPI_Cart_create I thought according to the MPI specs, periods should be a logical array, and reorder should be a logical scalar. Is this a bug in the Fortran 90 interface? Michal. ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
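For comparison, a small usage sketch (not from the thread) of the C binding, where the "logical" arguments really are plain ints; in the Fortran bindings, periods and reorder must be LOGICAL, which is what the generated f90 interface got wrong:

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm cart;
      int size, dims[2] = {0, 0}, periods[2] = {1, 0}; /* periodic in dim 0 only */

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Dims_create(size, 2, dims);      /* factor the job size into a 2-D grid */
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1 /* reorder */, &cart);
      /* ... use the Cartesian communicator ... */
      MPI_Comm_free(&cart);
      MPI_Finalize();
      return 0;
  }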
Re: [OMPI users] mpirun, paths and xterm again (xserver problem solved; library problem still there)
The whole question of how to invoke xterms for gdb via mpirun keeps coming up, so when this thread is done, I'll add a pile of this information to the FAQ. More below. On Feb 6, 2008, at 10:52 AM, jody wrote: I now solved the "ssh" part of my problem. The XServer is being started with the nolisten option (thanks Allen). In Fedora (Gnome) this can easily be changed by choosing the "Login Screen" tool from the System|Administration Menu. There, under the "Security" tab, remove the checkmark from "Deny TCP connections from xserver". Of course, this needs root access - fortunately, I am the boss of my computer ;) Additionally, at least port 6000 should be open. This leaves me with my second problem: $mpirun -np 5 -hostfile testhosts -x DISPLAY=plankton:0.0 xterm -hold -e ./MPITest opens 2 xterms from nano (remote) and 3 xterms from plankton (local). The local screens display the message: ./MPITest: error while loading shared libraries: libmpi_cxx.so.0: cannot open shared object file: No such file or directory This is unbelievably strange, since for all xterms (local & remote) the output of $mpirun -np 5 -hostfile testhosts -x DISPLAY=plankton:0.0 xterm -hold -e printenv contains the PATH variable containing the path to openmpi/bin and the LD_LIBRARY_PATH containing the path to openmpi/lib The results of these two commands do seem to contradict each other; hmm. Just to be absolutely sure, did you cut-n-paste the LD_LIBRARY_PATH directory output from printenv and try to "ls" it to ensure that it's completely spelled right, etc.? I suspect that it's right since your other commands work, but at this point, it's worth checking the "obvious" things as well... What shell are you using? You might want to add some echo statements to your shell startup scripts to ensure that all the right parts are being run in each of the cases -- perhaps, for some weird reason, they aren't in the problematic cases...? [shrug] Doing $mpirun -np 5 -hostfile testhosts -x DISPLAY=plankton:0.0 xterm -hold -e locate libmpi_cxx returns on all xterms (local & remote) /opt/openmpi/lib/libmpi_cxx.la /opt/openmpi/lib/libmpi_cxx.so /opt/openmpi/lib/libmpi_cxx.so.0 /opt/openmpi/lib/libmpi_cxx.so.0.0.0 On the other hand, the application has no problem when being called without xterms: $mpirun -np 5 -hostfile testhosts ./MPITest Does anybody have an idea why that should happen? Thanks Jody -- Jeff Squyres Cisco Systems
Re: [OMPI users] Infinipath context limit
On Wed, 6 Feb 2008, Christian Bell wrote: > Hi Daniel -- > > PSM should determine your node setup and enable shared contexts > accordingly, but it looks like something isn't working right. You > can apply the patch I've attached to this e-mail and things should > work again. Alas, it doesn't compile (patch was applied to OpenMPI 1.2.5): mtl_psm.c(109): error: struct "orte_proc_info_t" has no field "num_local_procs" if (orte_process_info.num_local_procs > 0) { ^ mtl_psm.c(111): error: struct "orte_proc_info_t" has no field "num_local_procs" snprintf(buf, sizeof buf - 1, "%d", orte_process_info.num_local_procs); ^ mtl_psm.c(113): error: struct "orte_proc_info_t" has no field "local_rank" snprintf(buf, sizeof buf - 1, "%d", orte_process_info.local_rank); ^ compilation aborted for mtl_psm.c (code 2) > However, it would be useful to identify what's going wrong. Can > you compile a hello world program and run it with the machinefile > you're trying to use. Send me the output from: > > mpirun -machinefile env PSM_TRACEMASK=0x101 ./hello_world > > I understand your failure mode only if somehow the 8-core node is > detected to be a 4-core node. The output should tell us this. Attached. It seems it does try to enable context sharing but for some reason /dev/ipath still returns a busy code. Daniëlnode017.23692env IPATH_DISABLE_MMAP_MALLOC Disable mmap for malloc() => NO node017.23692env IPATH_NO_CPUAFFINITY Prevent PSM from setting affinity => NO node017.23692env IPATH_UNITDevice Unit number (-1 autodetects) => -1 node017.23692env PSM_DEVICES Ordered list of PSM-level devices => shm,ipath (default was self,shm,ipath) node017.23692psmi_parse_devices: PSM Device allocation order: amsh,ips node017.23692env PSM_MEMORYMemory usage mode (normal or large) => normal node017.23692env PSM_SHAREDCONTEXTSEnable shared contexts => YES (default was YES) node017.23692ipath_setaffinity: PORT_INFO returned unit_id=0/1,port=1/4,hwports=4,subport=0/0,nproc=8 node017.23692ipath_setaffinity: Set CPU affinity to 0, port 0:1:0 (1 active chips) node017.23692ipath_userinit: Driver is not QLogic-built node017.23692ipath_userinit: Runtime flags are 0x46, explicit mallopt mmap disable in malloc is off node017.23692psmi_port_open: Opened port 1.0 on device /dev/ipath (LID=14,epid=e0001,flags=46) node017.23692env PSM_RCVTHREAD Recv thread flags (0 disables thread)=> 0x1 node017:1.0.env PSM_MQ_SENDREQS_MAX Max num of isend requests in flight => 1048576 node017:1.0.env PSM_MQ_RECVREQS_MAX Max num of irecv requests in flight => 1048576 node017:1.0.env PSM_MQ_RNDV_IPATH_THRESH ipath eager-to-rendezvous switchover => 64000 node017:1.0.env PSM_MQ_RNDV_SHM_THRESHshm eager-to-rendezvous switchover => 16000 node017:1.0.ips_spio_init: PIO copy uses forced ordering node017:1.0.env PSM_TID Tid proto flags (0 disables protocol)=> 0x1 node017:1.0.ips_protoexp_init: Tid control message settings: timeout min=200us/max=1000us, interrupt when trying attempt #2 node017:1.0.ips_proto_init: Tid error control: warning every 30 secs, fatal error after 250 tid errors node017:1.0.ips_proto_init: Ethernet Host IP=10.141.0.17 and PID=23692 node017:1.0.psmi_shm_attach: Registered as master to key /psm_shm.d999e196-868e-c6e6-0d4a-bc2c78de85f1 node017:1.0.psmi_shm_attach: Mapped shm control object at 0x2b25a000 node017:1.0.psmi_shm_attach: Mapped and initalized shm object control page at 0x2b25a000,size=4096 node017:1.0.psmi_shm_attach: Grabbed shmidx 0 node017:1.0.amsh_init_segment: Grew shared segment for 1 procs, size=5.93 MB node017:1.0.am_remap_segment: 
shm segment remap from 0x2b25a000..4096 to 0x2aaab26b3000..6217728 (relocated=YES) node017:1.0.ips_ptl_pollintr: Enabled communication thread on URG packets node017.23691env IPATH_DISABLE_MMAP_MALLOC Disable mmap for malloc() => NO node017.23691env IPATH_NO_CPUAFFINITY Prevent PSM from setting affinity => NO node017.23691env IPATH_UNITDevice Unit number (-1 autodetects) => -1 node017.23691env PSM_DEVICES Ordered list of PSM-level devices => shm,ipath (default was self,shm,ipath) node017.23691psmi_parse_devices: PSM Device allocation order: amsh,ips node017.23691env PSM_MEMORYMemory usage mode (normal or large) => normal node017.23691env PSM_SHAREDCONTEXTSEnable shared contexts => YES (default was YES) node017.23691ipath_setaffinity: PORT_INFO returned unit_id=0/1,port=2/4,hwports=4,subport=0/0,nproc=8 node017.2
Re: [OMPI users] Oversubscribing a subset of a machine's cores
Torje Henriksen wrote: [...] Still, all eight cores are being used. I can see why you would want to use all cores, and I can see that oversubscribing a sub-set of the cores might seem silly. My question is, is it possible to do what I want to do without hacking the open mpi code? Could you get numactl to help you do what you want? That is, for the code, somehow tweak the launcher to run numactl --physcpubind=X ... or similar? -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: land...@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615
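If a numactl wrapper is awkward to arrange, roughly the same effect can be had from inside the program on Linux; a minimal sketch (not from the thread, and assuming cores 0-3 are the ones you want to keep busy):

  /* Restrict the calling rank to cores 0-3; the other cores stay idle. */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      cpu_set_t set;
      int c, rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      CPU_ZERO(&set);
      for (c = 0; c < 4; c++)
          CPU_SET(c, &set);
      if (sched_setaffinity(0, sizeof(set), &set) != 0)
          perror("sched_setaffinity");
      printf("rank %d restricted to cores 0-3\n", rank);

      /* ... the real work of the application goes here ... */

      MPI_Finalize();
      return 0;
  }

The OS scheduler will then multiplex all eight ranks over the four allowed cores, which is the oversubscription pattern Torje is after.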
Re: [OMPI users] mpirun, paths and xterm again (xserver problem solved; library problem still there)
Hi Jeff > The results of these two commands do seem to contradict each other; > hmm. Just to be absolutely sure, did you cut-n-paste the > LD_LIBRARY_PATH directory output from printenv and try to "ls" it to > ensure that it's completely spelled right, etc.? I suspect that it's > right since your other commands work, but at this point, it's worth > checking the "obvious" things as well... I wrote a little command called envliblist which consists of this line: printenv | grep PATH | gawk -F "_PATH=" '{ print $2 }' | gawk -F ":" '{ print $1 }' | xargs ls -al When I do mpirun -np 5 -hostfile testhosts -x DISPLAY xterm -hold -e ./envliblist all xterms (local & remote) display the contents of the openmpi/lib directory. Another strange result: I have a shell script for launching the debugger in an xterm: [jody]:/mnt/data1/neander:$cat run_gdb.sh #!/bin/sh # # save the program name export PROG="$1" # shift away program name (leaves program params) shift # create a command file for gdb, to start it automatically echo run $* > gdb.cmd # do the term xterm -e gdb -x gdb.cmd $PROG exit 0 When I run mpirun -np 5 --hostfile testhosts -x DISPLAY ./run_gdb.sh ./MPITest it works! Just to compare, mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e ./MPITest does not work. I notice the only difference between the two commands above is that in the run_gdb script xterm has no "-hold" parameter! Indeed, mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -e ./MPITest does work. To actually see that it works (MPITest is a simple Hello MPI app) I had to do mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e "./MPITest >> output.txt" and check output.txt. Does anybody have an explanation for this weird behavior? Jody
Re: [OMPI users] mpirun, paths and xterm again (xserver problem solved; library problem still there)
On Feb 7, 2008, at 10:07 AM, jody wrote: I wrote a little command called envliblist which consists of this line: printenv | grep PATH | gawk -F "_PATH=" '{ print $2 }' | gawk -F ":" '{ print $1 }' | xargs ls -al When i do mpirun -np 5 -hostfile testhosts -x DISPLAY xterm -hold -e ./ envliblist all xterms (local & remote) display the contents of the openmpi/lib directory. Ok, good. Another strange result: I have a shell script for launching the debugger in an xterm: [jody]:/mnt/data1/neander:$cat run_gdb.sh #!/bin/sh # # save the program name export PROG="$1" # shift away program name (leaves program params) shift # create a command file for gdb, to start it automatically echo run $* > gdb.cmd # do the term xterm -e gdb -x gdb.cmd $PROG exit 0 When i run mpirun -np 5 --hostfile testhosts -x DISPLAY ./run_gdb.sh ./MPITest it works! Just to compare mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e ./MPITest does not work. It seems that if you launch shell scripts, things work. But if you run xterm without a shell script, it does not work. I do not think it is a difference of -hold vs. no -hold. Indeed, I can run both of these commands just fine on my system: % mpirun -np 1 --hostfile h -x DISPLAY=.cisco.com:0 xterm - hold -e gdb ~/mpi/hello % mpirun -np 1 --hostfile h -x DISPLAY=.cisco.com:0 xterm -e gdb ~/mpi/hello Note that my setup is a little different than yours; I'm using a Mac laptop and ssh'ing to a server where I'm invoking mpirun. The hostfile "h" contains a 2nd server where xterm/gdb/hello are running. I notice the only difference between the to above commands is that in the run_gdb script xterm has no "-hold" parameter! Indeed, mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -e ./MPITest does work. To actually see that it works (MPITest is simple Hello MPI app) i had to do mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e "./MPITest >> output.txt" and check output.txt. Does anybody have an explanation for this weird happening? Jody ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] bug in MPI_ACCUMULATE for window offsets > 2**31 - 1 bytes? openmpi v1.2.5
Hi Stefan, I was able to verify the problem. Turns out this is a problem with other onesided operations as well. Attached is a simple test case I made in c using MPI_Put that also fails. The problem is that the target count and displacements are both sent as signed 32 bit integers. Then, the receiver multiplies them together and adds them to the window base. However, this multiplication is done using the signed 32 bit integers, which overflows. This is then added to the 64 bit pointer. This, of course, results in a bad address. I have attached a patch against a recent development version that fixes this for me. I am also copying Brian Barrett, who did all the work on the onesided code. Brian: if possible, please take a look at the attached patch and test case. Thanks for the report! Tim Prins Stefan Knecht wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi all, I encounter a problem with the routine MPI_ACCUMULATE trying to sum up MPI_REAL8's on a large memory window with a large offset. My program running (on a single processor, x86_64 architecture) crashes with an error message like: [node14:16236] *** Process received signal *** [node14:16236] Signal: Segmentation fault (11) [node14:16236] Signal code: Address not mapped (1) [node14:16236] Failing at address: 0x2aaa32b16000 [node14:16236] [ 0] /lib64/libpthread.so.0 [0x32e080de00] [node14:16236] [ 1] /home/stefan/bin/openmpi-1.2.5/lib/libmpi.so.0(ompi_mpi_op_sum_double+0x10) [0x2af15530] [node14:16236] [ 2] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_process_op+0x2d7) [0x2aaab1a47257] [node14:16236] [ 3] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so [0x2aaab1a45432] [node14:16236] [ 4] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_passive_unlock+0x93) [0x2aaab1a48243] [node14:16236] [ 5] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so [0x2aaab1a43436] [node14:16236] [ 6] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0xff) [0x2aaab1a42e0f] [node14:16236] [ 7] /home/stefan/bin/openmpi-1.2.5/lib/libopen-pal.so.0(opal_progress+0x4a) [0x2b3dfa0a] [node14:16236] [ 8] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_module_unlock+0x2a9) [0x2aaab1a48629] [node14:16236] [ 9] /home/stefan/bin/openmpi-1.2.5/lib/libmpi.so.0(PMPI_Win_unlock+0xe1) [0x2af4a291] [node14:16236] [10] /home/stefan/bin/openmpi-1.2.5/lib/libmpi_f77.so.0(mpi_win_unlock_+0x25) [0x2acdd8c5] [node14:16236] [11] /home/stefan/calc/mpi2_test/a.out(MAIN__+0x809) [0x401851] [node14:16236] [12] /home/stefan/calc/mpi2_test/a.out(main+0xe) [0x401bbe] [node14:16236] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32dfc1dab4] [node14:16236] [14] /home/stefan/calc/mpi2_test/a.out [0x400f99] [node14:16236] *** End of error message *** mpirun noticed that job rank 0 with PID 16236 on node node14 exited on signal 11 (Segmentation fault). 
The relevant part of my FORTRAN source code reads as: ~ program accumulate_test ~ IMPLICIT REAL*8 (A-H,O-Z) ~ include 'mpif.h' ~ INTEGER(KIND=MPI_OFFSET_KIND) MX_SIZE_M C dummy size parameter ~ PARAMETER (MX_SIZE_M = 1 000 000) ~ INTEGER MPIerr, MYID, NPROC ~ INTEGER ITARGET, MY_X_WIN, JCOUNT, JCOUNT_T ~ INTEGER(KIND=MPI_ADDRESS_KIND) MEM_X, MEM_Y ~ INTEGER(KIND=MPI_ADDRESS_KIND) IDISPL_WIN ~ INTEGER(KIND=MPI_ADDRESS_KIND) PTR1, PTR2 ~ INTEGER(KIND=MPI_INTEGER_KIND) ISIZE_REAL8 ~ INTEGER*8 NELEMENT_X, NELEMENT_Y ~ POINTER (PTR1, XMAT(MX_SIZE_M)) ~ POINTER (PTR2, YMAT(MX_SIZE_M)) C ~ CALL MPI_INIT( MPIerr ) ~ CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYID, MPIerr) ~ CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NPROC, MPIerr) C ~ NELEMENT_X = 400 000 000 ~ NELEMENT_Y = 10 000 C ~ CALL MPI_TYPE_EXTENT(MPI_REAL8, ISIZE_REAL8, MPIerr) ~ MEM_X = NELEMENT_X * ISIZE_REAL8 ~ MEM_Y = NELEMENT_Y * ISIZE_REAL8 C C allocate memory C ~ CALL MPI_ALLOC_MEM( MEM_X, MPI_INFO_NULL, PTR1, MPIerr) ~ CALL MPI_ALLOC_MEM( MEM_Y, MPI_INFO_NULL, PTR2, MPIerr) C C fill vectors with 0.0D0 and 1.0D0 C ~ CALL DZERO(XMAT,NELEMENT_X) ~ CALL DONE(YMAT,NELEMENT_Y) C C open memory window C ~ CALL MPI_WIN_CREATE( XMAT, MEM_X, ISIZE_REAL8, ~ & MPI_INFO_NULL, MPI_COMM_WORLD, ~ & MY_X_WIN, MPIerr ) C lock window (MPI_LOCK_SHARED mode) C select target ==> if itarget == myid: no 1-sided communication C ~ ITARGET = MYID ~ CALL MPI_WIN_LOCK( MPI_LOCK_SHARED, ITARGET, MPI_MODE_NOCHECK, ~ & MY_X_WIN, MPIerr) C C transfer data to target ITARGET C ~ JCOUNT_T = 10 000 ~ JCOUNT = JCOUNT_T C set displacement in memory window ~ IDISPL_WIN = 300 000 000 C ~ CALL MPI_ACCUMULATE( YMAT, JCOUNT, MPI_REAL8, ITARGET, IDISPL_WIN, ~ &
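The arithmetic Tim describes is easy to see in isolation; a minimal sketch (not the actual osc_pt2pt code) using the displacement and element size from Stefan's test case:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      int target_disp = 300000000;   /* IDISPL_WIN in the Fortran program */
      int disp_unit   = 8;           /* bytes per MPI_REAL8               */

      /* Widened to 64 bits, the byte offset added to the window base is fine. */
      int64_t right = (int64_t)target_disp * disp_unit;   /* 2400000000 */

      /* Truncating to 32 bits shows what a signed 32-bit multiply produces. */
      int32_t wrong = (int32_t)right;

      printf("64-bit offset: %lld bytes, 32-bit result: %d\n",
             (long long)right, wrong);
      return 0;
  }

Since 2,400,000,000 does not fit in a signed 32-bit integer, the 32-bit product is garbage, and adding it to the 64-bit window base yields the unmapped address seen in the segfault.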
Re: [OMPI users] process placement with torque and OpenMP
Hi Brock, As far as I know there is no way to do this with Open MPI and torque. I believe people usually use hostfiles to do this sort of thing, but hostfiles do not work with torque. You may want to look into the launcher commands to see if torque will do it for you. Slurm has an option '--cpus-per-task', but I just realized we don't handle that properly... Tim Brock Palen wrote: Ok so I ask the mpirun masters how would you do the following: I submit a job with torque (we use --with-tm) like the following: nodes=4:ppn=2 My desired outcome is to have 1 MPI process per 2 cpus and use threaded BLAS (or my own OpenMP, take your pick). Our cluster has some 4-core machines, thus the above job sometimes ends up looking like nodes=1:ppn=4+nodes=2:ppn=2 The mpirun -bynode command will work in the case where I get 4 nodes with only 2 cpus free. But if any machine other than the first machine is my node with 4 cores free given to me by moab, I would end up starting an extra process on the first node, where mpirun thinks another cpu is free, but that cpu is really to be used by OpenMP, and the last process should be placed on the node that has 4 cpus free. I hope that wasn't too confusing. It comes down to: how do I launch hybrid jobs and make sure the processes started by mpirun go where I want when my nodes have different core counts, given that I am running via torque so using -H won't work? Also, I would prefer that all processes be started via TM. Is this possible? Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985
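One way to check what placement a given torque/mpirun combination actually produced is a tiny hybrid "where am I" program; a sketch (not from the thread), built with mpicc plus the compiler's OpenMP flag:

  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, len;
      char host[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(host, &len);

      /* Each OpenMP thread reports which rank and host it belongs to. */
      #pragma omp parallel
      printf("rank %d, thread %d of %d, on %s\n",
             rank, omp_get_thread_num(), omp_get_num_threads(), host);

      MPI_Finalize();
      return 0;
  }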
Re: [OMPI users] Bad behavior in Allgatherv when a count is 0
Kenneth, Have you tried the 1.2.5 version? There were some fixes to the vector collectives that could have resolved your problem. On Feb 4, 2008 5:36 PM, George Bosilca wrote: > Kenneth, > > I cannot replicate this weird behavior with the current version in the > trunk. I guess it has been fixed since 1.2.4. > >Thanks, > george. > > > On Dec 13, 2007, at 6:58 PM, Moreland, Kenneth wrote: > > > I have found that on rare occasion Allgatherv fails to pass the data > > to > > all processes. Given some magical combination of receive counts and > > displacements, one or more processes are missing some or all of some > > arrays in their receive buffer. A necessary, but not sufficient, > > condition seems to be that one of the receive counts is 0. Beyond > > that > > I have not figured out any real pattern, but the example program > > listed > > below demonstrates the failure. I have tried it on OpenMPI version > > 1.2.3 and 1.2.4; it fails on both. However, it works fine with > > version > > 1.1.2, so the problem must have been introduced since then. > > > > -Ken > > > > Kenneth Moreland > >*** Sandia National Laboratories > > *** > > *** *** *** email: kmo...@sandia.gov > > ** *** ** phone: (505) 844-8919 > >*** fax: (505) 845-0833 > > > > > > > > #include > > > > #include > > #include > > > > int main(int argc, char **argv) > > { > > int rank; > > int size; > > MPI_Comm smallComm; > > int senddata[5], recvdata[100]; > > int lengths[3], offsets[3]; > > int i, j; > > > > MPI_Init(&argc, &argv); > > > > MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > MPI_Comm_size(MPI_COMM_WORLD, &size); > > if (size != 3) > >{ > >printf("Need 3 processes."); > >MPI_Abort(MPI_COMM_WORLD, 1); > >} > > > > for (i = 0; i < 100; i++) recvdata[i] = -1; > > for (i = 0; i < 5; i++) senddata[i] = rank*10 + i; > > lengths[0] = 5; lengths[1] = 0; lengths[2] = 5; > > offsets[0] = 3; offsets[1] = 9; offsets[2] = 10; > > MPI_Allgatherv(senddata, lengths[rank], MPI_INT, > > recvdata, lengths, offsets, MPI_INT, MPI_COMM_WORLD); > > > > for (i = 0; i < size; i++) > >{ > >for (j = 0; j < lengths[i]; j++) > > { > > if (recvdata[offsets[i]+j] != 10*i+j) > >{ > >printf("%d: Got bad data from rank %d, index %d: %d\n", rank, > > i, > > j, > > recvdata[offsets[i]+j]); > >break; > >} > > } > >} > > > > MPI_Finalize(); > > > > return 0; > > } > > > > > > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/ tmat...@gmail.com || timat...@open-mpi.org I'm a bright... http://www.the-brights.net/
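Note that the #include lines in Kenneth's program were stripped by the archive (the header names inside angle brackets are gone). To compile it, the program needs at least the following two headers; the exact original set is a guess:

  #include <mpi.h>
  #include <stdio.h>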
Re: [OMPI users] bug in MPI_ACCUMULATE for window offsets > 2**31 - 1 bytes? openmpi v1.2.5
The fix I previously sent to the list has been committed in r17400. Thanks, Tim Tim Prins wrote: [...]
Re: [OMPI users] openmpi credits for eager messages
What I missed in this whole conversation is that the pieces of text that Ron and Dick are citing are *on the same page* in the MPI spec; they're not disparate parts of the spec that accidentally overlap in discussion scope. Specifically, it says: Resource limitations Any pending communication operation consumes system resources that are limited. Errors may occur when lack of resources prevent the execution of an MPI call. A quality implementation will use a (small) fixed amount of resources for each pending send in the ready or synchronous mode and for each pending receive. However, buffer space may be consumed to store messages sent in standard mode, and must be consumed to store messages sent in buffered mode, when no matching receive is available. The amount of space available for buffering will be much smaller than program data memory on many systems. Then, it will be easy to write programs that overrun available buffer space. ...12 lines down on that page, on the same page, in the same section... Consider a situation where a producer repeatedly produces new values and sends them to a consumer. Assume that the producer produces new values faster than the consumer can consume them. ...skip 2 sentences about buffered sends... If standard sends are used, then the producer will be automatically throttled, as its send operations will block when buffer space is unavailable. I find that to be unambiguous. 1. A loop of MPI_ISENDs on a producer can cause a malloc failure (can't malloc a new MPI_Request), and that's an error. Tough luck. 2. A loop of MPI_SENDs on a producer can run a slow-but-MPI-active consumer out of buffer space if all the incoming messages are queued up (e.g., in the unexpected queue). The language above is pretty clear about this: MPI_SEND on the producer is supposed to block at this point. FWIW: Open MPI does support this mode of operation, as George and Gleb noted (by setting the eager size to 0, therefore forcing *all* sends to be synchronous -- a producer cannot "run ahead" for a while and eventually be throttled when receive buffering is exhausted), but a) it's not the default, and b) it's not documented this way. On Feb 4, 2008, at 1:29 PM, Richard Treumann wrote: Hi Ron - I am well aware of the scaling problems related to the standard send requirements in MPI. I t is a very difficult issue. However, here is what the standard says: MPI 1.2, page 32 lines 29-37 === a standard send operation that cannot complete because of lack of buffer space will merely block, waiting for buffer space to become available or for a matching receive to be posted. This behavior is preferable in many situations. Consider a situation where a producer repeatedly produces new values and sends them to a consumer. Assume that the producer produces new values faster than the consumer can consume them. If buffered sends are used, then a buffer overflow will result. Additional synchronization has to be added to the program so as to prevent this from occurring. If standard sends are used, then the producer will be automatically throttled, as its send operations will block when buffer space is unavailable. If there are people who want to argue that this is unclear or that it should be changed, the MPI Forum can and should take up the discussion. I think this particular wording is pretty clear. The piece of MPI standard wording you quote is somewhat ambiguous: The amount of space available for buffering will be much smaller than program data memory on many systems. 
Then, it will be easy to write programs that overrun available buffer space. But note that this wording mentions a problem that an application can create but does not say the MPI implementation can fail the job. The language I have pointed to is where the standard says what the MPI implementation must do. The "lack of resource" statement is more about send and receive descriptors than buffer space. If I write a program with an infinite loop of MPI_IRECV postings the standard allows that to fail. Dick Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 users-boun...@open-mpi.org wrote on 02/04/2008 12:24:11 PM: > > > Is what George says accurate? If so, it sounds to me like OpenMPI > > does not comply with the MPI standard on the behavior of eager > > protocol. MPICH is getting dinged in this discussion because they > > have complied with the requirements of the MPI standard. IBM MPI > > also complies with the standard. > > > > If there is any debate about whether the MPI standard does (or > > should) require the behavior I describe below then we should move > > the discussion to the MPI 2.1 Forum and get a clarification. > > [...] > > The MPI S
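The producer/consumer situation both passages describe is easy to put into code; a minimal sketch (not from the thread) in which rank 0 can only run ahead of rank 1 for as long as eager buffering lasts, and is throttled on every message once the eager size is set to 0 as Jeff describes:

  #include <mpi.h>
  #include <stdio.h>
  #include <unistd.h>

  #define NMSG 10000
  #define LEN  1024

  int main(int argc, char **argv)   /* run with at least two ranks */
  {
      int rank, i;
      double buf[LEN] = {0.0};

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {                       /* fast producer */
          for (i = 0; i < NMSG; i++)
              MPI_Send(buf, LEN, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {                /* slow consumer */
          for (i = 0; i < NMSG; i++) {
              MPI_Recv(buf, LEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
              usleep(1000);                  /* artificial delay */
          }
      }

      MPI_Finalize();
      return 0;
  }

With standard sends and a zero eager threshold, every MPI_Send blocks until the matching receive is posted, which is the throttling the standard text describes; with a nonzero eager threshold, messages up to that size land in the consumer's unexpected queue instead.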
Re: [OMPI users] Can't compile C++ program with extern "C" { #include mpi.h }
On Wed, 2008-01-30 at 21:21 -0500, Jeff Squyres wrote: > On Jan 30, 2008, at 5:35 PM, Adam C Powell IV wrote: > > > With no reply in a couple of weeks, I'm wondering if my previous > > message > > got dropped. (Then again, my previous message was a couple of weeks > > late in replying to its predecessor...) > > No, it didn't get dropped -- it was exactly your admission of low > priority that had me put this issue as low priority as well. :-) I understand. > > I'm recommending a change to mpi.h which would let C headers > > included by > > C++ programs do: > > #define OMPI_SKIP_MPICXX > > #include > > #undef OMPI_SKIP_MPICXX > > without preventing the C++ headers from being included at another > > time. > > See below for the recommended change. > > I really don't think that's a good solution. The real problem is that > some of Salome's header files are doing things that they should not be > doing (including a C++-safe header file inside extern "C" {}). > > IMHO, the real fix should be to fix the code that is doing the Wrong > Thing. I'm reluctant to provide a subtle workaround in our software > that enables a Wrong Thing -- know what I mean? > > FWIW, I just downloaded HDF5 1.6.6 and I took a [quick] look: it does > indeed look like HDF5's header files are C++-safe. Specifically: they > do not include in an extern "C" block, and all of their > declarations are within extern "C" blocks. Hence, Salome should not > be including inside of an extern "C" block because > is already C++-safe. > > This should fix your problem, right? Sort of, though it will require a large patch to Salomé to get around this, vs. a small patch to OpenMPI to provide a simple workaround to this class a problems. Basically, I'll need to patch every .hh file to make sure it #includes mpi.h or hdf5.h before #including any .h file with an mpi.h or hdf5.h #include in it. Given that Salomé does this, it must have worked with another MPI implementation. And that means that there is likely other software which will try this. As I understand it, your only objection to the change now is "programs shouldn't be doing this", not "this will break something". But then, why wouldn't programs expect to be able to include C headers in a C++ extern C block? Or rather, why shouldn't they be able to do so with mpi.h -- or hdf5.h, which isn't mpi.h -- when numerous other C header files allow it, possibly including other MPI implementations? After all, it's called mpi.h not mpi.hh or .hxx or mpi_cxx.h, right? And isn't the patched version cleaner, in that it separates the C and C++ prototypes into different #ifdef/#define regions? Thanks for the reply, sorry about the delay in getting back to you. -Adam -- GPG fingerprint: D54D 1AEE B11C CE9B A02B C5DD 526F 01E8 564E E4B6 Engineering consulting with open source tools http://www.opennovation.com/