Hello,

I want to run MPI on a couple of PS3s here. According to a colleague who set
it up, I needed to configure HugePages. As the PS3's RAM is limited, I could
only allocate 2 HugePages. I first ran HPL with the following command (taken
from a tutorial):
mpirun --mca btl_openib_want_fork_support 0 -np 1 numactl --physcpubind=0 ./xhpl \
    : -np 1 numactl --physcpubind=1 ./xhpl

Because I had very little memory, I disabled some unneeded services (Wi-Fi,
Bluetooth, printing). When I ran the same command again, I got the error
message below. Can anyone help? I have no idea what the error message
actually means, and I can't find anything useful about it. The machines run
Yellow Dog Linux with OpenMPI 1.2.8.

Cheers,
Andreas Hoelzlwimmer

Error Message:
[PS02:04815] *** Process received signal ***
[PS02:04815] Signal: Bus error (7)
[PS02:04815] Signal code:  (2)
[PS02:04815] Failing at address: 0x4000ca78008
[PS02:04816] *** Process received signal ***
[PS02:04816] Signal: Bus error (7)
[PS02:04816] Signal code:  (2)
[PS02:04816] Failing at address: 0x4000ca78008
[PS02:04816] [ 0] [0x1003e8]
[PS02:04816] [ 1] ./xhpl(HPL_hpalloc-0x17cc8c) [0x1001103c]
[PS02:04816] [ 2] ./xhpl(HPL_pdtest-0x17da40) [0x100101f8]
[PS02:04816] [ 3] ./xhpl(main-0x182f2c) [0x1000acdc]
[PS02:04816] [ 4] /lib64/libc.so.6 [0x80ca0e966c]
[PS02:04816] [ 5] /lib64/libc.so.6(__libc_start_main-0x1473e0) [0x80ca0e98e8]
[PS02:04816] *** End of error message ***
[PS02:04815] [ 0] [0x1003e8]
[PS02:04815] [ 1] ./xhpl(HPL_hpalloc-0x17cc8c) [0x1001103c]
[PS02:04815] [ 2] ./xhpl(HPL_pdtest-0x17da40) [0x100101f8]
[PS02:04815] [ 3] ./xhpl(main-0x182f2c) [0x1000acdc]
[PS02:04815] [ 4] /lib64/libc.so.6 [0x80ca0e966c]
[PS02:04815] [ 5] /lib64/libc.so.6(__libc_start_main-0x1473e0) [0x80ca0e98e8]
[PS02:04815] *** End of error message ***
mpirun noticed that job rank 0 with PID 4815 on node PS02 exited on signal 7 (Bus error).
1 additional process aborted (not shown)
