On 15.03.2012 at 11:30, Mohamed Adel wrote:

> Dear all,
>  
> I was trying to run a simple MPI script via qsub and received the errors
> below, after which the job ran correctly!
> I received no error when I tried to run the same script directly without qsub.
> Is there a way to fix this error message?

To me, these don't look like error messages from SGE. You could drop "#$ -j y";
then they should end up in a separate file (the job's stderr file).
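For illustration, a minimal sketch of the job script with the join option removed. It assumes SGE's default behaviour of writing stderr to $JOB_NAME.e$JOB_ID (in the working directory because of -cwd); the explicit -e line only spells that default out and can be left away.

```shell
#$ -cwd
#$ -N   hello-mpi
# stdout and stderr are no longer joined (no "-j y")
#$ -o   $JOB_NAME.o$JOB_ID
# optional: this is already SGE's default name for the stderr file
#$ -e   $JOB_NAME.e$JOB_ID
#$ -pe  impi    16
mpirun --rsh=ssh -np 16 ./hello.bin
```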

>  Thanks in advance,
> madel
>  
> job script:
> #$ -cwd
> #$ -j   y
> #$ -N   hello-mpi
> #$ -o   $JOB_NAME.o$JOB_ID
> #$ -pe  impi    16
> mpirun --rsh=ssh -np 16 ./hello.bin

Are you using a tight integration, or is it an SSH startup outside of SGE's 
control?
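The distinction matters because in a tight integration SGE starts the remote processes itself (via qrsh -inherit), while in a loose setup mpirun reaches the other nodes by ssh, outside of SGE's control. In the loose case the job script usually hands mpirun the hosts SGE granted, which SGE publishes in $PE_HOSTFILE. A minimal sketch of that step follows; the helper name `pe_hostfile_to_machinefile` and the one-host-per-slot machinefile format are assumptions, so check what your MPI's mpirun expects.

```shell
#!/usr/bin/env bash
# Sketch for a loose (ssh-based) startup. Each line of SGE's $PE_HOSTFILE is:
#   <host> <slots> <queue> <processor range>
# The hypothetical helper below expands it to one host name per granted slot.

# pe_hostfile_to_machinefile FILE -> prints one host name per slot
pe_hostfile_to_machinefile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Only act inside an SGE parallel job; outside of one, PE_HOSTFILE is unset.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    machines="${TMPDIR:-/tmp}/machines.${JOB_ID:-$$}"
    pe_hostfile_to_machinefile "$PE_HOSTFILE" > "$machines"
    echo "wrote $machines for ${NSLOTS:-?} slots"
fi
```

With a tight integration this file handling is typically done by the PE's start procedure instead, and the remote ranks run under sge_execd rather than sshd.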

-- Reuti


> ============================================
>  
> Error Message:
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>     This will severely limit memory registrations.
> comp040.local:5827:  create_cq Cannot allocate memory
> comp040.local:5826:  create_cq Cannot allocate memory
> comp040.local:5829:  create_cq Cannot allocate memory
> comp040.local:5833:  create_cq Cannot allocate memory
> comp040.local:5828:  create_cq Cannot allocate memory
> comp040.local:5831:  create_cq Cannot allocate memory
> comp040.local:5832:  create_cq Cannot allocate memory
> comp040.local:5830:  create_cq Cannot allocate memory
> comp040.local:5832:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5832:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5832:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5828: comp040.local:5832:  open_hca: getaddr_netdev ERROR: No 
> such device. Is bond0 configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> dapls_ib_open_hca failed 120000
> comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such device. Is bond0 
> configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such file or 
> directory. Is ib1 configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such device. Is bond0 
> configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5828: dapls_ib_open_hca failed 120000
> comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such device. Is ib2 
> configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such device. Is bond0 
> configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5828: dapls_ib_open_hca failed 120000
> comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such device. Is bond0 
> configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such device. Is bond0 
> configured?
> comp040.local:5828: dapls_ib_open_hca failed 120000
> comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such device. Is bond0 
> configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such device. Is ib3 
> configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such device. Is bond0 
> configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> Hello world: rank 8 of 16 running on comp047.local
> Hello world: rank 13 of 16 running on comp047.local
> Hello world: rank 10 of 16 running on comp047.local
> Hello world: rank 11 of 16 running on comp047.local
> Hello world: rank 15 of 16 running on comp047.local
> Hello world: rank 9 of 16 running on comp047.local
> Hello world: rank 12 of 16 running on comp047.local
> Hello world: rank 7 of 16 running on comp040.local
> Hello world: rank 14 of 16 running on comp047.local
> Hello world: rank 6 of 16 running on comp040.local
> Hello world: rank 0 of 16 running on comp040.local
> Hello world: rank 4 of 16 running on comp040.local
> Hello world: rank 2 of 16 running on comp040.local
> Hello world: rank 5 of 16 running on comp040.local
> Hello world: rank 1 of 16 running on comp040.local
> Hello world: rank 3 of 16 running on comp040.local
>  
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
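The first warnings in the log report RLIMIT_MEMLOCK as 32768 bytes, which libibverbs considers too small for memory registration; the "create_cq Cannot allocate memory" lines that follow look consistent with that. A minimal sketch for checking the limit a process actually sees, e.g. once inside the job script and once in an interactive shell on the same node, to compare the two. Bash's `ulimit -l` reports the soft limit in 1024-byte units, so the limit from the log would show up as 32. The helper `memlock_ok` and the 64 MiB threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: is the locked-memory limit (RLIMIT_MEMLOCK) of this process large
# enough? `ulimit -l` prints the soft limit in 1024-byte units or "unlimited".

# memlock_ok LIMIT MIN_KIB
# Succeeds if LIMIT is "unlimited" or at least MIN_KIB (both in 1024-byte units).
memlock_ok() {
    local limit=$1 min_kib=$2
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$min_kib" ]
}

# Hypothetical threshold of 64 MiB; what a site really needs is an assumption.
MIN_KIB=65536

current=$(ulimit -l)
if memlock_ok "$current" "$MIN_KIB"; then
    echo "RLIMIT_MEMLOCK OK: $current"
else
    # The 32768 bytes from the libibverbs warning would appear here as 32.
    echo "RLIMIT_MEMLOCK low: $current x 1024 bytes" >&2
fi
```

If the value differs between the job and an interactive login, the limit inherited by the processes started under SGE is the one to look at.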

