I forgot to add this:

Try

qconf -mconf
  ...
 execd_params H_MEMORYLOCKED=infinity
 ...

 Then restart sgeexecd on all your execution hosts.
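
 To check whether the new limit actually reaches your jobs, a small script like the one below might help. It is only a sketch: the function name report_memlock is my own invention, and it assumes bash, where "ulimit -l" prints the soft locked-memory limit in KiB (so the 32768 bytes from the warning would show up as 32) or "unlimited".

```shell
#!/bin/bash
# Sketch only: report the locked-memory limit (RLIMIT_MEMLOCK) that this
# shell, and therefore a job started from it, actually sees.
# report_memlock is a hypothetical helper name, not part of SGE or Open MPI.
report_memlock() {
    local limit
    # bash prints the soft limit in KiB, or the word "unlimited"
    limit=$(ulimit -l)
    echo "memlock limit on $(uname -n): ${limit}"
    if [ "${limit}" != "unlimited" ]; then
        # Anything other than "unlimited" can still trigger the libibverbs warning
        echo "warning: memlock limit is not unlimited" >&2
    fi
}

report_memlock
```

 Saved as, say, check_memlock.sh (the file name is just an example), it can be run once directly on a compute node and once through "qsub -cwd -j y check_memlock.sh". If the two outputs differ, the limit is being set by the execution daemon's environment and not by your login shell.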



On Thu, Mar 15, 2012 at 1:37 PM, Mazouzi <[email protected]> wrote:

> It looks like an Open MPI error. Check this link:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
>
>
>
> On Thu, Mar 15, 2012 at 1:11 PM, Reuti <[email protected]> wrote:
>
>> Am 15.03.2012 um 11:30 schrieb Mohamed Adel:
>>
>> > Dear all,
>> >
>> > I was trying to run a simple MPI script via qsub and received the
>> error below, after which the job ran correctly!
>> > I received no error when I ran the same script directly,
>> without qsub.
>> > Is there a way to get rid of this error message?
>>
>> To me these don't look like error messages from SGE. You could drop
>> "#$ -j   y" and they should end up in a separate file.
>>
>> >  Thanks in advance,
>> > madel
>> >
>> > job script:
>> > #$ -cwd
>> > #$ -j   y
>> > #$ -N   hello-mpi
>> > #$ -o   $JOB_NAME.o$JOB_ID
>> > #$ -pe  impi    16
>> > mpirun --rsh=ssh -np 16 ./hello.bin
>>
>> Are you using a tight integration, or is it an SSH startup outside of
>> SGE's control?
>>
>> -- Reuti
>>
>>
>> > ============================================
>> >
>> > Error Message:
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> >     This will severely limit memory registrations.
>> > comp040.local:5827:  create_cq Cannot allocate memory
>> > comp040.local:5826:  create_cq Cannot allocate memory
>> > comp040.local:5829:  create_cq Cannot allocate memory
>> > comp040.local:5833:  create_cq Cannot allocate memory
>> > comp040.local:5828:  create_cq Cannot allocate memory
>> > comp040.local:5831:  create_cq Cannot allocate memory
>> > comp040.local:5832:  create_cq Cannot allocate memory
>> > comp040.local:5830:  create_cq Cannot allocate memory
>> > comp040.local:5832:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5832:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5832:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5828: comp040.local:5832:  open_hca: getaddr_netdev
>> ERROR: No such device. Is bond0 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > dapls_ib_open_hca failed 120000
>> > comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5827:  open_hca: getaddr_netdev ERROR: No such device. Is
>> bond0 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such file or
>> directory. Is ib1 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5829:  open_hca: getaddr_netdev ERROR: No such device. Is
>> bond0 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5828: dapls_ib_open_hca failed 120000
>> > comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib2 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > comp040.local:5826:  open_hca: getaddr_netdev ERROR: No such device. Is
>> bond0 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5828: dapls_ib_open_hca failed 120000
>> > comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5833:  open_hca: getaddr_netdev ERROR: No such device. Is
>> bond0 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5828:  open_hca: getaddr_netdev ERROR: No such device. Is
>> bond0 configured?
>> > comp040.local:5828: dapls_ib_open_hca failed 120000
>> > comp040.local:5831:  open_hca: getaddr_netdev ERROR: No such device. Is
>> bond0 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such device. Is
>> ib3 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > comp040.local:5830:  open_hca: getaddr_netdev ERROR: No such device. Is
>> bond0 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > Hello world: rank 8 of 16 running on comp047.local
>> > Hello world: rank 13 of 16 running on comp047.local
>> > Hello world: rank 10 of 16 running on comp047.local
>> > Hello world: rank 11 of 16 running on comp047.local
>> > Hello world: rank 15 of 16 running on comp047.local
>> > Hello world: rank 9 of 16 running on comp047.local
>> > Hello world: rank 12 of 16 running on comp047.local
>> > Hello world: rank 7 of 16 running on comp040.local
>> > Hello world: rank 14 of 16 running on comp047.local
>> > Hello world: rank 6 of 16 running on comp040.local
>> > Hello world: rank 0 of 16 running on comp040.local
>> > Hello world: rank 4 of 16 running on comp040.local
>> > Hello world: rank 2 of 16 running on comp040.local
>> > Hello world: rank 5 of 16 running on comp040.local
>> > Hello world: rank 1 of 16 running on comp040.local
>> > Hello world: rank 3 of 16 running on comp040.local
>> >
>> > _______________________________________________
>> > users mailing list
>> > [email protected]
>> > https://gridengine.org/mailman/listinfo/users
>>
>>
>>
>
>
