I forgot to add this. Try:

qconf -mconf ... execd_params H_MEMORYLOCKED=infinity ...

Then restart sgeexecd on all your execution hosts.

On Thu, Mar 15, 2012 at 1:37 PM, Mazouzi <[email protected]> wrote:

> It looks like an Open MPI error. Check this link:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
> On Thu, Mar 15, 2012 at 1:11 PM, Reuti <[email protected]> wrote:
>
>> On 15.03.2012 at 11:30, Mohamed Adel wrote:
>>
>> > Dear all,
>> >
>> > I was trying to run a simple MPI script via qsub, then I received the error below, after which the job ran correctly!
>> > I received no error when I tried to run the same script directly without qsub.
>> > Is there a way to fix this error message?
>>
>> To me these don't look like error messages from SGE. You could omit "#$ -j y" and they should end up in a separate file.
>>
>> > Thanks in advance,
>> > madel
>> >
>> > job script:
>> > #$ -cwd
>> > #$ -j y
>> > #$ -N hello-mpi
>> > #$ -o $JOB_NAME.o$JOB_ID
>> > #$ -pe impi 16
>> > mpirun --rsh=ssh -np 16 ./hello.bin
>>
>> Are you using a tight integration, or is it an SSH startup outside of SGE's control?
>>
>> -- Reuti
>>
>> > ============================================
>> >
>> > Error Message:
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
>> > This will severely limit memory registrations.
>> > comp040.local:5827: create_cq Cannot allocate memory
>> > comp040.local:5826: create_cq Cannot allocate memory
>> > comp040.local:5829: create_cq Cannot allocate memory
>> > comp040.local:5833: create_cq Cannot allocate memory
>> > comp040.local:5828: create_cq Cannot allocate memory
>> > comp040.local:5831: create_cq Cannot allocate memory
>> > comp040.local:5832: create_cq Cannot allocate memory
>> > comp040.local:5830: create_cq Cannot allocate memory
>> > comp040.local:5832: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > comp040.local:5827: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > comp040.local:5829: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5826: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5828: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5828: comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5832: dapls_ib_open_hca failed 120000
>> > dapls_ib_open_hca failed 120000
>> > comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5833: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5831: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5827: dapls_ib_open_hca failed 120000
>> > comp040.local:5830: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5829: dapls_ib_open_hca failed 120000
>> > comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5828: dapls_ib_open_hca failed 120000
>> > comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5826: dapls_ib_open_hca failed 120000
>> > comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5828: dapls_ib_open_hca failed 120000
>> > comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5833: dapls_ib_open_hca failed 120000
>> > comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5828: dapls_ib_open_hca failed 120000
>> > comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5831: dapls_ib_open_hca failed 120000
>> > comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
>> > comp040.local:5830: dapls_ib_open_hca failed 120000
>> > Hello world: rank 8 of 16 running on comp047.local
>> > Hello world: rank 13 of 16 running on comp047.local
>> > Hello world: rank 10 of 16 running on comp047.local
>> > Hello world: rank 11 of 16 running on comp047.local
>> > Hello world: rank 15 of 16 running on comp047.local
>> > Hello world: rank 9 of 16 running on comp047.local
>> > Hello world: rank 12 of 16 running on comp047.local
>> > Hello world: rank 7 of 16 running on comp040.local
>> > Hello world: rank 14 of 16 running on comp047.local
>> > Hello world: rank 6 of 16 running on comp040.local
>> > Hello world: rank 0 of 16 running on comp040.local
>> > Hello world: rank 4 of 16 running on comp040.local
>> > Hello world: rank 2 of 16 running on comp040.local
>> > Hello world: rank 5 of 16 running on comp040.local
>> > Hello world: rank 1 of 16 running on comp040.local
>> > Hello world: rank 3 of 16 running on comp040.local
>> >
>> > _______________________________________________
>> > users mailing list
>> > [email protected]
>> > https://gridengine.org/mailman/listinfo/users
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
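
To check whether the execd_params change suggested at the top of this message actually reaches jobs started by sgeexecd, one option is to submit a tiny job that prints the locked-memory limit it inherited. The following is only a minimal sketch, assuming bash is installed at /bin/bash on the execution hosts. The job name memlock-check and the helper describe_memlock are made up for illustration and are not part of SGE.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -N memlock-check

# Hypothetical helper (not part of SGE): format a value as printed by
# bash's `ulimit -l`, which is either "unlimited" or a size in KiB.
describe_memlock() {
    case "$1" in
        unlimited)
            echo "locked-memory limit: unlimited"
            ;;
        *)
            echo "locked-memory limit: $1 KiB ($(( $1 * 1024 )) bytes)"
            ;;
    esac
}

# Report the soft limit that this job inherited from its parent process.
describe_memlock "$(ulimit -l)"
```

If the job output still shows 32 KiB (the 32768 bytes from the libibverbs warning), the new limit has probably not reached the execution daemon yet. A value of "unlimited" would suggest the setting took effect.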
