On 15.03.2012 at 11:30, Mohamed Adel wrote:

> Dear all,
>
> I was trying to run a simple mpi script via qsub, then I received the error below after which the job run correctly!
> I received no error when I tried to run the same script directly without qsub.
> Is there a way to fix this error message?
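
Since the quoted log below starts with the RLIMIT_MEMLOCK warning and the same binary is quiet when started by hand, one thing worth comparing is the locked-memory limit in both environments. The following is only a sketch: the function name is made up for illustration, and it assumes the script is run with bash, whose `ulimit -l` reports KiB (other shells may use different units).

```shell
#!/bin/bash
# Sketch: print the locked-memory limit (RLIMIT_MEMLOCK) of the current shell.
# memlock_bytes is a hypothetical helper name, not part of SGE or any MPI.
memlock_bytes() {
    local kib
    kib=$(ulimit -l)            # bash reports KiB, or the word "unlimited"
    if [ "$kib" = "unlimited" ]; then
        echo "unlimited"
    else
        echo $((kib * 1024))    # convert to bytes, the unit libibverbs prints
    fi
}

echo "$(uname -n): max locked memory = $(memlock_bytes)"
```

Running it once from a login shell and once from inside the job script (for example as `bash ./check_memlock.sh`, a hypothetical file name) shows whether the job sees a smaller limit. A value of 32768 inside the job would match the warning.
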

To me it doesn't look like error messages from SGE. You could avoid "#$ -j y" and they should end up in a separate file.

> Thanks in advance,
> madel
>
> job script:
> #$ -cwd
> #$ -j y
> #$ -N hello-mpi
> #$ -o $JOB_NAME.o$JOB_ID
> #$ -pe impi 16
> mpirun --rsh=ssh -np 16 ./hello.bin

Are you using a tight integration, or is it an SSH startup outside of SGE's control?

-- Reuti

> ============================================
>
> Error Message:
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
> This will severely limit memory registrations.
> comp040.local:5827: create_cq Cannot allocate memory
> comp040.local:5826: create_cq Cannot allocate memory
> comp040.local:5829: create_cq Cannot allocate memory
> comp040.local:5833: create_cq Cannot allocate memory
> comp040.local:5828: create_cq Cannot allocate memory
> comp040.local:5831: create_cq Cannot allocate memory
> comp040.local:5832: create_cq Cannot allocate memory
> comp040.local:5830: create_cq Cannot allocate memory
> comp040.local:5832: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> comp040.local:5827: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> comp040.local:5829: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5826: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5828: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5828: comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5832: dapls_ib_open_hca failed 120000
> dapls_ib_open_hca failed 120000
> comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5833: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5831: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5827: dapls_ib_open_hca failed 120000
> comp040.local:5830: open_hca: getaddr_netdev ERROR: No such file or directory. Is ib1 configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5829: dapls_ib_open_hca failed 120000
> comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5828: dapls_ib_open_hca failed 120000
> comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is ib2 configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5826: dapls_ib_open_hca failed 120000
> comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5828: dapls_ib_open_hca failed 120000
> comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5833: dapls_ib_open_hca failed 120000
> comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5828: dapls_ib_open_hca failed 120000
> comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5831: dapls_ib_open_hca failed 120000
> comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is ib3 configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is bond0 configured?
> comp040.local:5830: dapls_ib_open_hca failed 120000
> Hello world: rank 8 of 16 running on comp047.local
> Hello world: rank 13 of 16 running on comp047.local
> Hello world: rank 10 of 16 running on comp047.local
> Hello world: rank 11 of 16 running on comp047.local
> Hello world: rank 15 of 16 running on comp047.local
> Hello world: rank 9 of 16 running on comp047.local
> Hello world: rank 12 of 16 running on comp047.local
> Hello world: rank 7 of 16 running on comp040.local
> Hello world: rank 14 of 16 running on comp047.local
> Hello world: rank 6 of 16 running on comp040.local
> Hello world: rank 0 of 16 running on comp040.local
> Hello world: rank 4 of 16 running on comp040.local
> Hello world: rank 2 of 16 running on comp040.local
> Hello world: rank 5 of 16 running on comp040.local
> Hello world: rank 1 of 16 running on comp040.local
> Hello world: rank 3 of 16 running on comp040.local
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
