I can't offer much on the qsub job. On the first one, what is your limit on the number of file descriptors? It could be that your sys admin has set it too low.
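In case it helps, here is a rough sketch of how one might check that limit from the shell that launches mpirun. It assumes a bash-like shell; the variable names `soft` and `hard` are only for illustration, and whether the soft limit can actually be raised depends on how the system is configured.

```shell
#!/bin/bash
# Soft limit: the number of open files a process started from this shell gets.
soft=$(ulimit -Sn)
# Hard limit: the ceiling the soft limit can be raised to without root.
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"

# Try to raise the soft limit to the hard limit for this shell and its children.
# This can fail depending on system policy, so report the failure and carry on.
if [ "$soft" != "$hard" ]; then
    ulimit -Sn "$hard" 2>/dev/null || echo "could not raise the soft limit"
fi
echo "open files now: soft=$(ulimit -Sn)"
```

If the hard limit itself is small, only the administrator can raise it. Note that "ulimit unlimited" set in one shell does not necessarily carry over to other login sessions or to the compute nodes, so it may be worth running the same check there as well.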
On Oct 14, 2011, at 12:07 PM, Ashwani Kumar Mishra wrote:

> Hello,
> When I run the following command, I receive the error below. The job is
> submitted on a cluster of 40 nodes, each node having 8 processors and
> 8 GB RAM.
>
> Both commands work well as long as I use only up to 88 processors in the
> cluster, but the moment I allocate more than 88 processors I get the
> 2 errors below.
>
> I tried setting the ulimit to unlimited and setting the mca parameter
> opal_set_max_sys_limits to 1, but the problem still won't go away.
>
> $ mpirun=/opt/psc/ompi/bin/mpirun abyss-pe np=100 name=cattle k=50 n=10 in=s_1_1_sequence.txt
>
> /opt/mpi/openmpi/1.3.3/intel/bin/mpirun -np 100 ABYSS-P -k50 -q3 --coverage-hist=coverage.hist -s cattle-bubbles.fa -o cattle-1.fa s_1_1_sequence.txt
> [coe:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/iof_base_setup.c at line 107
> [coe.:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 203
> [coe.:19807] [[62863,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file oob_tcp.c at line 447
> --------------------------------------------------------------------------
> Error: system limit exceeded on number of network connections that can be open
>
> This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
> increasing your limit descriptor setting (using limit or ulimit commands),
> or asking the system administrator to increase the system limit.
> --------------------------------------------------------------------------
> make: *** [cattle-1.fa] Error 1
>
> When I submit the same job through qsub, I receive the following error:
>
> $ qsub -cwd -pe orte 100 -o qsub.out -e qsub.err -b y -N abyss `which mpirun` /home/genome/abyss/bin/ABYSS-P -k 50 s_1_1_sequence.txt -o av
>
> [compute-0-19.local][[28273,1],125][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
> [compute-0-19.local][[28273,1],127][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
> [compute-0-23.local][[28273,1],135][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.228 failed: Connection refused (111)
> [compute-0-23.local][[28273,1],133][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.228 failed: Connection refused (111)
> [compute-0-4.local][[28273,1],113][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 173.16.255.231 failed: Connection refused (111)
>
> Best Regards,
> Ashwani