Re: [OMPI users] job termination on grid

2013-04-30 Thread Ralph Castain
On Apr 30, 2013, at 1:54 PM, Vladimir Yamshchikov wrote:
> This is the question I am trying to answer - how many threads I can use with
> blastx on a grid? If I could request resources by_node, use -pernode option
> to have one process per node, and then specify the correct number of threads

Re: [OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
This is the question I am trying to answer: how many threads can I use with blastx on a grid? If I could request resources by_node, I would use the -pernode option to have one process per node, and then specify the correct number of threads for each node. But I cannot; resources (slots) are requested per-core

Re: [OMPI users] job termination on grid

2013-04-30 Thread Ralph Castain
On Apr 30, 2013, at 1:34 PM, Vladimir Yamshchikov wrote:
> I asked grid IT and they said they had to kill it as the job was overloading
> nodes. They saw loads up to 180 instead of close to 12 on 12-core nodes. They
> think that blastx is not an openmpi application, so openMPI is spawning
> b

Re: [OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
I asked grid IT and they said they had to kill it as the job was overloading nodes. They saw loads up to 180 instead of close to 12 on 12-core nodes. They think that blastx is not an openmpi application, so openMPI is spawning between 64 and 96 blastx processes, each of which is then starting up 96 wor
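
The overload reported here is a matter of simple arithmetic: the number of runnable threads on a node is the number of processes placed there times the threads each process starts. The following is only a rough sketch of that calculation, not anything taken from the thread. The function names are hypothetical, and the only figure borrowed from the message is the 12 cores per node.

```python
def runnable_threads(procs_per_node: int, threads_per_proc: int) -> int:
    """Total threads competing for cores on one node."""
    return procs_per_node * threads_per_proc


def threads_per_process(cores_per_node: int, procs_per_node: int) -> int:
    """Largest per-process thread count that keeps a node at or below
    one thread per core (hypothetical helper, for illustration only)."""
    if cores_per_node < 1 or procs_per_node < 1:
        raise ValueError("cores_per_node and procs_per_node must be >= 1")
    # Integer division; never go below one thread per process.
    return max(1, cores_per_node // procs_per_node)


if __name__ == "__main__":
    cores = 12  # node size mentioned in the message

    # One process per node: that process may use every core.
    print(threads_per_process(cores, 1))

    # Slots granted per core, one process per slot: a single thread each.
    print(threads_per_process(cores, cores))

    # Hypothetical oversubscribed case: 12 processes with 12 threads each.
    print(runnable_threads(12, 12))
```

Under these assumptions, a scheduler that hands out slots per core leaves room for only one thread per process, which is why the thread count has to be chosen together with the process placement.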

Re: [OMPI users] job termination on grid

2013-04-30 Thread Reuti
Hi,

On 30.04.2013 at 21:26, Vladimir Yamshchikov wrote:
> My recent job started normally but after a few hours of running died with the
> following message:
>
> --
> A daemon (pid 19390) died unexpectedly with status 137

[OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
Hello, My recent job started normally but after a few hours of running died with the following message: -- A daemon (pid 19390) died unexpectedly with status 137 while attempting to launch so we are aborting. There m
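
A note on the status value: by the usual shell convention, an exit status above 128 means the process was terminated by signal number (status - 128). Assuming the status 137 reported for the daemon follows that convention, it corresponds to signal 9, SIGKILL. A minimal sketch of the decoding (the function name is hypothetical, and signal numbering is that of Linux and other POSIX systems):

```python
import signal
from typing import Optional


def signal_from_status(status: int) -> Optional[str]:
    """Return the signal name implied by a shell-style exit status,
    or None if the status does not encode a known signal.

    Assumes the 128 + signal-number convention used by POSIX shells.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Number does not map to a signal on this platform.
        return None


if __name__ == "__main__":
    print(signal_from_status(137))
```

SIGKILL cannot be caught by the process, so a status of this kind usually points to an external kill (for example by an administrator or a resource manager) rather than to a crash inside the program.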