Reuti,

The problem I am facing is a small part of our production system, and I cannot modify our mpirun submission system. This is why I am looking for a solution that uses only ompi-clean or the mpirun command specification.
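To make the per-job behaviour I am after concrete, here is a minimal Python sketch of what a wrapper could do if it were allowed to keep a handle on the mpirun process it launched. It is only an illustration under assumptions: `start_job` and `stop_job` are hypothetical helper names (not Open MPI commands or APIs), it is not an ompi-clean feature, and it assumes that mpirun tears down its own ranks when it receives SIGTERM, which should be checked against the installed Open MPI version. It also does not remove the session directory the way ompi-clean does.

```python
import os
import signal
import subprocess


def start_job(cmd):
    """Launch one job (e.g. an mpirun command line) in its own session.

    start_new_session=True makes the child a session and process-group
    leader, so its process-group id equals its PID and it can be
    signalled without touching any other job of the same user.
    """
    return subprocess.Popen(cmd, start_new_session=True)


def stop_job(proc, grace=5.0):
    """Stop only the job behind `proc` and return its exit status.

    SIGTERM is sent to the job's process group first; if the job has not
    exited after `grace` seconds, SIGKILL is sent to the same group.
    """
    if proc.poll() is not None:
        return proc.returncode  # job already finished, nothing to do
    os.killpg(proc.pid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
```

With this, `job1 = start_job(["mpirun", "-np", "2", "myexec1"])` followed later by `stop_job(job1)` would signal only that mpirun and the local processes in its group, leaving a second job started the same way untouched. Processes on remote nodes are only reached through mpirun's own cleanup, which is the assumption stated above.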
Thanks,
Nicolas

2012/10/24, Reuti <re...@staff.uni-marburg.de>:
> On 24.10.2012 at 11:33, Nicolas Deladerriere wrote:
>
>> Reuti,
>>
>> Thanks for your comments,
>>
>> In our case, we are currently running different mpirun commands on
>> clusters sharing the same frontend. Basically we use a wrapper to run
>> the mpirun command and to run an ompi-clean command to clean up the
>> mpi job if required.
>> Using ompi-clean like this just kills all other mpi jobs running on
>> the same frontend. I cannot use a queuing system
>
> Why? Using it on a single machine was only one possible setup. Its purpose
> is to distribute jobs to slave hosts. If you already have one frontend as
> login machine it fits perfectly: the qmaster (in case of SGE) can run there
> and the execd on the nodes.
>
> -- Reuti
>
>
>> as you have suggested; this
>> is why I was wondering about an option or other solution associated with
>> the ompi-clean command to avoid this general cleaning of mpi jobs.
>>
>> Cheers
>> Nicolas
>>
>> 2012/10/24, Reuti <re...@staff.uni-marburg.de>:
>>> Hi,
>>>
>>> On 24.10.2012 at 09:36, Nicolas Deladerriere wrote:
>>>
>>>> I am having an issue running ompi-clean, which cleans up (this is normal)
>>>> the session associated with a user, which means it kills all running jobs
>>>> associated with this session (this is also normal). But I would like to
>>>> be able to clean up the session associated with a job (and not a user).
>>>>
>>>> Here is my point:
>>>>
>>>> I am running two executables:
>>>>
>>>> % mpirun -np 2 myexec1
>>>> --> run with PID 2399 ...
>>>> % mpirun -np 2 myexec2
>>>> --> run with PID 2402 ...
>>>>
>>>> When I run orte-clean I get this result:
>>>> % orte-clean -v
>>>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>>>> orte-clean: killing any lingering procs
>>>> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
>>>> orte-clean: found potential rogue orterun process (pid=2402,user=ndelader), sending SIGKILL...
>>>>
>>>> Which means that both jobs have been killed :-(
>>>> Basically I would like to run orte-clean using the executable name or
>>>> PID or whatever identifies which job I want to stop and clean. It seems I
>>>> would need to create an openmpi session per job. Does it make sense?
>>>> And I would like to be able to do something like the following command
>>>> and get the following result:
>>>>
>>>> % orte-clean -v myexec1
>>>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>>>> orte-clean: killing any lingering procs
>>>> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
>>>>
>>>>
>>>> Does it make sense? Is there a way to perform this kind of selection
>>>> in the cleaning process?
>>>
>>> How many jobs are you starting on how many nodes at one time? This
>>> requirement could be a reason to start using a queuing system, where you
>>> can remove jobs individually and also serialize your workflow. In fact, we
>>> use GridEngine locally on workstations for this purpose as well.
>>>
>>> -- Reuti
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users