Nicolo, I cannot say what your problem is. However, in the past, with problems like this, I would:

a) Look at ps -eaf --forest. Try to see what the parent processes of these job processes are. Clearly, if the parent PID is 1 then --forest is not much help, but the --forest option is my 'go-to' option.

b) Look closely at the Slurm logs. Do not fool yourself - force yourself to read the logs line by line, around the timestamp when the job ends.

Being a bit more helpful, in my last job we had endless problems with Matlab jobs leaving orphaned processes. To be fair to Matlab, they have a utility which 'properly' starts parallel jobs under the control of the batch system (OK, it was PBSpro). But users can easily start a job and 'fire off' processes in Matlab which are not under the direct control of the batch daemon, leaving orphaned processes when the job ends.

Actually, if you think about it, this is how a batch system works. The batch system daemon starts running processes on your behalf. When the job is killed, all the daughter processes of that daemon should die. It is instructive to run ps -eaf --forest sometimes on a compute node during a normal job run. Get to know how things are being created, and what their parents are. (That is two dashes in front of the forest argument.)

Now think of users who start a batch job and get a list of compute hosts. They MAY use a mechanism such as ssh or indeed pbsdsh to start running job processes on those nodes. You will then have trouble with orphaned processes when the job ends.

Techniques for dealing with this:

a) Use the PAM module which stops ssh login (actually, this probably still allows ssh login during the time when the user has a node allocated).
b) My favourite - CPU sets - actually this won't stop ssh logins either.
c) Shouting, much shouting. Screaming.

Regarding users behaving like this, I have seen several cases of it for understandable reasons. On a system which I did not manage, but was asked for advice on, the vendor had provided a sample script for running Ansys.
The user wanted to run Abaqus on the compute nodes (or some such - a different application anyway). So he started an empty Ansys job, which sat doing nothing, then took the list of hosts provided by the batch system and fired up an interactive Abaqus session on his terminal. I honestly hesitate to label this behaviour 'wrong'. I have also seen similar when running a CFD job.

On 23 April 2018 at 11:50, Nicolò Parmiggiani <nicolo.parmiggi...@gmail.com> wrote:

> Hi,
>
> I have a job that keeps running even though the internal process is
> finished.
>
> What could be the problem?
>
> Thank you.
>
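
To make the "look at the parent PIDs" advice above a bit more concrete, here is a minimal sketch in Python of how one might list a user's processes whose parent PID is 1 on a compute node. The function names (parse_ps, find_orphans, list_orphans_on_this_node) are made up for illustration, and it assumes a procps-style ps that accepts "-eo pid,ppid,user,comm". Treat the result as a list of candidates only: a parent PID of 1 does not prove a process is a job leftover (legitimate per-user daemons look the same), and on some systems orphans are re-parented to something other than PID 1.

```python
import subprocess


def parse_ps(ps_output):
    """Parse the text printed by `ps -eo pid,ppid,user,comm` into a list of dicts."""
    rows = []
    # The first line is the column header, so skip it.
    for line in ps_output.strip().splitlines()[1:]:
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed or empty lines
        pid, ppid, user, comm = parts
        rows.append({"pid": int(pid), "ppid": int(ppid), "user": user, "comm": comm})
    return rows


def find_orphans(rows, user):
    """Return the processes owned by `user` whose parent PID is 1.

    These are only *candidates* for orphaned job processes.
    """
    return [r for r in rows if r["user"] == user and r["ppid"] == 1]


def list_orphans_on_this_node(user):
    """Run ps on the local node and return orphan candidates for `user`.

    Assumption: ps prints the user name in full; some ps versions
    abbreviate long user names, which would make the comparison fail.
    """
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,user,comm"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return find_orphans(parse_ps(out), user)
```

Calling list_orphans_on_this_node("someuser") on a node after that user's job has ended gives a short list to compare against ps -eaf --forest and the Slurm logs around the job's end time.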