Hi Ward,

Thanks for replying. I tried these, but the error is exactly the same (everything under "/shared" has permissions 777 and is owned by "nobody:nogroup"):
/etc/slurm/slurm.conf:

JobContainerType=job_container/tmpfs
Prolog=/shared/SlurmScripts/prejob
PrologFlags=contain

/etc/slurm/job_container.conf:

# AutoBasePath=true
BasePath=/shared/BasePath

/shared/SlurmScripts/prejob:

#!/usr/bin/env bash
MY_XDG_RUNTIME_DIR=/shared/SlurmXDG
mkdir -p $MY_XDG_RUNTIME_DIR
echo "export XDG_RUNTIME_DIR=$MY_XDG_RUNTIME_DIR"

On Wed, May 15, 2024 at 2:28 PM Ward Poelmans via slurm-users <slurm-users@lists.schedmd.com> wrote:

> Hi,
>
> This is systemd, not Slurm. We've also seen the directory being created
> and removed; as far as I understood, it is part of the session cleanup
> that systemd does. We've worked around it by adding this to the prolog:
>
> MY_XDG_RUNTIME_DIR=/dev/shm/${USER}
> mkdir -p $MY_XDG_RUNTIME_DIR
> echo "export XDG_RUNTIME_DIR=$MY_XDG_RUNTIME_DIR"
>
> (in combination with a private tmpfs per job).
>
> Ward
>
> On 15/05/2024 10:14, Arnuld via slurm-users wrote:
> > I am using the latest Slurm. It runs fine for scripts, but if I give it
> > a container, it kills the job as soon as I submit it. Is Slurm cleaning
> > up $XDG_RUNTIME_DIR before it should? This is the log:
> >
> > [2024-05-15T08:00:35.143] [90.0] debug2: _generate_patterns: StepId=90.0 TaskId=-1
> > [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[0]=/bin/sh
> > [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[1]=-c
> > [2024-05-15T08:00:35.143] [90.0] debug3: _get_container_state: command argv[2]=crun --rootless=true --root=/run/user/1000/ state slurm2.acog.90.0.-1
> > [2024-05-15T08:00:35.167] [90.0] debug: _get_container_state: RunTimeQuery rc:256 output:error opening file `/run/user/1000/slurm2.acog.90.0.-1/status`: No such file or directory
> > [2024-05-15T08:00:35.167] [90.0] error: _get_container_state: RunTimeQuery failed rc:256 output:error opening file `/run/user/1000/slurm2.acog.90.0.-1/status`: No such file or directory
> > [2024-05-15T08:00:35.167] [90.0] debug: container already dead
> > [2024-05-15T08:00:35.167] [90.0] debug3: _generate_spooldir: task:0 pattern:%m/oci-job%j-%s/task-%t/ path:/var/spool/slurmd/oci-job90-0/task-0/
> > [2024-05-15T08:00:35.167] [90.0] debug2: _generate_patterns: StepId=90.0 TaskId=0
> > [2024-05-15T08:00:35.168] [90.0] debug3: _generate_spooldir: task:-1 pattern:%m/oci-job%j-%s/ path:/var/spool/slurmd/oci-job90-0/
> > [2024-05-15T08:00:35.168] [90.0] stepd_cleanup: done with step (rc[0x100]:Unknown error 256, cleanup_rc[0x0]:No error)
> > [2024-05-15T08:00:35.275] debug3: in the service_connection
> > [2024-05-15T08:00:35.278] debug2: Start processing RPC: REQUEST_TERMINATE_JOB
> > [2024-05-15T08:00:35.278] debug2: Processing RPC: REQUEST_TERMINATE_JOB
> > [2024-05-15T08:00:35.278] debug: _rpc_terminate_job: uid = 64030 JobId=90
> > [2024-05-15T08:00:35.278] debug: credential for job 90 revoked
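For reference, a per-user variant of that prolog snippet might look like the sketch below. It is a sketch under stated assumptions, not a tested fix: it assumes the script runs as a TaskProlog (where Slurm documents that "export NAME=VALUE" lines printed to stdout are injected into the task environment, and where $USER is the job user), rather than as the slurmd Prolog, which runs as root.

#!/usr/bin/env bash
# Sketch: per-user XDG_RUNTIME_DIR in tmpfs, so systemd's removal of
# /run/user/<uid> at session end cannot pull it out from under the job.
# Assumption: this runs as a TaskProlog, so "export NAME=VALUE" on
# stdout is honored and $USER is the job user.

MY_XDG_RUNTIME_DIR="/dev/shm/${USER}"

# Owner-only permissions, rather than a world-writable shared directory.
mkdir -p -m 0700 "${MY_XDG_RUNTIME_DIR}"

# Slurm parses this line from TaskProlog stdout and sets the variable
# in the task's environment.
echo "export XDG_RUNTIME_DIR=${MY_XDG_RUNTIME_DIR}"

Combined with job_container/tmpfs (which gives each job a private /tmp and /dev/shm), the directory is effectively per-job as well as per-user.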
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com