Hello all,

Two users on my system experience job failures every time they submit a job via sbatch. When I run their exact submission script, or when I create a local system user and launch from there, the jobs run fine. Here is an example of what I see in the slurmd log:
[2020-07-06T15:02:41.284] task_p_slurmd_batch_request: 1421
[2020-07-06T15:02:41.284] task/affinity: job 1421 CPU input mask for node: 0x00000F0000
[2020-07-06T15:02:41.284] task/affinity: job 1421 CPU final HW mask for node: 0x00000F0000
[2020-07-06T15:02:41.295] _run_prolog: prolog with lock for job 1421 ran for 0 seconds
[2020-07-06T15:02:41.295] error: [job 1421] prolog failed status=1:0
[2020-07-06T15:02:41.295] Job 1421 already killed, do not launch batch job

The prolog file is simply:

#!/bin/bash
loginctl enable-linger $SLURM_JOB_USER

There seems to be some reason why certain users always encounter this, but I can't figure out what it is. Their accounts are no different from anyone else's (they are not in a different group, etc.), so I don't think permissions are the issue. Anyway, the job failure immediately puts the node into a DRAINED/DRAINING state, which is expected, but for now these users cannot submit any jobs at all.

Any insights would be welcomed!

Warmest regards,
Jason

--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632
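
P.S. In case it helps frame the question: the next thing I plan to try is an instrumented variant of the prolog that captures loginctl's stderr and exit status per user. This is only a rough sketch of what I have in mind, not what is currently deployed, and the log path is just a placeholder I picked:

#!/bin/bash
# Debug variant of the prolog: append all output to a log so per-user failures are visible
exec >> /var/log/slurm/prolog_debug.log 2>&1
echo "$(date) prolog for job ${SLURM_JOB_ID:-unknown} user ${SLURM_JOB_USER:-unknown}"
loginctl enable-linger "$SLURM_JOB_USER"
rc=$?
echo "loginctl exit status: $rc"
exit $rc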