Hello,

I upgraded Slurm this week (20.11 to 21.08.8). Everything seems to work with the srun and sbatch commands, but here is what I get when I try to launch jobs through the drmaa library:


python: /usr/local/lib/slurm/auth_munge.so: Incompatible Slurm plugin version (21.08.8)
python: error: Couldn't load specified plugin name for auth/munge: Incompatible plugin version
python: error: cannot create auth context for auth/munge
python: error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_SUBMIT_BATCH_JOB has authentication error: No such device or address
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/drmaa/session.py", line 340, in runBulkJobs
    return list(run_bulk_job(jobTemplate, beginIndex, endIndex, step))
  File "/usr/lib/python2.7/site-packages/drmaa/helpers.py", line 286, in run_bulk_job
    c(drmaa_run_bulk_jobs, jids, jt, start, end, incr)
  File "/usr/lib/python2.7/site-packages/drmaa/helpers.py", line 302, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "/usr/lib/python2.7/site-packages/drmaa/errors.py", line 151, in error_check
    raise _ERRORS[code - 1](error_string)
drmaa.errors.InternalException: code 1: slurm_submit_batch_job error (1007): Protocol authentication error
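
For reference, the failing call is roughly equivalent to the sketch below. The command, its arguments and the index range are placeholders, and submit_bulk is only an illustrative helper, not part of drmaa-python:

```python
def submit_bulk(session, command, args, begin, end, step):
    """Submit a bulk job through an already-initialized DRMAA session.

    `session` is expected to behave like a drmaa.Session object.
    """
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = command   # placeholder command
        jt.args = args               # placeholder arguments
        # This is the call that appears in the traceback above.
        return session.runBulkJobs(jt, begin, end, step)
    finally:
        # Always release the job template, even if submission fails.
        session.deleteJobTemplate(jt)


def main():
    # Imported here so the helper above can be read and tested
    # on a machine without libdrmaa installed.
    import drmaa

    with drmaa.Session() as s:
        print(submit_bulk(s, "/bin/sleep", ["10"], 1, 3, 1))
```

Calling main() on a submit host is enough to trigger the authentication error shown above.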


We are running CentOS 7 and the following munge packages are installed on all the nodes:

munge-devel-0.5.11-3.el7.x86_64
munge-0.5.11-3.el7.x86_64
munge-libs-0.5.11-3.el7.x86_64


Here are the commands I used to compile Slurm, so I think the munge plugin was built correctly:

./configure --sysconfdir=/etc/slurm --enable-pam
make -j $(nproc)
make install
ldconfig
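
To narrow down whether the drmaa side is still tied to the old Slurm libraries, something like the following could report which Slurm version a given shared library resolves to. This is only a sketch: it assumes the library passed in exposes slurm_api_version() from the Slurm C API (directly or through its libslurm dependency), the helper names are made up for illustration, and the path in the usage comment is a placeholder.

```python
import ctypes


def decode_slurm_version(number):
    """Split a packed Slurm version number into (major, minor, micro).

    Mirrors the SLURM_VERSION_MAJOR/MINOR/MICRO macros from slurm.h,
    which pack the version as (major << 16) + (minor << 8) + micro.
    """
    return ((number >> 16) & 0xFF, (number >> 8) & 0xFF, number & 0xFF)


def loaded_slurm_version(lib_path):
    """Return the Slurm version reported through the library at lib_path.

    Assumption: slurm_api_version() can be resolved through this library.
    ctypes raises OSError if the library cannot be loaded and
    AttributeError if the symbol is not found.
    """
    lib = ctypes.CDLL(lib_path)
    lib.slurm_api_version.restype = ctypes.c_long
    lib.slurm_api_version.argtypes = []
    return decode_slurm_version(lib.slurm_api_version())


# Example (placeholder path, adjust to the local installation):
#   print(loaded_slurm_version("/path/to/libdrmaa.so"))
```

If the version reported this way differs from 21.08.8, the library was most likely built against the previous Slurm release and would need to be rebuilt.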


I don't know whether this is a Slurm or a drmaa bug, so any advice would be welcome.


Best.

--
Julien Rey

Plate-forme RPBS
Unité BFA - CMPLI
Université de Paris
tel: 01 57 27 83 95

