Hi Ryan,

My apologies for letting this reply languish. Thank you for your reply; we have a working plugin now.
I believe the issue was that using the plugin without restarting slurmctld first was causing slurmctld to crash (for a reason I still haven't figured out), and I had attributed the crash to a problem with the plugin itself. I found that restarting slurmctld is required. Without a restart, even if I ran scontrol reconfigure, I was getting:

salloc: error: Job submit/allocate failed: Unexpected message received

It's consistent: I just tested it again to double-check before sending this reply, and even the smallest change to the plugin causes slurmctld to crash if I don't restart it first. Maybe that is mentioned somewhere in the job_submit_plugins documentation, but if so I missed it, and that's pretty much all that we needed.

Thanks again!

Kind Regards,
Glen

==========================================
Glen MacLachlan, PhD
*Cyberinfrastructure Specialist*
Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147
==========================================

On Tue, Apr 9, 2024 at 4:47 PM Ryan Cox via slurm-users <slurm-users@lists.schedmd.com> wrote:

> Glen,
>
> I don't think I see it in your message, but are you pointing to the plugin
> in slurm.conf with JobSubmitPlugins=? I assume you are but it's worth
> checking.
>
> Ryan
>
> On 4/9/24 10:19, Glen MacLachlan via slurm-users wrote:
>
> Hi,
>
> We have a plugin in Lua that mostly does what we want, but there are
> features available in the C extension that are not available to Lua. For
> that reason, we are attempting to convert to C using the guidance found
> here: https://slurm.schedmd.com/job_submit_plugins.html#building. We
> arrived here because the Lua plugins don't seem to stretch enough to cover
> the use case we were looking at, i.e., branching off of the value of
> alloc_id or, for that matter, get_sid().
>
> The goal is to disallow interactive allocations (i.e., salloc) on
> specific partitions while allowing them on others.
> However, we've run into an issue with our C plugin right out of the gate,
> and I've included a minimal reproducer as an example, which is basically
> a "Hello World" type of test (job_submit_disallow_salloc.c, see attached).
>
> *Expectation*
> What we expect to happen is a sort of hello-world result with a message
> being written to /tmp/min_repo.log, but that does not occur. It seems that
> the plugin does not get run at all when jobs are submitted. Jobs still run
> as expected but the plugin seems to be ignored.
>
> *Steps*
> We compile
>
> gcc -fPIC -DHAVE_CONFIG_H -I /modules/source/slurm-23.02.4 -g -O2 -pthread \
>     -fno-gcse -Werror -Wall -g -O0 -fno-strict-aliasing \
>     -MT job_submit_disallow_salloc.lo -MD -MP \
>     -MF .deps/job_submit_disallow_salloc.Tpo \
>     -c job_submit_disallow_salloc.c -o .libs/job_submit_disallow_salloc.o
>
> mv .deps/job_submit_disallow_salloc.Tpo .deps/job_submit_disallow_salloc.Plo
>
> and link
>
> gcc -shared -fPIC -DPIC .libs/job_submit_disallow_salloc.o -O2 -pthread \
>     -O0 -pthread -Wl,-soname -Wl,job_submit_disallow_salloc.so \
>     -o job_submit_disallow_salloc.so
>
> Check links after copying to /usr/lib64/slurm:
>
> ldd /usr/lib64/slurm/job_submit_disallow_salloc.so
>     linux-vdso.so.1 (0x00007ffe467aa000)
>     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c02095000)
>     libc.so.6 => /lib64/libc.so.6 (0x00007f1c01cd0000)
>     /lib64/ld-linux-x86-64.so.2 (0x00007f1c024b7000)
>
> Can someone point out what we are doing incorrectly or how we might
> troubleshoot this issue?
>
> Kindest regards,
> Glen
>
> *Reproducer*
> The minimal reproducer is basically a "hello world" for C extensions, which
> I've pasted below (I've also attached it for convenience):
>
> #include <slurm/slurm.h>
> #include <slurm/slurm_errno.h>
> #include <stdio.h>
> #include "src/slurmctld/slurmctld.h"
>
> const char plugin_name[] = "Min Reproducer";
> const char plugin_type[] = "job_submit/disallow_salloc";
> const uint32_t plugin_version = SLURM_VERSION_NUMBER;
>
> extern int job_submit(job_desc_msg_t *job_desc, uint32_t submit_uid,
>                       char **err_msg)
> {
>     FILE *fp;
>     fp = fopen("/tmp/min_repo.log", "w");
>     fprintf(fp,"Hello!");
>
>     fclose(fp);
>     return SLURM_SUCCESS;
> }
>
> int job_modify(job_desc_msg_t *job_desc, job_record_t *job_ptr,
>                uint32_t submit_uid, char **err_msg)
> {
>     return SLURM_SUCCESS;
> }
>
> --
> Ryan Cox
> Director
> Office of Research Computing
> Brigham Young University
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
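
A side note on the reproducer quoted above: job_submit() passes the result of fopen() straight to fprintf(). If the file cannot be opened, that result is a NULL stream, and writing to it is undefined behavior in C, which could take down whatever process has loaded the plugin. Nothing in this thread confirms that this played any part in the crashes described, so the following is only a minimal sketch of a more defensive logging helper. The name log_line is hypothetical and is not part of Slurm.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a Slurm API): append one line to a log file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int log_line(const char *path, const char *msg)
{
    /* "a" appends, so earlier entries survive; "w" would truncate. */
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;              /* never hand a NULL stream to fprintf() */

    int rc = (fprintf(fp, "%s\n", msg) < 0) ? -1 : 0;

    /* fclose() flushes buffered data, so its result matters too. */
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}
```

Inside job_submit() this could be called as log_line("/tmp/min_repo.log", "Hello!"), with a -1 return either ignored or reported, so that a logging failure never decides whether the job is accepted.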
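
For the goal stated in the original message (disallowing interactive allocations on specific partitions while allowing them elsewhere), the decision logic can be sketched separately from Slurm itself. The sketch below makes several assumptions that are not confirmed in this thread: that a request with no batch script can be treated as interactive, that the requested partition may be a comma-separated list, and that a request with no partition named is left alone. The struct job_req is a hypothetical stand-in for the fields of job_desc_msg_t that a real plugin would read, and reject_interactive is likewise an illustrative name. Note that a missing script would not distinguish salloc from other allocations that lack one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the job_desc_msg_t fields this sketch reads. */
struct job_req {
    const char *partition;  /* NULL, one name, or a comma-separated list */
    const char *script;     /* NULL when no batch script was submitted */
};

/* True if the name of length `len` at `name` exactly matches an entry. */
static bool name_in_list(const char *name, size_t len,
                         const char *const *blocked, size_t n_blocked)
{
    for (size_t i = 0; i < n_blocked; i++) {
        if (strlen(blocked[i]) == len && strncmp(blocked[i], name, len) == 0)
            return true;
    }
    return false;
}

/*
 * Returns true when the request has no batch script (assumed here to mean
 * "interactive") and names at least one partition from the blocked list.
 */
static bool reject_interactive(const struct job_req *req,
                               const char *const *blocked, size_t n_blocked)
{
    if (req->script != NULL || req->partition == NULL)
        return false;

    const char *p = req->partition;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the next name */
        if (len > 0 && name_in_list(p, len, blocked, n_blocked))
            return true;
        p += len;
        if (*p == ',')
            p++;                        /* step over the separator */
    }
    return false;
}
```

In a real plugin, job_submit() would fill the equivalent of job_req from job_desc and return an error code instead of SLURM_SUCCESS when this check returns true. Keeping the check as a plain function also means it can be unit-tested without loading it into slurmctld.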