Hi Ryan,

My apologies for letting this reply languish. Thank you for your reply - we
have a working plugin now.

I believe the issue was that using the plugin without first restarting
slurmctld was (for some reason I still haven't figured out) causing
slurmctld to crash, and I had attributed that to a problem with the plugin
itself.

I found that restarting slurmctld is required. Without a restart, even if
I ran scontrol reconfigure, I was getting
salloc: error: Job submit/allocate failed: Unexpected message received.
It's consistent: I just tested it again to double-check before sending
this reply, and the smallest change to the plugin will cause slurmctld to
crash if I don't restart it first. Maybe that was mentioned somewhere in
the job_submit_plugins documentation, but if so I missed it, and that's
pretty much all that we needed.
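
For anyone landing on this thread later, here is a minimal sketch of the
kind of check described in the original message below (refusing interactive
allocations on some partitions while allowing them on others). It is not
the actual plugin. The struct job_request, the function reject_interactive
and the partition names are hypothetical stand-ins so the logic compiles
without the Slurm headers. It also assumes that a NULL batch script marks an
interactive request; the original message talks about branching on alloc_id
or get_sid() instead, so treat that test as a placeholder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the job descriptor fields used here; a real
 * plugin would read them from the descriptor passed to job_submit(). */
struct job_request {
        const char *partition; /* requested partition, may be NULL */
        const char *script;    /* batch script; ASSUMED NULL when interactive */
};

/* Hypothetical site policy: partitions that refuse interactive jobs. */
static const char *const no_interactive[] = { "batch", "gpu" };

/* Return true when the request is interactive and names a blocked
 * partition. Exact match only: comma-separated partition lists and the
 * default partition are not handled in this sketch. */
bool reject_interactive(const struct job_request *req)
{
        size_t count = sizeof(no_interactive) / sizeof(no_interactive[0]);

        if (req->script != NULL)    /* batch job: always allowed */
                return false;
        if (req->partition == NULL) /* no partition named: not checked */
                return false;

        for (size_t i = 0; i < count; i++) {
                if (strcmp(req->partition, no_interactive[i]) == 0)
                        return true;
        }
        return false;
}
```

In a real plugin, job_submit() would return an error code instead of
SLURM_SUCCESS when a check like this returns true.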

Thanks again!

Kind Regards,
Glen

==========================================
Glen MacLachlan, PhD
*Cyberinfrastructure Specialist*

Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147

==========================================

On Tue, Apr 9, 2024 at 4:47 PM Ryan Cox via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Glen,
>
> I don't think I see it in your message, but are you pointing to the plugin
> in slurm.conf with JobSubmitPlugins=?  I assume you are but it's worth
> checking.
>
> Ryan
>
> On 4/9/24 10:19, Glen MacLachlan via slurm-users wrote:
>
> Hi,
>
> We have a plugin in Lua that mostly does what we want, but there are
> features available in the C extension that are not available to Lua. For
> that reason, we are attempting to convert to C using the guidance found
> here: https://slurm.schedmd.com/job_submit_plugins.html#building. We
> arrived here because the Lua plugins don't seem to stretch far enough to
> cover the use case we were looking at, i.e., branching on the value of
> alloc_id or, for that matter, get_sid().
>
> The goal is to disallow interactive allocations (i.e., salloc) on
> specific partitions while allowing them on others. However, we've run into
> an issue with our C plugin right out of the gate, and I've included a
> minimal reproducer as an example, which is basically a "Hello World" type
> of test (job_submit_disallow_salloc.c, see attached).
>
> *Expectation*
> What we expect to happen is a sort of hello-world result, with a message
> being written to /tmp/min_repo.log, but that does not occur. It seems that
> the plugin does not get run at all when jobs are submitted. Jobs still run
> as expected but the plugin seems to be ignored.
>
> *Steps*
> We compile
> gcc -fPIC -DHAVE_CONFIG_H -I /modules/source/slurm-23.02.4 -g -O2 -pthread
> -fno-gcse -Werror -Wall -g -O0 -fno-strict-aliasing -MT
> job_submit_disallow_salloc.lo -MD -MP -MF .deps/job_submit_disallow_salloc.Tpo
> -c job_submit_disallow_salloc.c -o .libs/job_submit_disallow_salloc.o
>
> mv .deps/job_submit_disallow_salloc.Tpo .deps/job_submit_disallow_salloc.Plo
>
> and link
> gcc -shared -fPIC -DPIC .libs/job_submit_disallow_salloc.o -O2 -pthread
> -O0 -pthread -Wl,-soname -Wl,job_submit_disallow_salloc.so    -o
> job_submit_disallow_salloc.so
>
>
>
> Check links after copying to /usr/lib64/slurm:
> ldd /usr/lib64/slurm/job_submit_disallow_salloc.so
> linux-vdso.so.1 (0x00007ffe467aa000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c02095000)
> libc.so.6 => /lib64/libc.so.6 (0x00007f1c01cd0000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f1c024b7000)
>
>
>
> Can someone point out what we are doing incorrectly or how we might
> troubleshoot this issue?
>
> Kindest regards,
> Glen
>
>
>
> *Reproducer*
> The minimal reproducer is basically a "hello world" for C extensions which
> I've pasted below (I've also attached it for convenience):
>
> #include <slurm/slurm.h>
> #include <slurm/slurm_errno.h>
> #include <stdio.h>
> #include "src/slurmctld/slurmctld.h"
>
> const char plugin_name[] = "Min Reproducer";
> const char plugin_type[] = "job_submit/disallow_salloc";
> const uint32_t plugin_version = SLURM_VERSION_NUMBER;
>
> extern int job_submit(job_desc_msg_t *job_desc, uint32_t submit_uid,
>                       char **err_msg)
> {
>         /* Write a marker; skip logging if the file cannot be opened. */
>         FILE *fp = fopen("/tmp/min_repo.log", "w");
>         if (fp != NULL) {
>                 fprintf(fp, "Hello!\n");
>                 fclose(fp);
>         }
>         return SLURM_SUCCESS;
> }
>
> int job_modify(job_desc_msg_t *job_desc, job_record_t *job_ptr,
>                uint32_t submit_uid, char **err_msg)
> {
>         return SLURM_SUCCESS;
> }
>
>
>
> --
> Ryan Cox
> Director
> Office of Research Computing
> Brigham Young University
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>