Well, there are numerous ways to do it, but I was trying to do it as much as 
possible from within the slurm infrastructure.

Basically, I want to react when someone submits a job requesting specific 
features that aren't actively available yet, and some of the actions I need to 
take will involve slurm commands.  This seems a bit like the cloud scheduling 
interface, but it's not a cloud service I'm talking about...it's our own 
hardware.

Otherwise, I would think that gathering information to make a decision while in 
the job_submit.lua would be a normal expectation.  Is there really no way to 
know how many nodes are up or what features are on the system while I'm 
processing in the job submit?  sacctmgr seems to work fine in there.

Rob

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Thomas 
M. Payerle <paye...@umd.edu>
Sent: Tuesday, October 11, 2022 5:31 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Can sinfo/scontrol be called from job_submit.lua?

Running scontrol/sinfo from within a job_submit.lua script seems like opening 
a big can of worms; it might be doable, but it would scare me.  Since it 
sounds like you only need a fairly limited amount of information, which 
presumably does not change frequently, perhaps it would be better to have a 
cron job periodically write the desired information to a file, and have 
job_submit.lua read the information from that file?
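
A rough sketch of what the cron side of that could look like, in Python. The 
snapshot path is hypothetical, and the handling of "(null)" assumes that is 
what sinfo prints for nodes with no features; the sinfo options used are -h 
(no header) and -o "%f" (feature list per node group). The file is replaced 
atomically so a reader never sees a half-written snapshot.

```python
#!/usr/bin/env python3
"""Snapshot the cluster's node features to a file that job_submit.lua can read."""
import os
import shutil
import subprocess
import tempfile

# Hypothetical location; use any path the slurmctld user can read.
SNAPSHOT_PATH = "/var/spool/slurm/available_features.txt"


def parse_features(sinfo_output):
    """Return the sorted, de-duplicated feature names found in sinfo output.

    Expects one comma-separated feature list per line, as printed by
    `sinfo -h -o "%f"`. Blank lines and "(null)" entries (assumed to mark
    nodes without features) are skipped.
    """
    features = set()
    for line in sinfo_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name and name != "(null)":
                features.add(name)
    return sorted(features)


def write_snapshot(features, path):
    """Write one feature per line, replacing any old file atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".features-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("".join(name + "\n" for name in features))
        # mkstemp creates the file as 0600; make it readable by slurmctld.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def main():
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%f"],
        capture_output=True, text=True, check=True, timeout=30,
    )
    write_snapshot(parse_features(result.stdout), SNAPSHOT_PATH)


if __name__ == "__main__":
    if shutil.which("sinfo") is None:
        print("sinfo not found on PATH; nothing written")
    else:
        main()
```

job_submit.lua would then only need Lua's standard io library to read the 
file line by line, with no call back into slurmctld during submission.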

On Tue, Oct 11, 2022 at 5:17 PM Groner, Rob <rug...@psu.edu> wrote:
I am testing a method where, when a job is submitted asking for specific 
features and those features don't exist yet, I take some action.

The job_submit.lua plugin has worked to determine when a job is submitted 
asking for the specific features.  I'm at the point of checking if those 
features exist already (the features are part of a nodeset and part of a 
partition....so jobs submitted asking for those features will just go to 
pending if no nodes exist that offer those features).  I thought to use "sinfo" 
to get a list of existing features on the system...but it fails to run.  The 
same for trying to use scontrol.
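
To illustrate the check being described, here is a minimal Python sketch of 
the logic only (the real check would live in job_submit.lua). The function 
names are made up for this example, and the list of available features is 
passed in by the caller rather than fetched from Slurm. It handles only the 
sbatch --constraint operators named in the comment.

```python
import re


def requested_features(constraint):
    """Extract the feature names mentioned in a --constraint expression."""
    names = set()
    # Split on the sbatch --constraint operators: & | [ ] ( )
    for token in re.split(r"[&|\[\]()]", constraint):
        # Drop an optional node-count suffix such as "*4".
        name = token.split("*", 1)[0].strip()
        if name:
            names.add(name)
    return names


def missing_features(constraint, available):
    """Return the requested features that no node currently offers."""
    return sorted(requested_features(constraint) - set(available))
```

For example, missing_features("gpu&a100", ["gpu", "bigmem"]) reports a100 as 
the feature that would leave the job pending.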

When I submit a job that requests the features, and so the sinfo command runs, 
it all hangs for about 10 seconds and then says:

[me@testsch (RC) slurm] sbatch ./gctest_account_test.sh
sbatch: error: Batch job submission failed: Socket timed out on send/recv 
operation

In the slurmctld.log, I see:
[2022-10-10T17:12:13.933] error: slurm_msg_sendto: 
address:port=10.6.88.99:40100 msg_type=4004: Unexpected missing socket error


I'll note that "sinfo -V" works...but I suspect it's because it's not trying to 
communicate outside of itself with the slurmctld.

Any suggestions on what to try?  Or is there a better slurm-ic way to do what 
I'm trying to do?

Rob




--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        paye...@umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831
