On 8/14/20 6:17 am, Stefan Staeglich wrote:
what's the current status of the checkpointing support in SLURM?
There isn't any these days. There used to be support for BLCR, but that's
been dropped as BLCR is no more.
I know from talking with SchedMD they are of the opinion that any
current c
We handle a scenario analogous to yours using features. Features
are defined in slurm.conf and are associated with the nodes from which a
job may be submitted, as an administratively controlled, configuration-managed
authoritative source. (NodeName=xx-login State=FUTURE
AvailableFeatures=) (ie.
={green,blue,
Hi Thomas,
We do not need to lock out jobs from the other nodes. All our jobs specify
constraints and will be scheduled on nodes accordingly.
To follow your example:
* a job with unsatisfiable constraint foo is submitted
* the scanning tool detects the job queued and schedules another j
Probably your best bet would be to use the job_submit.lua script and
block using that.
-Paul Edmon-
On 8/14/2020 11:05 AM, rapier wrote:
Hi,
I'm relatively new to slurm and I'm trying to deal with something I
don't know how to address. I have a reservation set up that users can
submit jo
Hi,
I'm relatively new to slurm and I'm trying to deal with something I
don't know how to address. I have a reservation set up that users can
submit jobs to. However, I don't want them to be able to submit any job
at all to this reservation. I want to restrict them to only running jobs
that c
Hi,
what's the current status of the checkpointing support in SLURM? There was a
CRIU plugin mentioned:
https://slurm.schedmd.com/SLUG16/ciemat-cr.pdf
But it doesn't exist in SLURM 19.05.5 on Ubuntu 20.04. And the manual page
mentions an OpenMPI plugin only.
Best,
Stefan
--
Stefan Stäglich,
Making the certificate globally-available on the host may not always be
permissible. If I were you, I'd write/suggest a modification to the plugin to
make the CA path (CURLOPT_CAPATH) and verification itself
(CURLOPT_SSL_VERIFYPEER) configurable in Slurm. They are both straightforward
options
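To illustrate the suggestion above: the sketch below is not Slurm plugin code
and does not use libcurl. It is a Python analogue built on the standard `ssl`
module, showing what a configurable CA location plus a peer-verification switch
might look like. The function name `build_tls_context` and its parameters are
hypothetical.

```python
import ssl


def build_tls_context(ca_file=None, ca_path=None, verify_peer=True):
    """Return an SSLContext whose trust settings are caller-configurable.

    ca_file / ca_path play the role of CURLOPT_CAINFO / CURLOPT_CAPATH,
    verify_peer plays the role of CURLOPT_SSL_VERIFYPEER. All names here
    are hypothetical and for illustration only.
    """
    ctx = ssl.create_default_context()
    if not verify_peer:
        # check_hostname must be disabled before verify_mode can be CERT_NONE
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    if ca_file or ca_path:
        # Trust the given CA bundle and/or CA directory in addition to defaults
        ctx.load_verify_locations(cafile=ca_file, capath=ca_path)
    return ctx
```

Turning verification off removes protection against an impersonated server, so
pointing at a private CA file or directory is usually the safer of the two
settings.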
Hi,
everything except /etc/ssl/certs/ca-certificates.crt is ignored. So I've copied
the certificate to /usr/local/share/ca-certificates/ and run update-ca-certificates.
Now it's working :)
Best,
Stefan
On Friday, 14 August 2020, 11:42:04 CEST, Stefan Staeglich wrote:
> Hi,
>
> I try to setup the acct_gathe
hi max,
> I have set: 'UCX_TLS=tcp,self,sm' on the slurmd's.
> Is it better to build slurm without UCX support or should I simply install
> rdma-core?
i would look into using mellanox ofed with rdma-core, as it is what
mellanox is shifting towards or has already done (not sure what 4.9 has
tbh). o
We’ve run a similar setup since I moved to Slurm 3 years ago, with no issues.
Could you share partition definitions from your slurm.conf?
When you see a bunch of jobs pending, which ones have a reason of “Resources”?
Those should be the next ones to run, and ones with a reason of “Priority” are
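One way to answer that question is to group the pending jobs by their reason.
The sketch below parses text in the shape produced by
`squeue -h -t PD -o "%i %r"` (job ID followed by reason). The helper name and
the sample job IDs are made up for illustration.

```python
from collections import defaultdict


def group_pending_by_reason(squeue_text):
    """Map each pending reason to the list of job IDs reporting it.

    Expects one job per line: "<jobid> <reason>", e.g. the output of
    `squeue -h -t PD -o "%i %r"`.
    """
    groups = defaultdict(list)
    for line in squeue_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        job_id, reason = parts
        groups[reason.strip()].append(job_id)
    return dict(groups)


# Hypothetical sample, not taken from a real cluster.
sample = """\
1001 Resources
1002 Priority
1003 Priority
"""
pending = group_pending_by_reason(sample)
print(pending.get("Resources", []))
```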
Hi,
I am trying to set up the acct_gather plugin ProfileInfluxDB. Unfortunately, our
InfluxDB server has only a self-signed certificate:
[2020-08-14T09:54:30.007] [46.0] error: acct_gather_profile/influxdb
_send_data: curl_easy_perform failed to send data (discarded). Reason: SSL
peer certificate or SS
Hello all,
we are experiencing an issue in our cluster where sometimes entire nodes
remain idle while jobs are pending in the queue that could run on the
nodes in question.
Our node topology is a bit special where almost all our nodes are in one
common partition a subset of all those nodes a