Excuse me, I am trying to run some software on a cluster which uses the
SLURM grid engine. IT support at my institution have exhausted their
knowledge of SLURM in trying to debug this rather nasty bug with a specific
feature of the grid engine and suggested I try here for tips.
I am using jobs of
I knew we weren’t alone! Thanks, Juergen!
If the scheduling engine were slightly better for reservations (e.g. “Third
Tuesday” type stuff), it would probably happen a little less often. I know it’s
sort of getting there.
* Ryan Novosielski [210416 21:33]:
> Does anyone have a particularly clever way, either built-in or
> scripted, to find out which jobs will still be running at
> such-and-such time?
Hi Ryan,
coincidentally, I just did this today. For exactly the same reason.
squeue does have a "%L" format option which shows the remaining run time of each job.
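One way to script it is to compare each job's expected end time against the
cutoff; a minimal sketch (the cutoff below is only a placeholder and GNU date
is assumed):

#!/bin/bash
# list running jobs whose expected end time lies after a given cutoff
cutoff=$(date -d "2021-04-20 08:00" +%s)
squeue -h -t running -o "%i|%e" | while IFS='|' read jobid endtime; do
    end=$(date -d "$endtime" +%s 2>/dev/null) || continue
    if [ "$end" -gt "$cutoff" ]; then
        echo "$jobid should still be running then (ends $endtime)"
    fi
done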
I can't speak to what happens on node failure, but I can at least get you a
greatly simplified pair of scripts that will run only one copy on each node
allocated:
#!/bin/bash
# notarray.sh
#SBATCH --nodes=28
#SBATCH --ntasks-per-node=1
#SBATCH --no-kill
echo "notarray.sh is running on $(hostnam
Hi there,
Does anyone have a particularly clever way, either built-in or scripted, to
find out which jobs will still be running at such-and-such time? I bet anyone
who’s made the mistake of not entering a maintenance reservation soon enough
knows the feeling.
I know that jobs /may/ end earlier than their time limits.
* Matthias Leopold [210416 19:35]:
> can someone please explain to me why it's possible to set Grp* resource
> limits on user associations? What's the use for this?
Hi Matthias,
this probably does not fully answer your question, but Grp* limits on
user associations provide the ability to impose limits on the aggregated
usage of all jobs that run under that particular user association.
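For example (user and account names below are only placeholders), a Grp*
limit set on the user association caps the combined usage of all of that
user's jobs in that account:

# limit the total resources of alice's running jobs in account projA
sacctmgr modify user name=alice account=projA set GrpTRES=cpu=64,mem=256G
# allow at most 10 of alice's jobs in projA to run at the same time
sacctmgr modify user name=alice account=projA set GrpJobs=10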
Hi,
can someone please explain to me why it's possible to set Grp* resource
limits on user associations? What's the use for this? As far as I
understood the documentation, accounts can have children, but users cannot.
I'm still a newbie exploring Slurm in a test environment, so please excuse
my possibly stupid questions.
I need to migrate several sets of user home directories from an old NFS
file server to a new NFS file server. Each group of users belongs to
specific Slurm accounts organized in a hierarchical tree.
I want to make the migration while the cluster is in full production mode
for all the other accounts.
Hi Niels Carl,
On 16-04-2021 14:41, Niels Carl Hansen wrote:
> For each account do
>    sacctmgr modify account name=<account_name> set GrpJobs=0
> After sync'ing, resume with
>    sacctmgr modify account name=<account_name> set GrpJobs=-1
Yes, but this would block all jobs from that account immediately. If
this account had a week-
Had to do home directory migrations a couple of times without 'full'
downtimes. Similar process, only I don't think we ever bothered
disabling users in LDAP or blocking their jobs. Generally, we told them
we'd move their directory at time X and would they please log out
everywhere; at time X, we moved the directory.
Hi Ole,
On 16/04/2021 14:23, Ole Holm Nielsen wrote:
> Question: Does anyone have experiences with this type of scenario? Any
> good ideas or suggestions for other methods for data migration?
We once did something like that. Basically it worked like this:
- The process is kicked off per user (roughly as sketched below)
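In rough outline it was something like the sketch below (paths, the sleep
interval and the MaxJobs trick are placeholders, not our exact script):

#!/bin/bash
# per-user migration sketch; $1 is the username, paths are placeholders
user=$1
# stop new jobs of this user from starting
sacctmgr -i modify user name=$user set MaxJobs=0
# wait until the user's running jobs have drained
while squeue -h -u "$user" -t running | grep -q .; do sleep 600; done
# copy the home directory to the new NFS server
rsync -aHAX --numeric-ids /old_nfs/home/$user/ /new_nfs/home/$user/
# (update of the automount map / home directory path omitted here)
# lift the limit again
sacctmgr -i modify user name=$user set MaxJobs=-1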
Hi Cristóbal
Under Debian Stretch/Buster I had to set
LDFLAGS=-L/usr/lib/x86_64-linux-gnu/nvidia/current for configure to find
the NVML shared library.
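i.e. roughly like this before building (the prefix is just an example path):

export LDFLAGS=-L/usr/lib/x86_64-linux-gnu/nvidia/current
./configure --prefix=/opt/slurm
make && make install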
Best,
Stephan
On 15.04.21 19:46, Cristóbal Navarro wrote:
> Hi Michael,
> Thanks, indeed I don't have it. Slurm must not have detected it.
> I do
Hi Jürgen,
On 4/13/21 6:29 PM, Juergen Salk wrote:
> * Heckes, Frank [210413 12:04]:
> > This results from a management question: how long do jobs have to
> > wait (in s, min, h, days) before they get executed, and how many jobs
> > are waiting (queued) for each partition in a certain time interval?
The f
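One hedged way to get numbers like that out of the accounting database,
assuming sacct data is kept (the date range is a placeholder and GNU awk is
assumed for mktime):

#!/bin/bash
# average wait time per partition for jobs that started in a time window
sacct -a -X -n -P -S 2021-04-01 -E 2021-04-14 -o Partition,Submit,Start |
awk -F'|' '$3 ~ /T/ {
    split($2, a, /[-T:]/); split($3, b, /[-T:]/)
    s = mktime(a[1]" "a[2]" "a[3]" "a[4]" "a[5]" "a[6])
    t = mktime(b[1]" "b[2]" "b[3]" "b[4]" "b[5]" "b[6])
    wait[$1] += t - s; n[$1]++
}
END { for (p in wait)
          printf "%s: %.1f hours average wait over %d jobs\n",
                 p, wait[p]/n[p]/3600, n[p] }'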