of a running job based on its jobid?
>
>
> Regards,
> Mahmood
--
Tina Friedrich, Snr HPC Systems Administrator, Advanced Research Computing
Research Computing and Support Services, Academic IT
IT Services, University of Oxford
http://www.arc.ox.ac.uk
just ssh'ing to a node and running xterm/etc.
> >>
> >> With srun, however:
> >>
> >> srun -n1 --pty --x11 xterm
> >> srun: error: Unable to allocate resources: X11 forwarding not available
> >>
> >> So, what am I missing?
> >>
> >> Thanks.
> >>
> >> PS
> >>
before).
> Regular ssh forwarding works fine.
>
> On Tue, Oct 16, 2018 at 09:47:21AM +0100, Tina Friedrich wrote:
> > I had an issue getting x11 forwarding via SLURM (srun/sbatch) to work; ssh
> > worked fine. Tracked it down to the host name setting on the nodes; as per
>
'knl_generic' plugin enabled; there were some KNL nodes,
although they are no longer there. Still, I'm not even requesting 'knl' here?)
Google didn't really yield anything, so I thought asking might be quicker.
Thanks!
Tina
Hello,
Two things: you don't actually seem to have the '--x11' flag on your
srun command? I.e. does 'srun --x11 --nodelist=compute-0-5 -n 1 -c 6
--mem=8G -A y8 -p RUBY xclock' get you any further?
I had some trouble getting the inbuilt X forwarding to work, which had
to do with hostnames & xauth
I agree with you on that one - I'd forgotten about that detail. Having
to actually do an 'ssh -X' before you can do 'srun --x11' is quite
silly, and a bit aggravating.
You can do 'ssh -X localhost' and then try the srun; that should work,
as well.
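Something like this, as a rough sketch (assuming X11 support is
compiled into Slurm and enabled):

  ssh -X localhost           # refreshes the xauth cookie in a form slurm can read
  srun --x11 --pty xterm     # then request X11 forwarding from srun as usual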
Tina
On 21/11/2018 18:04, Mahmood Naderan wrote:
I really don't want to start a flaming discussion on this - but I don't
think it's an unusual situation. I have, likewise, in roughly 15 years
of doing this, not ever worked anywhere where people didn't have a GUI
to submit from. It's always been a case of 'Want to use the cluster?
We'll make y
tly integrated with our environment for our staff to
>> submit and monitor their jobs from (they don't have to touch a single
>> job script).
>>
>> On Thu, Nov 22, 2018 at 6:28 PM Tina Friedrich
> >> <tina.friedr...@it.ox.ac.uk> wrote:
>>
>>
I'm running 18.08.3, and I have a fair number of GPU GRES resources -
recently upgraded to 18.08.03 from a 17.x release. It's definitely not
as if they don't work in an 18.x release. (I do not distribute the same
gres.conf file everywhere though, never tried that.)
Just a really stupid question
4,06-09,11-14,16-19,21-22] Name=gpu Type=k20
> File=/dev/nvidia[0-1] Cores=0,1
>
> What am I missing?
>
> Thanks...
>
>
>
>
> On Wed, Dec 5, 2018 at 4:59 AM Tina Friedrich
> <tina.friedr...@it.ox.ac.uk> wrote:
>
> I'm running
that necessary or is
> that just a sanity check?
>
> Once again, I'd like to thank all contributors to this thread... It has
> helped me get my cluster going!
>
> Thanks.
> Lou
>
>
>
> On Wed, Dec 5, 2018 at 9:41 AM Tina Friedrich
> <tina.friedr...@i
Indeed - am I the only person who finds that quite a bit annoying? A
lot of interactive software works a lot better over things like NX, so
why this limitation?
Tina
(I realise I'm not adding much to the discussion, probably :) )
On 15/05/2019 08:36, Marcus Wagner wrote:
> Dear Mahmood,
>
> ple
Hadn't yet read that far - I plan to test 19.05 soon anyway. Will report.
(I thought the plumbing was - basically - libssh; and, well, ssh itself
is capable of dealing with local displays?)
Tina
On 15/05/2019 15:06, Chris Samuel wrote:
> On 15/5/19 3:01 am, Tina Friedrich wrote:
>
Hi Lawrence,
no, as far as I can tell, SLURM doesn't have any way to allow users to
submit/create advance reservations.
Could you get around it with sudo? It would be easy to allow a group of
users to run 'sudo scontrol create ' (or a suitable wrapper script,
to make the syntax easy). It'd
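A sketch of what I mean - group, script and reservation names here are
all made up:

  # /etc/sudoers.d/reservations
  %res-admins ALL=(root) NOPASSWD: /usr/local/sbin/mkres

  # mkres would then wrap something like:
  scontrol create reservation reservationname=res_$SUDO_USER \
      users=$SUDO_USER starttime=now duration=120 nodecnt=2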
Hi Jose,
I run my slurmctld (and the database) in a VM. Some of my
test/development nodes are VMs, as well. Actual worker nodes are
hardware, for performance reasons :)
Is it the SLURM controller that you're planning to run as a VM, or the
whole cluster?
Tina
On 12/09/2019 15:23, Jose A wrote:
I second that question - I'm using the same combination :)
I know there's some efforts - see
https://slurm.schedmd.com/SLUG16/monitoring_influxdb_slug.pdf - but I
don't know exactly what the state of that is at the moment.
(I resorted to telegraf's 'execute script' plugin to pump some
information
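In case it's useful, the relevant telegraf stanza looks roughly like
this (the script name is a placeholder for whatever emits the metrics):

  [[inputs.exec]]
    commands = ["/usr/local/bin/slurm-metrics.sh"]  # prints InfluxDB line protocol
    data_format = "influx"
    timeout = "30s"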
Hello,
is there a possibility to tie a reservation to a QoS (instead of an
account or user), or enforce a QoS for jobs submitted into a reservation?
The problem I'm trying to solve is - some of our resources are bought on
a co-investment basis. As part of that, the 'owning' group can get very
nking has made dedicated partitions and QOSes
> something we have not had to deal with as CPU time per 30 day sliding
> window has been accepted, can be quantitatively shown, and just is a
> much easier way to schedule when ALL resources can be used.
>
> Bill
>
> On 10/28/19
ave to solve it with some scripting, then.
Tina
On 28/10/2019 19:02, Kurt H Maier wrote:
> On Mon, Oct 28, 2019 at 06:40:48PM +, Tina Friedrich wrote:
>> That's fine and all sounds nice but doesn't precisely help me solve my
>> problem - which is how to ensure that peo
Hi Angelines,
I use a plugin for that - I believe this one
https://github.com/hpc2n/spank-private-tmp
which sort of does it all; your job sees an (empty) /tmp/.
(It doesn't do cleanup, I simply rely on OS cleaning up /tmp, at the
moment.)
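Wiring it up is a single line in plugstack.conf - roughly as below; the
exact option names are whatever the plugin's README says, I'm quoting
from memory:

  # /etc/slurm/plugstack.conf
  required /usr/lib64/slurm/private-tmpdir.so base=/local/slurmtmp mount=/tmp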
Tina
On 05/12/2019 15:57, Angelines wrote:
> Hello,
>
Hello,
shame this seems to be the last message in this thread!
I'm currently banging against the same problem on a test system.
Did anyone get that to run? If yes, how exactly did you build the packages?
Tina
On 01/11/2019 18:19, Michael Jennings wrote:
> On Friday, 01 November 2019, at 10:41:
_hardened_cflags “-Wl,-z,lazy”
> %global _hardened_ldflags “-Wl,-z,lazy”
>
>
>
> -James
>
> -Original Message-
> From: slurm-users On Behalf Of Tina
> Friedrich
> Sent: Friday, February 21, 2020 10:40 AM
> To: slurm-users@lists.schedmd.com
> Subject: Re:
I remember having issues when I set up X forwarding that had to do with
how the host names were set on the nodes. I had them set (CentOS
default) to the fully qualified hostname, and that didn't work - with an
error message very similar to what you're getting, if memory serves
right. 'Fixed' it
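by setting the short hostname, something like this if I recall
correctly (node name is an example), so that it matches the NodeName in
slurm.conf:

  hostnamectl set-hostname compute-0-5
  hostname -s    # should now match the NodeName entry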
Hi Bas,
I wish I'd known that two years ago, might've saved me some setting up
(if it was around two years ago). My SLURM configuration is also
CFEngine3 controlled. So I'm quite interested in sharing.
Having a look at it in a minute...
Tina
On 13/07/2020 22:15, Bas van der Vlies wrote:
Hi Peter,
is this an actual NFS server, or something exporting NFS (like a NetApp)?
This might be a silly question but - if it's an actual server, could you
run the slurmdb server on the NFS server? There would then be no need
for any clustered DB service or anything; it would simply make the
Hello,
This is something I've seen once on our systems & it took me a while to
figure out what was going on.
It turned out that the system topology was such that all GPUs were
connected to one CPU. There were no free cores on that particular CPU;
so SLURM did not schedule any more jobs to
G"
MaxCPUsPerNode=48
I have tried variations for gres.conf such as:
NodeName=c0005 Name=gpu File=/dev/nvidia[0-1] CPUs=0,2
NodeName=c0005 Name=gpu File=/dev/nvidia[2-3] CPUs=1,3
as well as trying Cores= (rather than CPUs=), with NO success.
I've battled this all week. Any suggestions?
ev/nvidia3 CPUs=[1,3,5,7,11,13,15,17,29]
I also tried your suggestions of 0-13, 14-27, and a combo.
I still only get 2 jobs to run on gpus at a time. If I take off the “CPUs=“, I
do get 4 jobs running per node.
Jodie
On Aug 7, 2020, at 12:18 PM, Tina Friedrich wrote:
Hi Jodie,
what version of SLURM are you using?
Script. Not doing anything manually if it can at all be avoided - way too
error-prone.
We have a cron job that does all of that. Checks if there are users or
groups in LDAP that aren't in SLURM yet, and adds them - that's adding
accounts, adding users, I think it also removed users/accounts i
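The core of it is just sacctmgr in a loop; a stripped-down sketch
(group and account names are placeholders):

  #!/bin/bash
  # add any member of the LDAP group that slurm doesn't know about yet
  for u in $(getent group hpc-users | cut -d: -f4 | tr ',' ' '); do
      sacctmgr -n show user "$u" | grep -qw "$u" || \
          sacctmgr -i add user "$u" account=default
  done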
Hi List,
apologies if this has been asked before (or is obvious) - I did do some
reading & searching but can't quite figure out the best way to achieve this.
Background - we have two production clusters, both running SLURM. They
are not currently a multi-cluster setup; they are not running the s
Yeah, I had that problem as well (trying to set up a partition that
didn't have any nodes - they're not here yet).
I figured that one can have partitions with nodes that don't exist,
though. As in, not even in DNS.
I currently have this:
[arc-slurm ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES
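For nodes that genuinely don't exist yet, declaring them with
State=FUTURE keeps slurmctld from trying to contact them - a sketch,
names and sizes made up:

  NodeName=new[001-016] CPUs=48 RealMemory=190000 State=FUTURE
  PartitionName=incoming Nodes=new[001-016] State=UP Default=NO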
What crossed my mind is that I might have missed a compile option
or compile dependency (but I'm not sure which it would be if it were
that - it's not as if the binding doesn't work at all.)
In short - am a bit stumped; any help welcome!
Tina
.org/pub/epel/7/$basearch
metalink=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch&infra=$infra&content=$contentdir
failovermethod=prio
use to build our vast array
of RPMs and they work just fine on our GPU nodes.
All the best,
Chris
: cannot find auth plugin for auth/munge
slurmd: error: cannot create auth context for auth/munge
slurmd: error: slurmd initialization failed
command terminated with exit code 1
Any advice?
thank you
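The usual first checks for auth/munge errors, as a sketch ('somenode'
is a placeholder):

  systemctl status munge             # munged must be running on every node
  munge -n | unmunge                 # local round trip
  munge -n | ssh somenode unmunge    # verifies the key matches across nodes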
com/job_container.conf.html
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim
and wipe the sssd cache for the user.
- Kill all their processes on the login nodes
- Move the data
- Re-enable the user in the LDAP
- Remove any blocks/limits of the user to start new job
- Mail the user that he/she can continue working again.
The whole process went pretty smoothly.
Ward
se? I'm assuming it wouldn't, but I figured it
safe to ask questions first and shoot later.
ty error, then X11 support was definitely
compiled into Slurm. The most common cause of .Xauthority issues is the
user's home directory hitting their quota limit. Could that be the case
here?
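A quick way to check, roughly (as the affected user):

  quota -s                 # over quota means xauth can't update ~/.Xauthority
  ls -l ~/.Xauthority
  xauth list               # should show a cookie for the current display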
--
Prentice
eady have for PBS, and will
create for Slurm, if something doesn't already exist).
Thank you all,
David
s that used it in the database :)
Can you elaborate on what you mean by "renaming"?
--
Prentice
On 4/19/21 8:55 AM, Tina Friedrich wrote:
Hi Prentice,
I've just done that on one of my test systems - and it's not deleting
a no longer used QoS, but 'renaming' th
en the slurm controller... you are making a huge
issue of a very basic task...
Sid
On Tue, 4 May 2021, 22:28 Tina Friedrich <tina.friedr...@it.ox.ac.uk> wrote:
Hello,
a lot of people already gave very good answers to how to tackle this.
Still, I thought it wort
@c005's password:
My assumption was that a user should be able to log into a node on which
that person has a running job without any further ado, i.e. without the
necessity to set up anything else or to enter any credentials.
Is this assumption correct?
If so, how can I best debug what I have done wrong?
Cheers,
Loris
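For reference, that behaviour normally comes from pam_slurm_adopt on
the compute nodes; a sketch of the usual setup (paths may differ):

  # /etc/pam.d/sshd on the compute nodes, after the other account entries:
  account    required     pam_slurm_adopt.so
  # slurm.conf also needs PrologFlags=contain for the adoption to work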
boot for changes to
take effect. Do we have to stop users submitting jobs by draining
all partitions and then restart the server - that is, slurmctld, slurmdbd
and mariadb? Or will restarting the slurm VM have no effect on
running/pending jobs?
Sincerely
Amjad
ipts, so having it avoid by default would
work.
Any ideas how to do that? Submit LUA perhaps?
Brian Andrus
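A job_submit.lua along these lines would do it - a sketch only; the
feature name 'general' is made up, and the ordinary nodes would need to
carry it in slurm.conf:

  -- job_submit.lua: constrain feature-less jobs to 'general' nodes,
  -- so the special nodes are avoided unless explicitly requested
  function slurm_job_submit(job_desc, part_list, submit_uid)
      if job_desc.features == nil or job_desc.features == '' then
          job_desc.features = 'general'
      end
      return slurm.SUCCESS
  end

  function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
      return slurm.SUCCESS
  end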
seem to be any benefit to that, as far as
we could see.
Tina
On 02/07/2021 06:48, Loris Bennett wrote:
Hi Tina,
Tina Friedrich writes:
Hi Brian,
sometimes it would be nice if SLURM had what Grid Engine calls a 'forced
complex' (i.e. a feature that you *have* to request to land on a node
bug fixes.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
- Tim
With Thanks and regards
>
> so, without having checked your sacct/awk logic I would not expect the
results to be the same.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
Thanks for your help
slurmd -V
slurm 20.02.6
slurm.conf
TaskPlugin=task/affinity,task/cgroup
ProctrackType=proctrack/cgroup
cgroup.conf
AllowedRAMSpace=100.0
AllowedSwapSpace=0.0
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
MemorySwappiness=0
CgroupAutomount=yes
ConstrainCores=yes
lementation.
We wanted to set up 3 clusters and one Login Node to run the job using
-M cluster option.
does anybody have such a setup, and can you share some insight into how it
works and whether it is really a stable solution?
Regards
Navin.
on the login node, which slurmctld is the slurm.conf file pointed to?
Is it possible to share a sample slurm.conf for the login node?
Regards
Navin.
On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
<tina.friedr...@it.ox.ac.uk> wrote:
Hi Navin,
well, I have two cl
PATH to the correct slurm
binaries (which we install in /usr/local/slurm// so that
they co-exist). So when -M won't work, users can use:
module load slurm/clusterA
squeue
module load slurm/clusterB
squeue
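Worth noting: '-M' only works once all clusters are registered with the
same slurmdbd. Then, from any login node (cluster names as examples):

  sacctmgr show clusters        # all clusters should be listed
  squeue -M clusterA,clusterB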
BR,
On Thu, Oct 28, 2021 at 7:39 PM na
Is this setting also necessary on the
compute nodes?
Best;
Jeremy.
[1]
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-arp-cache-for-large-networks
edmd.com/show_bug.cgi?id=3094
Best,
--
Rémi Palancher
Rackslab: Open Source Solutions for HPC Operations
https://rackslab.io
ending on which partition gets selected by Slurm.
Can this be done?
An option similar to --ntasks=USE_ALL_CORES would be great.
Many thanks,
Richard
allocated gpu card? What is the requirement on nvidia gpu drivers,
CUDA toolkit or any other part to help slurm correctly restrict the gpu
usage?
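The usual mechanism for that is device restriction via cgroups, so a
job only sees the GPUs it was allocated - roughly:

  # cgroup.conf (sketch)
  ConstrainDevices=yes

together with the task/cgroup plugin and correct File=/dev/nvidia*
entries in gres.conf; nothing special is needed from the driver or CUDA
beyond the device files themselves.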
s seems to be overkill for using only
this feature.
Is there any other plugin that implements this feature?
Best,
Stefan
he
things that will enable them to get reports, logs, whatever an admin and a user
will need. Just not execution of the jobs.
Thanks in advance for your help.
RC.
AcctGatherFrequency=30
SlurmctldDebug=error
SlurmdDebug=error
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log
NodeName=node0[1-8] CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
ThreadsPerCore=1 State=UNKNOWN
PartitionName=short Nodes=node[01-08] Default=NO MaxTim
jobs which reboot nodes - With a for loop, I could
submit a reboot job for each node. But I'm not sure how to limit this so at
most N jobs are running simultaneously.
With a fake license called reboot?
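Roughly like this, as a sketch (token count and names made up):

  # slurm.conf: four tokens -> at most four reboot jobs run at once
  Licenses=reboot:4

  # then submit one job per node:
  sbatch -L reboot -w node001 reboot-job.sh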
...job dependencies are also an option, thinking about this. You could
carve it up into X 'sets' of N nodes, with node-specific reboot jobs
that depend on the previous job in the same 'N' to finish.
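A sketch of that idea (node and script names made up):

  jid=$(sbatch --parsable -w node001 reboot.sh)
  jid=$(sbatch --parsable --dependency=afterany:$jid -w node002 reboot.sh)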
Tina
On 04/08/2022 11:23, Tina Friedrich wrote:
I'm thinking some
...not sure I'm adding anything to this discussion, but we have used the
various comment fields to store extra information for internal purposes -
sometimes JSON-format strings so they can be parsed by scripts etc. I even once
managed to mod the Elasticsearch plugin so that the comment field made
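E.g., roughly like this - retrieving it later needs the comment stored
in accounting (AccountingStoreJobComment, or
AccountingStoreFlags=job_comment on newer versions):

  sbatch --comment='{"project":"demo","ticket":42}' job.sh
  sacct -j <jobid> -o JobID,Comment%40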
nd trying xclock on the
login node would clarify that
Sorry, yes running xterm, xclock, etc. on the login node works.
Thanks,
Allan
Or run your database server on something like VMWare ESXi (which is what
we do). Instant HA and I don't even need multiple servers for it :)
I don't mean to be flippant, and I realise it's not addressing the mysql
HA question (but that got answered). However, a lot of us will have some
sort of
Hi Will,
I don't, currently, although it's on my list.
However, we had a presentation on a recent Oxford HPC-SIG meeting from a
colleague, who implemented a simple job profiler that saves a lot of job
data (including efficiency) & creates plots of the efficiency of the job
run (in a nutshell)
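For a quick start, the stock 'seff' contrib script already reports
per-job CPU and memory efficiency from the accounting data:

  seff <jobid>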
Hi Mike,
I moved from Grid Engine to SLURM a couple of years ago & it took me a
while to get my head around this :)
Yes - and you could also just edit slurm.conf and restart the
controller. That will not affect running jobs. It's - both in my
experience and from all I read - absolutely safe
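In practice, roughly:

  # push the new slurm.conf to all nodes, then:
  scontrol reconfigure          # most parameters are re-read on the fly
  # the few settings that can't be applied that way need:
  systemctl restart slurmctld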
Hi Patrick,
we certainly use that information to set affinity, yes. Our gres.conf
files (node-specific, as our config management creates them locally from
'nvidia-smi topo -m') look like this:
Name=gpu Type=a100 File=/dev/nvidia0 CPUs=0-23
Name=gpu Type=a100 File=/dev/nvidia1 CPUs=0-23
Nam
We do the same as Josef - we run the database on a VM (single VM,
MariaDB) and leave it up to (in our case) VMWare to ensure its availability.
Tina
On 25/01/2024 11:34, Josef Dvoracek wrote:
To protect from HW failure, and to have more free hands when upgrading
underlying OS, we use virtualiza