>
>
>
> You can find the documentation here:
>
> https://slurm.schedmd.com/cgroup.conf.html
>
>
>
> If you want to share GPUs you can use CUDA MPS or MIG if your GPU supports
> it.
>
>
>
> Regards,
>
> Jesse Chintanadilok
>
>
>
> *From:*
Hi,
I am facing an issue in my environment where a batch job and an
interactive job use the same GPU.
Each server has 2 GPUs. When 2 batch jobs are running it works fine and they
use the 2 different GPUs, but if one batch job is running and another job is
submitted interactively, then it uses the same GPU
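For reference, a minimal sketch of the cgroup-based device constraint the
reply above points to (node and device names are placeholders for your site;
both jobs must actually request a GPU with --gres for Slurm to bind them to
different devices):

    # slurm.conf
    GresTypes=gpu
    TaskPlugin=task/cgroup,task/affinity
    NodeName=node01 Gres=gpu:2 ...

    # gres.conf on the node
    NodeName=node01 Name=gpu File=/dev/nvidia[0-1]

    # cgroup.conf
    ConstrainDevices=yes

    # interactive jobs need the request too, e.g.
    srun --gres=gpu:1 --pty bash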
Hi,
I have a question about MariaDB vs. Slurm version compatibility.
Is there any compatibility matrix available?
We are running Slurm version 20.02 in our environment on SLES15 SP3 with
MariaDB 10.5.x. We are upgrading the OS from SLES15 SP3 to SP4, and with
this we see the MariaDB version is 1
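Not a compatibility matrix, but a common precaution before letting the SLES
upgrade pull in a newer MariaDB is to back up the accounting database first
(a sketch; slurm_acct_db is Slurm's default database name, adjust the user
and name to your site):

    systemctl stop slurmdbd
    mysqldump --single-transaction -u slurm -p slurm_acct_db > slurm_acct_db.sql
    # ... perform the OS / MariaDB upgrade ...
    mysql_upgrade -u root -p
    systemctl start slurmdbd    # watch slurmdbd.log for schema conversion errors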
clusters & login nodes that allow access to both.
> That
> > do? I don't think a third would make any difference in setup.
> >
> > They need to share a database. As long as they share a database, the
> > clusters have 'knowledge' of each other.
>
> So if you set up one database server (running slurmdbd), and then a
> SLURM controller for each cluster (running slurmctld) using that one
> central database, the '-M' option should work.
>
> Tina
>
> On 28/10/2021 10:54, navin sri
Hi,
I am looking for a stepwise guide to set up a multi-cluster implementation.
We want to set up 3 clusters and one login node to run jobs using the -M
cluster option.
Does anybody have such a setup and can share some insight into how it works,
and whether it is really a stable solution?
Regards
Navin.
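A rough sketch of the pieces Tina describes above (hostnames and cluster
names are placeholders):

    # slurm.conf on each cluster, each with its own ClusterName
    ClusterName=cluster1
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=dbhost.example.com

    # register every cluster once against the shared slurmdbd
    sacctmgr add cluster cluster1

    # then from the login node
    sbatch -M cluster2 job.sh
    squeue -M cluster1,cluster2,cluster3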
Dear slurm community users,
We are using slurm version 20.02.x.
We see the below message appearing many times in the slurmctld log,
and we found that whenever this message appears the sinfo/squeue output
gets slow.
There is no timeout, as I kept the value at 100.
Warning: Note very large processing time from
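One way to see which RPCs are consuming slurmctld time when sinfo/squeue
slow down (a diagnostic sketch, not a fix):

    sdiag            # scheduler stats plus per-RPC counts and processing times
    sdiag --reset    # zero the counters, then run sdiag again after a slow period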
Are you using federated clusters? If not, check slurm.conf -- do you
> > have FirstJobId set?
> >
> > Andy
> >
> > On 11/18/2020 8:42 AM, navin srivastava wrote:
> >> While running the sacct we found that some jobid are not listing.
> >>
While running sacct we found that some job IDs are not listed.
5535566    SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
5535567    SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
11016496   jupyter-s+  stdg_defq  stdg_acc  1  RUNNING    0:0
Is there a way to find the utilization per node?
Regards
Navin.
On Wed, Nov 18, 2020 at 10:37 AM navin srivastava
wrote:
> Dear All,
>
> Good Day!
>
> i am seeing one strange behaviour in my environment.
>
> we have 2 clusters in our environment one acting as a datab
Dear All,
Good Day!
I am seeing one strange behaviour in my environment.
We have 2 clusters in our environment, one acting as a database server, and
have pointed the 2nd cluster to the same database.
hpc1   155.250.126.30   6817   8192   1
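If the confusion is about where the jobs show up, sacct can be pointed at
all clusters registered in the shared slurmdbd; keeping the job-ID ranges
disjoint (the FirstJobId hint above) is optional, and the value below is
only an example:

    sacct --allclusters --starttime=2020-11-01 \
          --format=JobID,Cluster,JobName,State,ExitCode

    # slurm.conf on the second cluster, if you want non-overlapping IDs
    FirstJobId=20000001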
by 18.x and 19.x, or can I uninstall Slurm
17.11.8 and install 20.02 on all compute nodes?
Regards
Navin.
On Tue, Nov 3, 2020 at 12:31 PM Ole Holm Nielsen
wrote:
> On 11/2/20 2:25 PM, navin srivastava wrote:
> > Currently we are running slurm version 17.11.x and wanted to mov
Dear All,
Currently we are running Slurm version 17.11.x and want to move to 20.x.
We are building the new server with Slurm 20.02 and planning to upgrade
the client nodes from 17.x to 20.x.
I wanted to check whether we can upgrade the clients from 17.x to 20.x
directly, or whether we need to go through the intermediate versions.
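For what it is worth, Slurm only supports upgrading from the two previous
major releases, so 17.11 cannot jump straight to 20.02 on the slurmdbd
side; a sketch of the usual order for each hop (17.11 -> 19.05 -> 20.02):

    # 1. back up and upgrade the accounting database daemon first
    systemctl stop slurmdbd
    mysqldump --single-transaction -u slurm -p slurm_acct_db > backup.sql
    # install the new slurmdbd packages, then let it convert the schema
    systemctl start slurmdbd

    # 2. then slurmctld on the controller, after its package upgrade
    systemctl restart slurmctld

    # 3. finally slurmd on the compute nodes (can be done in batches)
    systemctl restart slurmd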
Hi team,
I have extracted the %utilization report and found that the idle time is at
the higher end, so I wanted to check whether there is any way we can find
node-based utilization.
It will help us figure out which nodes are underutilized.
Regards
navin.
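As far as I know there is no built-in per-node sreport, but one rough
approach (an approximation, post-processed by hand or script) is to dump
the allocation records and add up CPU-seconds per node afterwards:

    sacct -a -X --parsable2 --starttime=2020-11-01 --endtime=2020-12-01 \
          --format=JobID,NodeList,AllocCPUS,ElapsedRaw > jobs.txt
    # expand NodeList ranges afterwards with, e.g.:
    scontrol show hostnames node[01-04]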
Dear all,
I read about the concept of federated clusters in Slurm. Is it really
helpful for maximizing cluster usage?
We have 4 independent Slurm clusters which work with local storage, and we
want to build a federated cluster where we can utilize the free available
compute
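If the clusters already share (or can share) one slurmdbd, the federation
itself is mostly a one-liner; a sketch with placeholder names:

    sacctmgr add federation myfed clusters=clusterA,clusterB,clusterC,clusterD
    scontrol show federation    # verify membership on any member
    squeue --federation         # federation-wide view of jobs

Whether it maximizes usage depends a lot on storage: without a shared
filesystem, jobs routed to a remote cluster still need their data staged
there somehow.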
Hi Team,
I am facing one issue. Several users are submitting jobs in a single batch
submission which are very short (say 1-2 sec), so while submitting more
jobs slurmctld becomes unresponsive and starts giving the message
ending job 6e508a88155d9bec40d752c8331d7ae8 to queue.
sbatch: error: Batch job submiss
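One common way to take load off slurmctld with 1-2 second tasks (a sketch;
short_task.sh is a placeholder) is to pack them into a job array or a
single job that loops, so the controller sees far fewer submissions:

    #!/bin/bash
    #SBATCH --array=1-1000
    #SBATCH --time=00:05:00
    ./short_task.sh "${SLURM_ARRAY_TASK_ID}"

SchedulerParameters options such as defer or max_rpc_cnt can also help the
controller survive submission bursts.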
> can complete without delaying the estimated start time of higher priority
> jobs.
>
> On Jul 13, 2020, at 4:18 AM, navin srivastava
> wrote:
>
> Hi Team,
>
> We have separate partitions for the GPU nodes and only CPU nodes .
>
> scenario: the jobs submitted in our
Hi Team,
We have separate partitions for the GPU nodes and the CPU-only nodes.
Scenario: the jobs submitted in our environment are 4 CPU + 1 GPU as well as
4 CPU only, in nodeGPUsmall and nodeGPUbig. So when all the GPUs are
exhausted, the rest of the jobs are in the queue waiting for the
availability of GPU resources.
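One sketch of keeping CPU-only work from starving GPU jobs on shared nodes
(partition and node names are placeholders; MaxCPUsPerNode caps what the
CPU-only partition may take on each node, leaving cores for GPU jobs):

    PartitionName=gpu     Nodes=nodegpu[01-10] State=UP
    PartitionName=cpuonly Nodes=nodegpu[01-10] MaxCPUsPerNode=16 State=UP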
> If you run slurmd -C on the compute node, it should tell you what
> > slurm thinks the RealMemory number is.
> >
> > Jeff
> >
> > ----
> > *From:* slurm-users on behalf
> of
> >
ish, then remove it.
>
> Brian Andrus
>
> On 7/8/2020 10:57 PM, navin srivastava wrote:
> > Hi Team,
> >
> > i have 2 small query.because of the lack of testing environment i am
> > unable to test the scenario. working on to set up a test environment.
> >
Hi Team,
I have 2 small queries. Because of the lack of a testing environment I am
unable to test the scenario; I am working on setting up a test environment.
1. In my environment I am unable to pass the #SBATCH --mem=2GB option.
I found the reason is that there is no RealMemory entry in the node
definition.
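A sketch of wiring up --mem (the values below are only examples; copy the
real numbers reported by your nodes):

    # on the compute node, see what slurmd detects
    slurmd -C
    # NodeName=node01 CPUs=20 ... RealMemory=128825 ...

    # slurm.conf: add RealMemory to the node line and make memory schedulable
    NodeName=node01 CPUs=20 RealMemory=128825 State=UNKNOWN
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory

    # job script (note the '=' in the option)
    #SBATCH --mem=2G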
Hi Team,
I have separated the CPU nodes and the GPU nodes into two different queues.
Now I have 20 nodes having CPUs (20 cores) only, with no GPU.
Another set of nodes has GPU+CPU: some nodes with 2 GPUs and 20 CPUs, and
some with 8 GPUs and 48 CPUs, assigned to the GPU queue.
Users are facing issues when
Thanks Ole.
Regards
Navin
On Thu, Jun 18, 2020 at 11:56 AM Ole Holm Nielsen <
ole.h.niel...@fysik.dtu.dk> wrote:
> The scontrol command to set the nice level is on the list here:
> https://wiki.fysik.dtu.dk/niflheim/SLURM#useful-commands
>
> /Ole
>
> On 6/18/20 8:05 AM
modify the order of execution.
>
> On Wed, 17 Jun 2020 at 12:31, navin srivastava (<
> navin.alt...@gmail.com>) wrote:
>
>> Hi Team,
>>
>> Is their a way to change the job order in slurm.similar to sorder in PBS.
>>
>> I want to swap my job from the other top job.
>>
>> Regards
>> Navin
>>
>>
Hi Team,
Is there a way to change the job order in Slurm, similar to sorder in PBS?
I want to swap my job with the other top job.
Regards
Navin
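Two sketches (the job ID is an example): scontrol top lets a user move one
of their own pending jobs ahead of their other jobs, and the nice value can
reorder jobs more generally (negative values need operator/admin rights):

    scontrol top 123456
    scontrol update jobid=123456 nice=-1000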
partition would
> have to (a) not require a GPU, (b) require a limited number of CPUs per
> node, so that you'd have some CPUs available for GPU jobs on the nodes
> containing GPUs.
>
> --
> *From:* slurm-users on behalf of
> navin srivastava
> *Sent:
Hi,
One query about how the nice value is decided by the scheduler.
Our scheduling policy is FIFO + Fair Tree.
One user submitted 100s of jobs on different dates. What I see is that the
old jobs are in the queue but a few of the latest jobs went for execution.
When I see the nice value of the latest running jo
Yes, we have separate partitions. Some are specific to GPU, having 2 nodes
with 8 GPUs, and other partitions are a mix of both: nodes with 2 GPUs and
very few nodes without any GPU.
Regards
Navin
On Sat, Jun 13, 2020, 21:11 navin srivastava wrote:
> Thanks Renfro.
>
> Yes we have both
d non-GPU jobs? Do you
> have nodes without GPUs?
>
> On Jun 13, 2020, at 12:28 AM, navin srivastava
> wrote:
>
> Hi All,
>
> In our environment we have GPU. so what i found is if the user having high
> priority and his job is in queue and waiting for the GPU resou
Hi All,
In our environment we have GPUs. What I found is that if a user with high
priority has a job in the queue waiting for GPU resources, which are almost
full and not available, then jobs submitted by other users that do not
require GPU resources stay in the queue even though lots of
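A sketch of the usual cure hinted at in the replies: with backfill
scheduling and sane time limits, the CPU-only jobs can start ahead of the
waiting GPU job as long as they finish before its predicted start (values
are placeholders):

    SchedulerType=sched/backfill
    PartitionName=cpu DefaultTime=02:00:00 MaxTime=3-00:00:00 State=UP ...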
o:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 11:31 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] unable to start slurmd process.
>
>
>
> i am able to get the output scontrol show node oled3
>
shown for “NodeAddr=”
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 10:40 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] unable to start slurmd process.
or the like is messed up?
>
>
>
> If that’s not the case, I think my next step would be to follow up on
> someone else’s suggestion, and scan the slurmctld.log file for the problem
> node name.
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.c
ig | grep -I log” if you’re not
> sure where the logs are stored).
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *navin srivastava
> *Sent:* Thursday, June 11, 2020 9:01 AM
> *To:* Slurm User Community List
> *Subject:* Re:
> For example,
>
>
>
> # /usr/local/slurm/sbin/slurmd -D
>
>
>
> Just hit ^C when you’re done, if necessary. Of course, if it doesn’t fail
> when you run it this way, it’s time to look elsewhere.
>
>
>
> Andy
>
>
>
> *From:* slurm-users [mailt
Hi Team,
When I try to start the slurmd process I am getting the below error.
2020-06-11T13:11:58.652711+02:00 oled3 systemd[1]: Starting Slurm node
daemon...
2020-06-11T13:13:28.683840+02:00 oled3 systemd[1]: slurmd.service: Start
operation timed out. Terminating.
2020-06-11T13:13:28.68447
Was this working earlier, or is this the first time you are trying?
> Are you using pam module ? if yes, try disabling the pam module and see
> if it works.
>
> Thanks
> Sathish
>
> On Thu, Jun 4, 2020 at 10:47 PM navin srivastava
> wrote:
>
>> Hi Team,
>>
>>
Hi Team,
I am seeing a weird issue in my environment.
One of the Gaussian jobs is failing under Slurm within a minute after it
goes for execution, without writing anything, and I am unable to figure out
the reason.
The same job works fine without Slurm on the same node.
slurmctld.log
[2020-06-03T19
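Some common first checks for a job that dies within a minute (the job ID is
an example; memory limits are a frequent culprit, but that is only a guess
here):

    sacct -j 1234567 --format=JobID,State,ExitCode,DerivedExitCode,Elapsed,MaxRSS
    # run the same environment interactively under Slurm and compare limits
    srun --pty bash -c 'ulimit -a; env | grep -i slurm'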
node job on an available node being used by JOBID. Add
> other parameters as required for cpus-per-task, time limits, or whatever
> else is needed. If you start the larger jobs first, and let the later jobs
> fill in on idle CPUs on those nodes, it should work.
>
> > On May 6, 2020,
server=flex_host servertype=flexlm type=license
>
> and submit jobs with a '-L software_name:N’ flag where N is the number of
> nodes you want to run on.
>
> > On May 6, 2020, at 5:33 AM, navin srivastava
> wrote:
> >
> > Thanks Micheal.
> >
> > Actua
On May 5, 2020, at 8:37 AM, navin srivastava
> wrote:
> >
> > External Email Warning
> > This email originated from outside the university. Please use caution
> when opening attachments, clicking links, or responding to requests.
> > Thanks Michael,
> >
run from 1-4 nodes.
>
> There are also options to query a FlexLM or RLM server for license
> management.
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University
>
> > On May 5, 2020, at
Hi Team,
We have an application whose licenses are limited; it scales up to 4
nodes (~80 cores).
So if 4 nodes are full, the job on a 5th node used to fail.
We want to put a restriction so that the application can't go for execution
beyond the 4 nodes; instead of failing, the job should stay in the queue
state.
I do not
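A sketch of the local license pool mentioned later in the thread (license
name and counts are examples; jobs that would exceed the pool stay pending
instead of failing):

    # slurm.conf
    Licenses=app_lic:80

    # job script: request as many tokens as the run will consume
    #SBATCH -L app_lic:20

For FlexLM/RLM-managed licenses, the sacctmgr add resource variant with
servertype=flexlm (quoted above) is the remote-tracking alternative.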
Thanks Daniel for the detailed description.
Regards
Navin
On Sun, May 3, 2020, 13:35 Daniel Letai wrote:
>
> On 29/04/2020 12:00:13, navin srivastava wrote:
>
> Thanks Daniel.
>
> All jobs went into run state so unable to provide the details but
> definitely will reach out la
> It would really help if you pasted the results of:
>
> squeue
>
> sinfo
>
>
> As well as the exact sbatch line, so we can see how many resources per
> node are requested.
>
>
> On 26/04/2020 12:00:06, navin srivastava wrote:
>
> Thanks Brian,
>
> As su
us to get through but reading
> through it multiple times opens many doors.
>
> DefaultTime is listed in there as a Partition option.
> If you are scheduling gres/gpu resources, it's quite possible there are
> cores available with no corresponding gpus avail.
>
> -b
>
>
If users are not
> specifying a reasonable timelimit to their jobs, this won't help either.
>
>
> -b
>
>
> On 4/24/20 1:52 PM, navin srivastava wrote:
>
> In addition to the above when i see the sprio of both the jobs it says :-
>
> for normal queue jobs all jobs showing t
                      PRIORITY   FAIRSHARE
1291339   GPUsmall       21052       21053
On Fri, Apr 24, 2020 at 11:14 PM navin srivastava
wrote:
> Hi Team,
>
> we are facing some issue in our environment. The resources are free but
> job is going into the QUEUE state but not running.
>
> i have attached t
Hi Team,
We are facing some issue in our environment. The resources are free but
jobs are going into the QUEUE state and not running.
I have attached the slurm.conf file here.
Scenario:
There are jobs only in the 2 partitions:
344 jobs are in PD state in the normal partition and the node belongs fro
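The Reason column usually explains a pending job; a quick diagnostic sketch
(the job ID is one from this thread, the format string is just an example):

    squeue -p normal -t PD -o '%.10i %.9P %.8u %.2t %.10M %R' | head
    scontrol show job 1291339 | grep -E 'Reason|Priority|TRES'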
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> --
> *From:* slurm-users on behalf of
> navin srivastava
> *Sent:* Wednesday, April 15, 2020 10:37 PM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] How to request for the alloca
es unless the SchedulerParameters
> configuration parameter includes the "default_gbytes" option for gigabytes.
> Different units can be specified using the suffix [K|M|G|T].
> https://slurm.schedmd.com/sbatch.html
>
>
>
> ---
> Erik Ellestad
> Wynton Cluster
ion of local scratch globally via TmpFS.
>
> And then the amount per host is defined via TmpDisk=xxx.
>
> Then the request for srun/sbatch via --tmp=X
>
>
>
> ---
> Erik Ellestad
> Wynton Cluster SysAdmin
> UCSF
> --
> *From:* slurm-user
Any suggestion on the above query? I need help to understand it.
If TmpFS=/scratch and the request is #SBATCH --tmp=500GB, will it reserve
the 500GB from scratch?
Let me know if my assumption is correct.
Regards
Navin.
On Mon, Apr 13, 2020 at 11:10 AM navin srivastava
wrote:
> Hi T
Hi Team,
I wanted to define a mechanism to request local disk space while submitting
a job.
We have a dedicated /scratch file system of 1.2 TB for job execution on
each of the compute nodes, in addition to / and the other file systems.
I have defined TmpFS=/scratch in slurm.conf and then
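A sketch of how the pieces fit together (sizes are examples; TmpDisk is in
MB). As far as I can tell, --tmp is only a node-selection constraint: a job
is placed on a node whose TmpDisk is large enough, but nothing is physically
reserved or enforced in /scratch unless you add your own prolog/epilog
cleanup:

    # slurm.conf
    TmpFS=/scratch
    NodeName=node[01-20] ... TmpDisk=1200000

    # job script
    #SBATCH --tmp=500G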
PriorityUsageResetPeriod=DAILY
PriorityWeightFairshare=50
PriorityFlags=FAIR_TREE
Could you please also suggest: if the scheduling policy is fairshare, will
it still consider the priority of the partition?
Regards
Navin.
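Under priority/multifactor with FAIR_TREE, the partition's priority is just
another weighted term in the same sum, so both are considered; a sketch
(the weights are examples only):

    PriorityType=priority/multifactor
    PriorityFlags=FAIR_TREE
    PriorityWeightFairshare=50
    PriorityWeightPartition=1000
    # a partition's PriorityTier, however, orders scheduling regardless of weights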
On Sat, Apr 4, 2020 at 8:34 PM navin srivastava
wrote:
> Hi Team,
>
> I
Hi Team,
I am facing one issue in my environment. Our Slurm version is 17.11.x.
My question is: I have 2 partitions:
Queue A with node1 and node2, Priority=1000, Shared=YES
Queue B with node1 and node2, Priority=100, Shared=YES
The problem is that when a job from the A partition is running, then the j
from a different partition.
On Tue, Mar 31, 2020 at 4:34 PM navin srivastava
wrote:
> Hi ,
>
> have an issue with the resource allocation.
>
> In the environment have partition like below:
>
> PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE
> Stat
Hi,
I have an issue with resource allocation.
In the environment I have partitions like below:
PartitionName=small_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE
State=UP Shared=YES Priority=8000
PartitionName=large_jobs Nodes=Node[17,20] Default=NO MaxTime=INFINITE
State=UP Shared=YES Pri
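If the intent is for small_jobs to push large_jobs aside on the shared
nodes, partition-priority preemption is the usual approach; a sketch (modes
and tier values below are examples, not the poster's actual config):

    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE
    PartitionName=small_jobs Nodes=Node[17,20] PriorityTier=8000 PreemptMode=off ...
    PartitionName=large_jobs Nodes=Node[17,20] PriorityTier=100  PreemptMode=requeue ...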
Hi,
I wanted to understand how log rotation of slurmctld works.
In my environment I don't have any log rotation for slurmctld.log, and now
the log file size has reached 125GB.
Can I move the log file to some other location and then restart/reload the
slurm service, and will it start a new log file? I thi
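slurmctld re-opens its log file on SIGUSR2, so a standard logrotate stanza
works without restarting the daemon (the path is an example; match your
SlurmctldLogFile):

    /var/log/slurm/slurmctld.log {
        weekly
        rotate 8
        compress
        missingok
        postrotate
            systemctl kill -s SIGUSR2 slurmctld
        endscript
    }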
the explanation for each are found on the
> Resource Limits document.
>
> /Ole
>
> On 2/17/20 12:20 PM, navin srivastava wrote:
> > Hi ole,
> >
> > i am submitting 100 of jobs are i see all jobs starting at the same time
> > and all job is going into the run s
> Why do you think the limit is not working? MaxJobs limits the number
> of running jobs to 3, but you can still submit as many jobs as you like!
>
> See "man sacctmgr" for definitions of the limits MaxJobs as well as
> MaxSubmitJobs.
>
> /Ole
>
> On 2/17/
Hi,
Thanks for your script.
With this I am able to show the limit that I set, but the limit is not
being enforced:
MaxJobs = 3, current value = 0
Regards
Navin.
On Mon, Feb 17, 2020 at 4:13 PM Ole Holm Nielsen
wrote:
> On 2/17/20 11:16 AM, navin srivastava wrote:
> > i have an issue
Hi Team,
I have an issue with the Slurm job limit. I applied the MaxJobs limit on a
user using
sacctmgr modify user navin1 set maxjobs=3
but I still see this is not getting applied; I am still able to submit more
jobs.
The Slurm version is 17.11.x.
Let me know what setting is required to implement th
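The usual reason a MaxJobs association limit is ignored is that limit
enforcement is not switched on; a sketch of what to check (restart
slurmctld after changing it):

    # slurm.conf
    AccountingStorageEnforce=associations,limits,qos

    # verify what is actually set and stored
    scontrol show config | grep AccountingStorageEnforce
    sacctmgr show assoc user=navin1 format=User,Account,MaxJobs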