Matthew, that deserves an explanation. Bright Computing Proof of Concept causes nightmares? That is a pretty strong assertion. Please give more details.
On Wed, 13 Feb 2019 at 16:01, Matthew BETTINGER <matthew.bettin...@external.total.com> wrote:

> One of the main guys, Panos, left Bright, so no answer to your specific
> question, but I hope you can get some support with it. We dumped our BC
> PoC; the sysadmin working on the PoC still has nightmares.
>
> On 2/13/19, 6:54 AM, "slurm-users on behalf of John Hearns"
> <slurm-users-boun...@lists.schedmd.com on behalf of hear...@googlemail.com>
> wrote:
>
> Yugendra, the Bright support guys are excellent.
> Slurm is their default choice. I would ask again. Yes, Slurm is
> technically out of scope for them, but they should help a bit.
>
> By the way, I think your problem is that you have configured
> authentication using AD on your head node,
> BUT you have not configured it on the compute node images. You probably
> have to prepare a new compute node image and then push that out to the
> compute nodes.
>
> On Wed, 13 Feb 2019 at 12:35, Yugendra Guvvala
> <yguvv...@cambridgecomputer.com> wrote:
>
> Also reached out to Bright Computing support and they say Slurm is out
> of scope for them.
>
> Thanks,
> Yugi
>
> On Feb 13, 2019, at 7:27 AM, Antony Cleave <antony.cle...@gmail.com>
> wrote:
>
> Can you ssh to the compute node that the job was trying to run on as
> the AD user in question?
>
> I've seen similar issues on AD-integrated systems where some nodes
> boot from a different image that has not yet been joined to the domain.
>
> Antony
>
> On Wed, 13 Feb 2019 at 04:58, Yugendra Guvvala
> <yguvv...@cambridgecomputer.com> wrote:
>
> Hi,
>
> We are bringing a new cluster online. We installed Slurm through
> Bright Cluster Manager; however, we are running into an issue here.
>
> We are able to run jobs as the root user and as users created using Bright
> Cluster Manager (cmsh commands). However, we use AD authentication for all
> our users, and when we try to submit jobs to Slurm as AD users we get the
> following error message:
>
> srun: fatal: Invalid user id: 10952
> srun: fatal: Invalid user id: 10952
> srun: error: cnode001: task 0: Exited with exit code 1
>
> Attached is the slurm.conf file for reference. Please let us know if
> you have any insight into this.
>
> Thanks,
> Yugi
>
> Yugendra Guvvala | HPC Technologist | Cambridge Computer | "Artists
> in Data Storage"
> Direct: 781-250-3273 | Cell: 806-773-4464 |
> yguvv...@cambridgecomputer.com | www.cambridgecomputer.com
> <http://www.cambridgecomputer.com>
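
A quick way to test the diagnosis suggested in this thread (UID 10952 resolves on the head node but not on the compute node image) is to ask each node's name service directly. The following is only a minimal sketch, not something from the original posters: the script name `check_uid.py` and the helper names `resolve_uid` and `describe` are hypothetical, and it assumes Python 3 is available on both the head node and the compute nodes.

```python
import pwd
import sys


def resolve_uid(uid, lookup=pwd.getpwuid):
    """Return the user name for uid, or None if this node cannot resolve it.

    pwd.getpwuid goes through the node's configured name service (local
    files, and AD/LDAP if the node has been set up for it) and raises
    KeyError when no entry exists for the uid.
    """
    try:
        return lookup(uid).pw_name
    except KeyError:
        return None


def describe(uid, lookup=pwd.getpwuid):
    """Build a one-line report for a single uid."""
    name = resolve_uid(uid, lookup)
    if name is None:
        return f"uid {uid}: NOT resolvable on this node"
    return f"uid {uid}: resolves to {name}"


if __name__ == "__main__":
    # Default to the uid from the srun error message if none is given.
    uids = [int(arg) for arg in sys.argv[1:]] or [10952]
    for uid in uids:
        print(describe(uid))
```

Running `python3 check_uid.py 10952` on the head node and again on cnode001 (for example over ssh as root) should show whether the two disagree. If the uid resolves on the head node but not on the compute node, that points at the compute node image lacking the AD configuration, as suggested above.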