Re: [slurm-users] Reproducible irreproducible problem (timeout?)

2023-12-20 Thread Laurence Marks
In terms of dependencies, please think about timing. Currently one loop takes ~70 minutes, and say there is a queue time T for any job. If you split the slow part to run serially, one loop takes ~190 minutes + 2T. The time for N iterations would be ~190N + 570*T versus 70N + T. --- Professor Laurence

Re: [slurm-users] Reproducible irreproducible problem (timeout?)

2023-12-20 Thread Laurence Marks
Dependencies are not an appropriate approach. --- Professor Laurence Marks (Laurie) www.numis.northwestern.edu "Research is to see what everybody else has seen, and to think what nobody else has thought" Albert Szent-Györgyi On Wed, Dec 20, 2023, 14:40 Renfro, Michael wrote:

Re: [slurm-users] Reproducible irreproducible problem (timeout?)

2023-12-20 Thread Laurence Marks
r ABC in XYZ" then I may persuade them to look at specifics. They will need the coaching, alas. On Wed, Dec 20, 2023 at 1:25 PM Gerhard Strangar wrote: > Laurence Marks wrote: > > > After some (irreproducible) time, often one of the three slow tasks > hangs. > > A symptom

[slurm-users] Reproducible irreproducible problem (timeout?)

2023-12-20 Thread Laurence Marks
years). I wonder if there are some timeouts or something similar which drop connectivity. I also wonder whether repeated launching of srun subtasks might be doing something beyond what is normally expected. -- Emeritus Professor Laurence Marks (Laurie) Northwestern University Web

Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-10-05 Thread Laurence
t the issue might be? The jwks can be found at the following URL. https://auth.cern.ch/auth/realms/cern/protocol/openid-connect/certs Cheers, Laurence On 27/03/2023 11:07, Laurence Field wrote: Hi Ümit, Thanks for the reply. Yes, it looks like this is the issue. Although from the master b

Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-03-27 Thread Laurence Field
Hi Ümit, Thanks for the reply. Yes, it looks like this is the issue. Although from the master branch it suggests that the claim_field can also be used but this is not in the version we have deployed. Cheers, Laurence On 24.03.23 16:51, Ümit Seren wrote: Looks like you are missing the

Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-03-24 Thread Laurence Field
give me some hints, they would be most welcome. Cheers, Laurence On 24.03.23 10:41, Laurence Field wrote: Hi Ümit, Thanks for your reply. We are using Keycloak and the JWKS does contain this parameter. I will continue to debug but any suggestions would be greatly appreciated. Cheers

Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-03-24 Thread Laurence Field
Hi Ümit, Thanks for your reply. We are using Keycloak and the JWKS does contain this parameter. I will continue to debug but any suggestions would be greatly appreciated. Cheers, Laurence On 23.03.23 11:42, Ümit Seren wrote: If you use AzureAD as your identity provider beware that their

[slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-03-23 Thread Laurence
ailed to verify jwt, rc=22 slurmctld: error: could not find matching kid or decode failed Thanks, Laurence
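
For anyone hitting the same "could not find matching kid" error, a quick offline check is to compare the `kid` in the token header against the `kid` values published in the JWKS. The following is a minimal sketch, not Slurm's own verification code: the function names `jwt_header` and `find_matching_key` are hypothetical, and it assumes a compact JWT plus a JWKS already loaded as a Python dict. It does not verify the signature.

```python
import base64
import json


def jwt_header(token: str) -> dict:
    """Decode the (unverified) header segment of a compact JWT."""
    header_b64 = token.split(".")[0]
    # base64url segments omit padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def find_matching_key(token: str, jwks: dict):
    """Return the JWKS entry whose kid equals the token's kid, else None."""
    kid = jwt_header(token).get("kid")
    if kid is None:
        return None
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None
```

If `find_matching_key` returns `None`, either the token carries no `kid` or the key set being checked does not list that `kid`; both are worth ruling out before looking at the signature algorithm.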

Re: [slurm-users] NVIDIA MIG question

2022-11-15 Thread Laurence
result you observed suggests that MIG is a feature of the driver, i.e., lspci shows one device but nvidia-smi shows 7 devices. I haven't played around with this myself in Slurm but would be interested to know the answers. Laurence On 15/11/2022 17:46, Groner, Rob wrote: We have successfully