Dear Christopher,
I tried as you suggested and increased UnkillableStepTimeout from 60 to 120
seconds, but a few hours later three of my nodes were drained with reason
"Kill task failed" again. We're not using cgroups. There is a bug¹ on
SchedMD's tracker describing attempts to understand this err
On 5/16/19 1:04 AM, Alan Orth wrote:
but now we get a handful of nodes drained every day with reason "Kill
task failed". In ten years of using SLURM I've never had so many
problems as I'm having now. :\
We see "kill task failed" issues but as Marcus says that's not related
to X11 support, wh
On 5/16/19 8:53 AM, Mahmood Naderan wrote:
Can I ask what is the expected release date for 19? It seems that rc1
has been released in May?
Sometime in May hopefully!
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Can I ask what is the expected release date for 19? It seems that rc1 has
been released in May?
Regards,
Mahmood
On Thu, May 16, 2019 at 4:48 PM Marcus Wagner
wrote:
> Hi Alan,
>
> we are also seeing this, but that has nothing to do with X11 support,
> since we compile atm. SLURM without
Hi Alan,
we are also seeing this, but that has nothing to do with X11 support,
since we compile atm. SLURM without X11 support.
We also see sometimes jobs running on, even if e.g. mpi rank one got
killed by oom, rank zero is stuck in mpi_finalize.
SLURM seems to not detect every time, if oom ki
Yes I'm also looking forward to SLURM 19.05. We have had lots of issues
with X11 since we upgraded to 18.08 and started using its built-in X11
support. Part of this was resolved by setting
"X11Parameters=local_xauthority" in slurm.conf to reduce locking contention
on the Xauthority file, but now we
On 5/15/19 11:36 AM, Mahmood Naderan wrote:
I would really like to know why X11 is not so friendly. For example, Slurm
works with MPI. Why not with X11?!
Because MPI support is fundamental, X11 support is nice to have.
I suspect 19.05 will make your life an awful lot easier!
All the best,
Chris
--
>A
>lot of interactive software works a lot better over things like NX, so
>why this limitation?
Agreed... Slurm is a very powerful job manager and I really appreciate its
capabilities. However, I don't know why X11 has always been a pain for
it. spank-x11 was good but that was not a builtin fe
>Can you try ssh'ing with X11 forwarding to rocks7 (i.e. ssh -X user@rocks7)
from a different machine, and then try your srun >--x11 command?
No... This doesn't work either. The error is
X11 forwarding not available.
Please see the picture at https://pasteboard.co/IeQGNOx.png
Regards,
Mahmood
>please open a console in the VNC session, do a ssh -Y rocks7 in the
console (yes, relogin to the console) and try it again.
>SLURM does not want to use local displays, and a VNC session is a "local"
display, as far as it concerns linux and the X11 >subsystem.
>So, you need to relogin or login to a
On 5/15/19 7:32 AM, Tina Friedrich wrote:
Hadn't yet read that far - I plan to test 19.05 soon anyway. Will report.
Cool, Tim has ripped out all the libssh code (which caused me issues at
${JOB-1} because it didn't play nicely with SSH keep alive messages) and
replaced it with native handling
hi all,
we are currently also going through the painful process of making x11
support userfriendly, so i'm also in favour of making this work from eg
vnc or nx/x2go.
however, we now run 17.11.8, and we already noticed that 17.11.11 has
very different x11 related code. is the 19.05 x11 even more d
Hadn't yet read that far - I plan to test 19.05 soon anyway. Will report.
(I thought the plumbing was - basically - libssh; and, well, ssh itself
is capable of dealing with local displays?)
Tina
On 15/05/2019 15:06, Chris Samuel wrote:
> On 15/5/19 3:01 am, Tina Friedrich wrote:
>
>> Indeed -
On 15/5/19 3:01 am, Tina Friedrich wrote:
Indeed - am I the only person that finds that quite a bit annoying? A
lot of interactive software works a lot better over things like NX, so
why this limitation?
It might be a limitation around the plumbing they use to do this, and
the whole X11 forwa
Indeed - am I the only person that finds that quite a bit annoying? A
lot of interactive software works a lot better over things like NX, so
why this limitation?
Tina
(I realise I'm not adding much to the discussion, probably :) )
On 15/05/2019 08:36, Marcus Wagner wrote:
> Dear Mahmood,
>
> ple
Dear Mahmood,
please open a console in the VNC session, do a ssh -Y rocks7 in the
console (yes, relogin to the console) and try it again.
SLURM does not want to use local displays, and a VNC session is a
"local" display, as far as it concerns linux and the X11 subsystem.
So, you need to relogin
Hi Mahmood,
I've never tried using the native X11 of SLURM without being ssh'ed into the
submit node.
Can you try ssh'ing with X11 forwarding to rocks7 (i.e. ssh -X user@rocks7)
from a different machine, and then try your srun --x11 command?
Sean
--
Sean Crosby
Senior DevOpsHPC Engineer and H
>No, but you'll need to logout of rocks7 and ssh back into it.
>Are you physically logged into rocks7? Or are you connecting via SSH?
$DISPLAY = :1 kind of means that you are physically logged into the machine
I am connecting through a vnc session. Right now, I have access to the
desktop of the f
Hi Mahmood,
Are you physically logged into rocks7? Or are you connecting via SSH? $DISPLAY
= :1 kind of means that you are physically logged into the machine
Sean
--
Sean Crosby
Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services
Research Computing | CoEPP | School of Physi
On 5/14/19 5:09 PM, Mahmood Naderan wrote:
Should I modify that parameter on compute-0-0 too?
No, but you'll need to logout of rocks7 and ssh back into it.
Or are you on the console of the system itself?
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
>What does this say?
>echo $DISPLAY
On frontend of compute-0-0?
[mahmood@rocks7 ~]$ echo $DISPLAY
:1
>To get native X11 working with SLURM, we had to add this config to
sshd_config on the login node (your rocks7 host)
>X11UseLocalhost no
>You'll then need to restart sshd
I checked that and it
Hi Mahmood,
To get native X11 working with SLURM, we had to add this config to sshd_config
on the login node (your rocks7 host)
X11UseLocalhost no
You'll then need to restart sshd
Sean
--
Sean Crosby
Senior DevOpsHPC Engineer and HPC Team Lead | Research Platform Services
Research Computing |
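
[Editor's sketch, not part of the original message] The X11UseLocalhost change described above can be checked before sshd is restarted. This is a minimal check, assuming the file lives at /etc/ssh/sshd_config and uses the whitespace-separated "Keyword value" form; the function name x11uselocalhost_is_no is made up for this example.

```shell
# Succeeds if FILE contains an uncommented "X11UseLocalhost no" line.
# sshd_config keywords are case-insensitive, hence grep -i.
# Assumption: the "Keyword value" form is used (not "Keyword=value").
x11uselocalhost_is_no() {
    grep -Eiq '^[[:space:]]*X11UseLocalhost[[:space:]]+no[[:space:]]*$' "$1" 2>/dev/null
}

# /etc/ssh/sshd_config is the usual location; yours may differ.
if x11uselocalhost_is_no /etc/ssh/sshd_config; then
    echo "X11UseLocalhost no is set"
else
    echo "X11UseLocalhost no not found in this file"
fi
```

The check only reads one file and ignores Match blocks and included files. Running "sshd -T" as root prints the effective configuration, which is more reliable where it is available.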
On 5/14/19 4:00 PM, Mahmood Naderan wrote:
srun: error: Cannot forward to local display. Can only use X11
forwarding with network displays.
What does this say?
echo $DISPLAY
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
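
[Editor's sketch, not part of the original message] The distinction the srun error draws between "local" and "network" displays can be approximated from the shape of $DISPLAY. This is only an illustration, assuming that an empty host part (":1") or a "unix" host part means a local display and that anything of the form host:N is a network display. It is not Slurm's actual check, and classify_display is a made-up name.

```shell
# Classify an X11 DISPLAY value by its host part.
# Assumption: ":N" and "unix:N" are local; "host:N" is a network display.
classify_display() {
    case "$1" in
        "")         echo "unset" ;;
        :*|unix:*)  echo "local" ;;
        *:*)        echo "network" ;;
        *)          echo "invalid" ;;
    esac
}

classify_display ":1"              # prints: local
classify_display "localhost:10.0"  # prints: network
```

A value like ":1" is what a console or VNC desktop typically sets, while "localhost:10.0" is typical of a session opened with ssh -X or ssh -Y.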
Hi
I think I have asked this question before, but wasn't able to fix that.
While the "xclock" command works via "ssh -Y", srun with the x11 option fails to
open xclock.
[mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-0 --account y4
--partition RUBY -n 1 -c 4 --mem=1GB xclock
srun: error: Cannot forwa