It could be a problem with ARP cache.
If the number of devices approaches 512, there is a kernel limitation in
dynamic ARP-cache size and it can result in the loss of connectivity
between nodes.
The garbage collector will not run if the number of entries in the cache is
less than 128, by default:
*g
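A rough way to see how close a node is to those limits is to compare the
number of ARP entries with the three kernel thresholds. The sketch below is
only an illustration: it assumes a Linux node, IPv4 only, and the usual
/proc locations (/proc/net/arp and
/proc/sys/net/ipv4/neigh/default/gc_thresh1..3). The function names are made
up for this example, and the wording of each state is my reading of arp(7),
not kernel output.

```python
from pathlib import Path

# Assumed standard Linux locations; adjust if your system differs.
ARP_TABLE = Path("/proc/net/arp")
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def count_arp_entries(arp_table_text: str) -> int:
    """Count entries in text formatted like /proc/net/arp (first line is a header)."""
    lines = [line for line in arp_table_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def describe_pressure(entries: int, thresh1: int, thresh2: int, thresh3: int) -> str:
    """Say where the entry count sits relative to gc_thresh1/2/3."""
    if entries < thresh1:
        return "below gc_thresh1: the garbage collector does not run"
    if entries <= thresh2:
        return "between gc_thresh1 and gc_thresh2: normal garbage collection"
    if entries < thresh3:
        return "above gc_thresh2 (soft limit): entries are collected aggressively"
    return "at or above gc_thresh3 (hard limit): the table is full"


def read_thresholds(neigh_dir: Path = NEIGH_DIR) -> tuple:
    """Read gc_thresh1, gc_thresh2 and gc_thresh3 as integers."""
    return tuple(int((neigh_dir / f"gc_thresh{i}").read_text()) for i in (1, 2, 3))


if __name__ == "__main__":
    # Only touch /proc when it is actually there (i.e. on a Linux host).
    if ARP_TABLE.exists() and (NEIGH_DIR / "gc_thresh1").exists():
        entries = count_arp_entries(ARP_TABLE.read_text())
        t1, t2, t3 = read_thresholds()
        print(f"{entries} ARP entries; thresholds {t1}/{t2}/{t3}")
        print(describe_pressure(entries, t1, t2, t3))
```

With the default thresholds of 128, 512 and 1024, a node holding around 500
entries is still in the second state but has very little headroom left.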
Hi Mahmood,
I've never tried using the native X11 of SLURM without being ssh'ed into the
submit node.
Can you try ssh'ing with X11 forwarding to rocks7 (i.e. ssh -X user@rocks7)
from a different machine, and then try your srun --x11 command?
Sean
--
Sean Crosby
Senior DevOpsHPC Engineer and H
Dear Mahmood,
please open a console in the VNC session, run ssh -Y rocks7 in that
console (yes, log in again from the console) and try it again.
SLURM does not want to use local displays, and a VNC session is a
"local" display as far as Linux and the X11 subsystem are concerned.
So, you need to relogin
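To illustrate the difference between a "local" display and an ssh-forwarded
one, here is a small heuristic that looks only at the DISPLAY value. It is a
sketch under assumptions: a VNC or console session typically has something
like ":1", while ssh -X/-Y typically sets something like "localhost:10.0".
The function name is made up, and this is not the check Slurm itself
performs.

```python
import os


def is_local_display(display: str) -> bool:
    """Heuristic: True when DISPLAY has no host part (or the host is "unix")."""
    # DISPLAY has the form [host]:display[.screen]; split on the last colon.
    host, sep, _ = display.rpartition(":")
    if not sep:
        raise ValueError(f"not a valid DISPLAY value: {display!r}")
    return host in ("", "unix")


if __name__ == "__main__":
    display = os.environ.get("DISPLAY")
    if display is None:
        print("DISPLAY is not set")
    elif is_local_display(display):
        print(f"{display} looks like a local display (e.g. console or VNC)")
    else:
        print(f"{display} has a host part, as an ssh-forwarded display would")
```

Running it inside the VNC session and again after ssh -Y rocks7 should show
the DISPLAY value changing from the first form to the second.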
On 5/15/19 12:34 AM, Barbara Krašovec wrote:
> It could be a problem with ARP cache.
>
> If the number of devices approaches 512, there is a kernel limitation in
> dynamic
> ARP-cache size and it can result in the loss of connectivity between nodes.
We have 162 compute nodes, a dozen or so file
Indeed - am I the only person that finds that quite a bit annoying? A
lot of interactive software works a lot better over things like NX, so
why this limitation?
Tina
(I realise I'm not adding much to the discussion, probably :) )
On 15/05/2019 08:36, Marcus Wagner wrote:
> Dear Mahmood,
>
> ple
Hi;
Do not think of "the number of devices" as "the number of servers". Any
device that has a MAC address and is connected to one of your node's local
networks counts as a device. For example, if your BMC ports
(iLO, iDRAC, etc.) are connected to one of the networks of your nodes, it
doubles the number
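As a back-of-the-envelope illustration of that counting, the sketch below
estimates how many entries a node could end up with if it talks to every MAC
address on its local networks. The node count and the helper name are
hypothetical, and it is a worst-case estimate, not a measurement.

```python
def estimate_neighbour_entries(nodes: int, macs_per_node: int, other_devices: int = 0) -> int:
    """Worst case: every MAC address on the local networks ends up in the cache."""
    return nodes * macs_per_node + other_devices


if __name__ == "__main__":
    nodes = 250  # hypothetical cluster size
    print("NICs only:    ", estimate_neighbour_entries(nodes, 1))
    print("NICs and BMCs:", estimate_neighbour_entries(nodes, 2))
```

With these made-up numbers, adding the BMC ports takes the estimate from 250
to 500, which is already close to the 512 mentioned in this thread.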
On 15-05-2019 09:34, Barbara Krašovec wrote:
> It could be a problem with ARP cache.
> If the number of devices approaches 512, there is a kernel limitation in
> dynamic ARP-cache size and it can result in the loss of connectivity
> between nodes.
This is something every cluster owner should be awar
Hi,
I created an account with a Parent with:
$ sudo sacctmgr create account Name=dsi [..] Parent=galilee
Then submitted some jobs in both accounts:
[alainm@gemini ~]$ sacct -n -X -S 01.01.19 -E 05.16.19 -o CPUTimeRAW,Account -A dsi,galilee | wc -l
18300
[alainm@gemini ~]$ sacct -n -X -S 01.01.1
On 15/5/19 3:01 am, Tina Friedrich wrote:
> Indeed - am I the only person that finds that quite a bit annoying? A
> lot of interactive software works a lot better over things like NX, so
> why this limitation?
It might be a limitation around the plumbing they use to do this, and
the whole X11 forwa
Hadn't yet read that far - I plan to test 19.05 soon anyway. Will report.
(I thought the plumbing was - basically - libssh; and, well, ssh itself
is capable of dealing with local displays?)
Tina
On 15/05/2019 15:06, Chris Samuel wrote:
> On 15/5/19 3:01 am, Tina Friedrich wrote:
>
>> Indeed -
hi all,
we are currently also going through the painful process of making x11
support user-friendly, so i'm also in favour of making this work from e.g.
vnc or nx/x2go.
however, we now run 17.11.8, and we already noticed that 17.11.11 has
very different x11-related code. is the 19.05 x11 even more d
On 5/15/19 7:32 AM, Tina Friedrich wrote:
> Hadn't yet read that far - I plan to test 19.05 soon anyway. Will report.
Cool, Tim has ripped out all the libssh code (which caused me issues at
${JOB-1} because it didn't play nicely with SSH keep alive messages) and
replaced it with native handling
Hi,
I am trying to make sense of the following session:
[alainm@gemini ~]$ sacctmgr list account name=child1
   Account      Descr        Org
---------- ---------- ----------
    child1     child1    parent1
[alainm@gemini ~]$ sacctmg
On 15 May 19, at 19:52, Alain O' Miniussi alain.miniu...@oca.eu wrote:
> Hi,
>
> I am trying to make sense of the following session:
>
>
> [alainm@gemini ~]$ sacctmgr list account name=child1
>    Account      Descr        Org
> -- -
> please open a console in the VNC session, do a ssh -Y rocks7 in the
> console (yes, relogin to the console) and try it again.
> SLURM does not want to use local displays, and a VNC session is a "local"
> display, as far as it concerns linux and the X11 subsystem.
> So, you need to relogin or login to a
> Can you try ssh'ing with X11 forwarding to rocks7 (i.e. ssh -X user@rocks7)
> from a different machine, and then try your srun --x11 command?
No... This doesn't work either. The error is
X11 forwarding not available.
Please see the picture at https://pasteboard.co/IeQGNOx.png
Regards,
Mahmood
> A lot of interactive software works a lot better over things like NX, so
> why this limitation?
Agreed... Slurm is a very powerful job manager and I really appreciate its
capabilities. However, I don't know why X11 has always been such a pain
with it. spank-x11 was good but that was not a builtin fe
On 5/15/19 11:36 AM, Mahmood Naderan wrote:
> I really like to know why x11 is not so friendly? For example, slurm
> works with MPI. Why not with X11?!
Because MPI support is fundamental, X11 support is nice to have.
I suspect 19.05 will make your life an awful lot easier!
All the best,
Chris
--