From what I know of how this works, no, it’s not getting it from a local file
or the master node. I don’t believe it even makes a network connection, nor
requires a slurm.conf in order to run. If you can run it fresh on a node with
no config and that’s what it comes up with, it’s probably getting it straight
from the kernel.
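(That’s easy to check for yourself: on a node with no slurm.conf at all, run
slurmd -C. The node name and numbers below are only illustrative, not from
mike’s VM:)

$ slurmd -C
NodeName=vmnode16 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=64264
UpTime=0-02:13:37

If that prints sensible numbers with no config present, it can only be reading
the local machine.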
On 3/12/20 9:37 PM, Kirill 'kkm' Katsnelson wrote:
Aaah, that's a cool find! I never really looked inside my nodes for more
than a year since I debugged all my stuff so it "just works". They are
conjured out of nothing and dissolve back into nothing after 10 minutes
of inactivity. But good to know.
On Wed, Mar 11, 2020 at 9:57 PM Chris Samuel wrote:
> If so move it out of the way somewhere safe (just in case) and try again.
>
Aaah, that's a cool find! I never really looked inside my nodes for more
than a year since I debugged all my stuff so it "just works". They are
conjured out of nothing and dissolve back into nothing after 10 minutes
of inactivity. But good to know.
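(For anyone trying Chris's suggestion above: the file he's referring to is cut
off in this snippet, so the path here is purely hypothetical, but "move it out
of the way somewhere safe" would look something like:)

$ mv /etc/slurm/slurm.conf /root/slurm.conf.saved   # hypothetical path; keep a copy, just in case
$ slurmd -C                                         # then re-test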
On 10/3/20 1:40 pm, mike tie wrote:
Here is the output of lstopo
Hmm, well I believe Slurm should be using hwloc (which provides lstopo)
to get its information (at least it calls the xcpuinfo_hwloc_topo_get()
function for that), so if lstopo works then slurmd should too.
Ah, looking a bit more at this…
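(A quick way to cross-check that without reading the source: hwloc's own
command-line tools should agree with what slurmd -C reports. hwloc-calc ships
with hwloc; the counts below are illustrative:)

$ hwloc-calc -N core all   # number of cores hwloc sees
16
$ hwloc-calc -N pu all     # number of hardware threads (PUs)
16

If those numbers are right but slurmd -C still disagrees, the problem is on
Slurm's side rather than hwloc's.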
Yup, I think if you get stuck so badly, the first thing is to make sure the
node does not get the number 10 from the controller, and the second is to just
reimage the VM fresh. It may not be the quickest way, but it is at least
predictable in the sense of time spent.
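(To see what the controller currently records for the node, something like
this works; the node name and output are made up:)

$ scontrol show node vmnode16 | grep -i cpu
   CPUAlloc=0 CPUTot=10 CPULoad=0.01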
Good luck!
-kkm
Yep, slurmd -C is obviously getting the data from somewhere, either a local
file or from the master node. Hence my email to the group; I was hoping
that someone would just say: "yeah, modify file ". But oh well. I'll
start playing with strace and gdb later this week, looking through the
source.
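(strace will show directly which files slurmd -C opens; the grep pattern here
is just a guess at the interesting ones:)

$ strace -f -e trace=open,openat slurmd -C 2>&1 | grep -Ei 'cpuinfo|hwloc|topology|slurm'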
On Tue, Mar 10, 2020 at 1:41 PM mike tie wrote:
> Here is the output of lstopo
>
> $ lstopo -p
>
> Machine (63GB)
>
> Package P#0 + L3 (16MB)
>
> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
>
> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#1
>
> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#2
> …
Here is the output of lstopo
$ lstopo -p
Machine (63GB)
Package P#0 + L3 (16MB)
L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#1
L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#2
L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#3
…
Yes, it's odd.
-kkm
On Mon, Mar 9, 2020 at 7:44 AM mike tie wrote:
>
> Interesting. I'm still confused by where slurmd -C is getting the
> data. When I think of where the kernel stores info about the processor, I
> normally think of /proc/cpuinfo. (by the way, I am running centos 7 in the
> vm. The vm hypervisor is VMware).
On 9/3/20 7:44 am, mike tie wrote:
Specifically, how is slurmd -C getting that info? Maybe this is a
kernel issue, but other than lscpu and /proc/cpuinfo, I don't know where
to look. Maybe I should be looking at the slurmd source?
It would be worth looking at what something like "lstopo" from the hwloc
package reports for the VM.
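(On CentOS 7 that should just be a yum install away; I believe the text-mode
tool lives in the hwloc package and the graphical one in hwloc-gui:)

$ sudo yum install -y hwloc
$ lstopo-no-graphics -p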
Interesting. I'm still confused by where slurmd -C is getting the
data. When I think of where the kernel stores info about the processor, I
normally think of /proc/cpuinfo. (by the way, I am running centos 7 in the
vm. The vm hypervisor is VMware). /proc/cpuinfo does show 16 cores.
I understand…
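(For reference, the two kernel views worth comparing side by side; the
topology shown is illustrative, not mike's actual layout:)

$ grep -c '^processor' /proc/cpuinfo
16
$ lscpu | grep -E '^(CPU\(s\)|Thread|Core|Socket)'
CPU(s):                16
Thread(s) per core:    1
Core(s) per socket:    16
Socket(s):             1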
To answer your direct question, the ground truth of 'slurmd -C' is what
the kernel thinks the hardware is (what you see in lscpu, except it
probably employs some tricks for VMs with an odd topology). And it got
severely confused by what the kernel reported to it. I know from experience
that certain…
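(lscpu's extended view is the quickest way to see the per-CPU topology the
kernel hands out, which is where an odd VM layout shows up; output
illustrative:)

$ lscpu -e=CPU,CORE,SOCKET,NODE
CPU CORE SOCKET NODE
0   0    0      0
1   1    0      0
...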