On Wed, Mar 11, 2020 at 9:57 PM Chris Samuel wrote:
> If so move it out of the way somewhere safe (just in case) and try again.
>
Aaah, that's a cool find! I never really looked inside my nodes for more
than a year since I debugged all my stuff so it "just works". They are
conjured out of nothing
the
> problem.
>
> -mike
>
>
>
> Michael Tie
> Technical Director
> Mathematics, Statistics, and Computer Science
>
> One North College Street    phn: 507-222-4067
> Northfield, MN 55057        cel: 952-212-8933
> m...@carleton.edu
On Tue, Mar 10, 2020 at 1:41 PM mike tie wrote:
> Here is the output of lstopo
>
> $ lstopo -p
> Machine (63GB)
>   Package P#0 + L3 (16MB)
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#1
>     L2 (4096KB
Yes, it's odd.
-kkm
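For anyone mapping that lstopo output onto slurm.conf: a topology like the
(truncated) listing above would normally translate into a node definition
along these lines. The node name, core count, and memory below are only
assumptions for illustration, since the full listing isn't shown:

# hypothetical node line; adjust CoresPerSocket/RealMemory to the real output
NodeName=node01 Sockets=1 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64000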
On Mon, Mar 9, 2020 at 7:44 AM mike tie wrote:
>
> Interesting. I'm still confused about where slurmd -C is getting the
> data. When I think of where the kernel stores info about the processor, I
> normally think of /proc/cpuinfo. (By the way, I am running CentOS 7 in
To answer your direct question, the ground truth of 'slurmd -C' is what
the kernel thinks the hardware is (what you see in lscpu, except it
probably employs some tricks for VMs with an odd topology). And it got
severely confused by what the kernel reported to it. I know from experience
that cert
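A quick way to put the two views side by side on the node itself (just a
sketch; the exact fields printed vary a bit by Slurm version):

# what slurmd derives and would report upward
slurmd -C
# what the kernel itself reports
lscpu | grep -E 'Socket|Core|Thread|^CPU\(s\)'
grep -c '^processor' /proc/cpuinfo    # logical CPU count the kernel sees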
I'm running clusters entirely in Google Cloud. I'm not sure I'm
understanding the issue: do the nodes disappear from view entirely only
when they fail to power up within ResumeTimeout? Failures of this kind
happen in GCE when resources are momentarily unavailable, but the nodes
are still there,
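For reference, the slurm.conf pieces involved in this power-up dance look
roughly like the following; the script names and timings are placeholders,
not values from this thread:

# power-saving / cloud-node sketch (placeholder names and values)
SuspendProgram=/usr/local/sbin/gce-suspend.sh   # hypothetical script
ResumeProgram=/usr/local/sbin/gce-resume.sh     # hypothetical script
SuspendTime=300          # seconds idle before a node is powered down
ResumeTimeout=600        # a node that is not up by then is marked DOWN
NodeName=cloud[001-010] State=CLOUD CPUs=8 RealMemory=30000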
On Thu, Jan 30, 2020 at 7:54 AM Antony Cleave wrote:
> epilog jobid=513,arraytaskid=4,SLURM_ARRAY_JOB_ID=509,JobState(509)=RUNNING
> epilog jobid=514,arraytaskid=5,SLURM_ARRAY_JOB_ID=509,JobState(509)=RUNNING
> epilog jobid=515,arraytaskid=6,SLURM_ARRAY_JOB_ID=509,JobState(509)=RUNNING
> epilog j
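Lines like the ones quoted above could come from an epilog along these lines
(only a sketch; the log destination and the squeue query are assumptions, not
the actual script behind that output):

#!/bin/bash
# Epilog sketch (EpilogSlurmctld exports the SLURM_ARRAY_* variables):
# log the finishing array task and the current state of its parent array job.
# The log file path below is arbitrary.
state=$(squeue -h -j "${SLURM_ARRAY_JOB_ID}" -o %T | head -n1)
echo "epilog jobid=${SLURM_JOB_ID},arraytaskid=${SLURM_ARRAY_TASK_ID},SLURM_ARRAY_JOB_ID=${SLURM_ARRAY_JOB_ID},JobState(${SLURM_ARRAY_JOB_ID})=${state}" >> /var/log/slurm/epilog.log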