Einar,

The strings in your $SLURM_JOB_ID values or host names are likely too long to serve as jobid for the Lustre Jobstats feature.
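
A quick way to see which component pushes the jobid past the limit is to expand the format by hand and count characters. The following is only a sketch: the job ID, user ID, and host names are made-up sample values, the function names are hypothetical, and it assumes that the 32 in the log message is the maximum size and that it includes a terminating byte.

```shell
#!/bin/sh
# Sketch: expand a "%j:%u:%h"-style jobid by hand and report its length.
# All sample values are hypothetical; substitute real ones from a compute node.

LIMIT=32   # taken from "expect(32)" in the log message (assumed to be the size limit)

# expand_jobid JOBID USERID HOSTNAME -> prints "JOBID:USERID:HOSTNAME"
expand_jobid() {
    printf '%s:%s:%s' "$1" "$2" "$3"
}

# check_jobid JOBID USERID HOSTNAME -> prints the expanded string and its length
check_jobid() {
    id=$(expand_jobid "$1" "$2" "$3")
    len=${#id}
    # Conservative comparison: assumes the limit includes a terminating byte.
    if [ "$len" -ge "$LIMIT" ]; then
        echo "$id: $len characters, likely too long (limit $LIMIT)"
    else
        echo "$id: $len characters, fits (limit $LIMIT)"
    fi
}

# Fully-qualified host name, as %h would produce
check_jobid 1234567 12345 node0123.cluster.example.org
# Short host name, as %H would produce
check_jobid 1234567 12345 node0123
```

On a real node, passing "$SLURM_JOB_ID", "$(id -u)", and "$(hostname -f)" or "$(hostname -s)" as the three arguments should give a reasonable approximation of what the client would assemble.
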
You might try %H instead of %h in jobid_name. For reference, from the Lustre manual, https://doc.lustre.org/lustre_manual.xhtml#jobstats :

> %e  print executable name
> %g  print group ID number
> %h  print fully-qualified hostname
> %H  print short hostname
> %j  print JobID from process environment variable named by the jobid_var parameter
> %p  print numeric process ID
> %u  print user ID number

On my system (2.12), I use:

    jobid_var=PBS_JOBID
    jobid_name=%e.%u

I get job_stats by $PBS_JOBID, as expected, from processes that actually have the variable set, and synthetic %e.%u values from all others, like processes on interactive or backup nodes. This has been working just fine to pinpoint the source of occasional trouble.

Curiously, I don't think the manual spells out what happens when the variable referenced by jobid_var is unset, i.e., the above fallback logic from jobid_var to jobid_name.

With best regards,
--
Michael Sternberg, Ph.D.
Principal Scientific Computing Administrator
Center for Nanoscale Materials
Argonne National Laboratory

> On Aug 12, 2022, at 03:37, Einar Næss Jensen <[email protected]> wrote:
>
> logfiles on oss servers are full of these error messages:
> Invalid jobid size (37), expect(32)
> What does it mean?
>
> we have set this:
> [root@mds-1 ~]# lctl get_param jobid_var jobid_name
> jobid_var=SLURM_JOB_ID
> jobid_name=%j:%u:%h
>
> lustre version is 2.12.6(ddn)

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
