The log files use many different strings to identify a job, including messages that contain no job ID at all:
NUMBER=$SLURM_JOBID
egrep "\.\<$NUMBER\>\] |\<$NUMBER\>\.batch|jobid \<$NUMBER\>|JobId=\<$NUMBER\>|job id \<$NUMBER\>|job\.\<$NUMBER\>|job \<$NUMBER\>|jobid \[\<$NUMBER\>\]|task_p_slurmd_batch_request: \<$NUMBER\>" /var/log/slurm*
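A sketch of how that pattern list could be kept in one place as a small shell function (the function name is my own, and the pattern set mirrors the formats above, so it is almost certainly incomplete):

```shell
#!/bin/bash
# Sketch: collect daemon log lines that mention a given job ID.
# The alternation mirrors the message formats observed above; since the
# Slurm daemons do not use one convention, it likely still misses some.
job_log_lines() {   # job_log_lines <jobid> <logfile>...
  local id=$1; shift
  grep -E -h "\.\<$id\>\] |\<$id\>\.batch|jobid \<$id\>|JobId=\<$id\>|job id \<$id\>|job\.\<$id\>|job \<$id\>|jobid \[\<$id\>\]|task_p_slurmd_batch_request: \<$id\>" "$@"
}

# Usage: job_log_lines "$SLURM_JOBID" /var/log/slurm*
```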

Even that misses crucial data whose messages do not contain the job ID at all:

[2024-02-03T11:50:33.052] _get_user_env: get env for user jsu here
[2024-02-03T11:52:33.152] timeout waiting for /bin/su to complete
[2024-02-03T11:52:34.152] error: Failed to load current user environment 
variables
[2024-02-03T11:52:34.153] error: _get_user_env: Unable to get user's local 
environment, running only with passed environment

It would be very useful if all messages related to a job carried a consistent
string for grepping the log files;
even better might be a command like "scontrol show jobid=NNNN log_messages".

But I could not find what I wanted (an easy way to find all daemon log messages 
related to a specific job). I would find it particularly useful if there were a 
way to automatically append such information to the stdout of the job at job 
termination so users would automatically get information about job failures or 
warnings.
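In case it helps others, here is a rough sketch of how a site might approximate the stdout-append idea with an Epilog script (Epilog= in slurm.conf). The log path, the single grep pattern, and the helper names are illustrative assumptions, not a documented Slurm feature:

```shell
#!/bin/bash
# Sketch of an Epilog script (set via Epilog= in slurm.conf) that appends
# daemon log lines mentioning the finishing job to the job's stdout file.
# ASSUMPTIONS: /var/log/slurmd.log as the log location, and a single
# naive pattern; real logs use many formats (see above).

# Pull the StdOut= path out of `scontrol show job` output.
stdout_path() {
  scontrol show job "$1" | sed -n 's/.*StdOut=\([^ ]*\).*/\1/p'
}

append_job_log() {   # append_job_log <jobid> <daemon-logfile>
  local id=$1 log=$2 out
  out=$(stdout_path "$id")
  # Skip batch-less or already-cleaned-up jobs with no writable stdout.
  [ -n "$out" ] && [ -w "$out" ] || return 0
  {
    echo "=== daemon log lines mentioning job $id ==="
    grep -E "\<$id\>" "$log"
  } >> "$out"
}

# slurmd exports SLURM_JOB_ID into the Epilog environment.
if [ -n "$SLURM_JOB_ID" ]; then
  append_job_log "$SLURM_JOB_ID" /var/log/slurmd.log
fi
```

This only covers the local slurmd log on the node running the Epilog, not slurmctld messages, so it is at best a partial answer to the question below.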

Is there such a feature available I have missed?

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com