The log files use many different strings to identify a job, and some messages do not contain a job ID at all. The best grep I have come up with is:

    NUMBER=$SLURM_JOBID
    egrep "\.\<$NUMBER\>\] |\<$NUMBER\>\.batch|jobid \<$NUMBER\>|JobId=\<$NUMBER\>|job id \<$NUMBER\>|job\.\<$NUMBER\>|job \<$NUMBER\>|jobid \[\<$NUMBER\>\]|task_p_slurmd_batch_request: \<$NUMBER\>" /var/log/slurm*
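Just as a sketch, the patterns above could be wrapped in a small helper so the job ID and log files become arguments; the function name is my own, GNU grep's \< \> word boundaries are assumed, and log paths will vary by site:

    #!/bin/sh
    # jobgrep: print daemon log lines that mention a given job ID.
    # Combines the patterns above; assumes GNU grep (-E with \< \>).
    jobgrep() {
        n="$1"; shift
        grep -E "\.\<$n\>\] |\<$n\>\.batch|jobid \<$n\>|JobId=\<$n\>|job id \<$n\>|job\.\<$n\>|job \<$n\>|jobid \[\<$n\>\]|task_p_slurmd_batch_request: \<$n\>" "$@"
    }

    # Demo against a sample fragment instead of /var/log/slurm*:
    printf '%s\n' \
        '[2024-02-03T11:50:33] JobId=4242 initiated' \
        '[2024-02-03T11:52:34] error: unrelated message' \
        > /tmp/slurm.log.sample
    jobgrep 4242 /tmp/slurm.log.sample

In real use it would be called as "jobgrep $SLURM_JOBID /var/log/slurm*", but the pattern list still has to be kept in sync with whatever message formats the daemons emit.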
Even that misses crucial data that does not contain the job ID at all:

    [2024-02-03T11:50:33.052] _get_user_env: get env for user jsu here
    [2024-02-03T11:52:33.152] timeout waiting for /bin/su to complete
    [2024-02-03T11:52:34.152] error: Failed to load current user environment variables
    [2024-02-03T11:52:34.153] error: _get_user_env: Unable to get user's local environment, running only with passed environment

It would be very useful if all messages related to a job carried a consistent string for grepping the log files; even better would be a command like "scontrol show jobid=NNNN log_messages". But I could not find what I wanted: an easy way to find all daemon log messages related to a specific job.

I would find it particularly useful if there were a way to automatically append such information to the stdout of the job at job termination, so users would automatically get information about job failures or warnings. Is there such a feature available that I have missed?
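I have not found a built-in feature, but Slurm's Epilog hook (Epilog= in slurm.conf runs a script on the compute node after each job, with SLURM_JOB_ID in its environment) might approximate the "append at termination" idea. A rough, untested sketch; the helper names are mine, the StdOut= field comes from "scontrol show job" output, and it only works when that path is writable from the node:

    #!/bin/sh
    # stdout_path: read `scontrol show job` output on stdin and print
    # the value of its StdOut= field.
    stdout_path() {
        sed -n 's/.*StdOut=\([^ ]*\).*/\1/p'
    }

    # append_job_log: append matching daemon log lines for a job to that
    # job's stdout file. Pattern list abbreviated; extend as above.
    append_job_log() {
        job="$1"; log="$2"
        out=$(scontrol show job "$job" | stdout_path)
        [ -n "$out" ] && [ -w "$out" ] || return 0
        {
            echo "=== daemon log lines for job $job ==="
            grep -E "JobId=\<$job\>|job \<$job\>" "$log"
        } >> "$out"
    }

    # In an Epilog script, Slurm exports SLURM_JOB_ID:
    # append_job_log "$SLURM_JOB_ID" /var/log/slurmd.log

This still only captures lines that mention the job ID, so it would not help with messages like the _get_user_env ones above.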
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com