Thanks for the suggestion, Ole. I tried this out yesterday on RHEL 9.4 with 
two slightly different setups.  

1) Using the stock ice driver that ships with RHEL 9.4 for the card, I still 
saw the issue.  

2) There was no pre-built version of the ice driver on the Intel download 
site, so I built it myself, rebooted, and re-ran the test.  It greatly 
reduced the number of occurrences of the issue, but didn't eliminate them.  

This is similar to what I saw on the RHEL 9.3 setup (adding the Intel ice 
driver reduced occurrences but did not eliminate them entirely).

I can also report that the 23.02.7 tree had similar results on the 9.3 node 
setup.  Going backwards on the Slurm bits did not seem to change the number of 
occurrences.  

Unfortunately I think I'm out of time for experiments on these nodes, but maybe 
this thread will be useful to others down the road.

Brent

PS - sorry for my last post getting tagged as a new issue.  Hopefully this one 
will thread correctly.


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
