Over the past couple of months we have run into issues with jobs on random nodes getting stuck in a 't' status. The only way to resolve it is to restart the node, which frustrates users who run array and MPI jobs. I am not seeing anything in the logs to indicate a problem. Our installation uses the Berkeley database, and I was wondering whether that could be causing the issue. Does it need some kind of maintenance to keep it running smoothly?
Thanks.
_______________________________________________ users mailing list users@gridengine.org https://gridengine.org/mailman/listinfo/users