Our main file server, after 215 days of uptime, started getting processes stuck in Fauth state, and the problem spread to our cpu servers. Restarting our authentication server and the main cpu servers didn't fix it, so I took down the whole complex, turned the machines off for a few minutes, and restarted them all.
Things should be back to normal now.