It is a university "supercomputer", not a national facility, so the admins
are not that expert, which is why I am asking here. I am fairly certain that
it is some form of communication issue, but beyond that the cause is not clear.

If I get suggestions such as "why don't they look for ABC in XYZ" then I
may persuade them to look at specifics. They will need the coaching, alas.

On Wed, Dec 20, 2023 at 1:25 PM Gerhard Strangar <g...@arcor.de> wrote:

> Laurence Marks wrote:
>
> > After some (irreproducible) time, often one of the three slow tasks hangs.
> > A symptom is that if I try and ssh into the main node of the subtask (which
> > is running 128 mpi on the 4 nodes) I get "Authentication failed".
>
> How about asking an admin to check why it hangs?

-- 
Emeritus Professor Laurence Marks (Laurie)
Northwestern University
Webpage <http://www.numis.northwestern.edu> and Google Scholar link
<http://scholar.google.com/citations?user=zmHhI9gAAAAJ&hl=en>
"Research is to see what everybody else has seen, and to think what nobody
else has thought", Albert Szent-Györgyi
