Hi

I made some progress in understanding the problem I reported some weeks ago:


https://lists.schedmd.com/pipermail/slurm-users/2023-May/010027.html


I noticed that the intermittent connection timeout I am experiencing occurs
only when the TCP-based direct connection is used to establish communication
between stepd processes on different nodes.

When disabling the optimized direct connection using


export SLURM_PMIX_DIRECT_CONN=false


the submission of hetjobs is stable and no connection timeout occurs anymore.
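
For reference, here is a minimal sketch of how this workaround might be placed
in a job script, so that the variable is set before any step is launched. The
application name ./my_app and the two het-group indices are placeholders
assumed for illustration, not my actual setup; the guard around srun is only
there so the snippet can be run outside of an allocation.

```shell
#!/bin/bash
# Workaround sketch: disable the PMIx direct-connection path
# before any job step is started.
export SLURM_PMIX_DIRECT_CONN=false

# Hypothetical launch line: ./my_app and the het-group indices are
# placeholders and must be adapted to the real job layout.
if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    srun --mpi=pmix --het-group=0 ./my_app : --het-group=1 ./my_app
else
    # Outside of a Slurm allocation, just report the setting.
    echo "SLURM_PMIX_DIRECT_CONN=${SLURM_PMIX_DIRECT_CONN}"
fi
```

The export has to come before the srun line, since the setting is read from
the environment of the launched step.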

Any idea what can go wrong when using the TCP-based direct connection together
with hetjobs?

Cheers,
Denis

---------
Denis Bertini
Abteilung: CIT
Ort: SB3 2.265a

Tel: +49 6159 71 2240
Fax: +49 6159 71 2986
E-Mail: d.bert...@gsi.de

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the GSI Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
Ministerialdirigent Dr. Volkmar Dietz
