Hi Alan,

Fantastic, adding `--parallel=12` fixed the issue:

$ eb ORCA-5.0.1-gompi-2021a.eb -r --parallel=12

Thanks a lot,
Ole


On 11/15/21 09:59, Alan O'Cais wrote:
Hmm, this has now come up a few times. Open MPI only counts physical cores as available slots and ignores hyperthreads, while EasyBuild passes the number of (logical) cores it sees as the number of required slots. Without oversubscription the test example will not run. Either we allow oversubscription, or we figure out a way to detect hyperthreading and account for it.
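A quick way to see the mismatch on a build host (a sketch, assuming Linux with util-linux's `lscpu` available) is to compare hardware threads with physical cores:

```shell
# Hardware threads: what EasyBuild's default parallelism is based on.
threads=$(nproc)
# Physical cores: what Open MPI counts as "slots" by default.
# lscpu -p=CORE prints one system-wide core ID per logical CPU,
# so the number of unique IDs is the physical core count.
cores=$(lscpu -p=CORE | grep -v '^#' | sort -u | wc -l)
echo "hardware threads: ${threads}"
echo "physical cores:   ${cores}"
```

On a hyperthreaded machine `threads` is twice `cores`, which is exactly why an mpirun asking for `nproc` ranks runs out of slots.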

There are a few open issues on this, see https://github.com/easybuilders/easybuild-easyblocks/pull/2611 and the linked issues.

For an immediate fix, you just need to limit the number of cores used for the build, e.g. with the eb option `--parallel=12`.
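If you want to make this the default rather than typing it on every invocation: EasyBuild maps each long option to an `EASYBUILD_`-prefixed environment variable, so (as a sketch) you can export the equivalent of `--parallel=12` once:

```shell
# Equivalent to passing --parallel=12 on every eb invocation:
export EASYBUILD_PARALLEL=12
# eb ORCA-5.0.1-gompi-2021a.eb -r
echo "EASYBUILD_PARALLEL=${EASYBUILD_PARALLEL}"
```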


On Mon, 15 Nov 2021 at 09:06, Ole Holm Nielsen <[email protected]> wrote:

    We use EB 4.5.0 and would like to install this module:

    $ eb ORCA-5.0.1-gompi-2021a.eb -r

    but it fails with:

    == FAILED: Installation ended unsuccessfully (build directory:
    /dev/shm/ORCA/5.0.1/gompi-2021a): build failed (first 300 chars): Sanity
    check failed: sanity check command $EBROOTORCA/bin/orca
    /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
    SINGLE POINT ENERGY[ \t]*-75.95934031' exited with code 1 (output:
    --------------------------------------------------------------------------
    There are not enough slots (took 1 min 50 secs)
    == Results of the build can be found in the log file(s)
    /tmp/eb-2QJPW_/easybuild-ORCA-5.0.1-20211111.140110.qlMvK.log
    ERROR: Build of
    /home/modules/software/EasyBuild/4.5.0/easybuild/easyconfigs/o/ORCA/ORCA-5.0.1-gompi-2021a.eb
    failed (err: "build failed (first 300 chars): Sanity check failed: sanity
    check command $EBROOTORCA/bin/orca
    /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
    SINGLE POINT ENERGY[ \t]*-75.95934031' exited with code 1 (output:
    --------------------------------------------------------------------------\nThere
    are not enough slots")


    There are further errors in the logfile:

    == 2021-11-11 14:03:00,669 build_log.py:169 ERROR EasyBuild crashed with
    an error (at easybuild/base/exceptions.py:124 in __init__): Sanity check
    failed: sanity check command $EBROOTORCA/bin/orca
    /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp | grep -c 'FINAL
    SINGLE POINT ENERGY[ \t]*-75.95934031' exited with code 1 (output:
    --------------------------------------------------------------------------
    There are not enough slots available in the system to satisfy the 48
    slots that were requested by the application:

        /home/modules/software/ORCA/5.0.1-gompi-2021a/bin/orca_gtoint_mpi

    Either request fewer slots for your application, or make more slots
    available for use.

    A "slot" is the Open MPI term for an allocatable unit where we can
    launch a process.  The number of slots available are defined by the
    environment in which Open MPI processes are run:

        1. Hostfile, via "slots=N" clauses (N defaults to number of
           processor cores if not provided)
        2. The --host command line parameter, via a ":N" suffix on the
           hostname (N defaults to 1 if not provided)
        3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
        4. If none of a hostfile, the --host command line parameter, or an
           RM is present, Open MPI defaults to the number of processor cores

    In all the above cases, if you want Open MPI to default to the number
    of hardware threads instead of the number of processor cores, use the
    --use-hwthread-cpus option.

    Alternatively, you can use the --oversubscribe option to ignore the
    number of available slots when deciding the number of processes to
    launch.
    --------------------------------------------------------------------------
    [file orca_tools/qcmsg.cpp, line 458]:
        .... aborting the run

    0
    ) (at easybuild/framework/easyblock.py:3311 in _sanity_check_step)


    Question: Why does the ORCA test request 48 MPI "slots" (MPI tasks, I
    suppose) and then fail?

    The build host has two Intel(R) Xeon(R) CPU E5-2650 v4 processors, i.e.,
    24 physical cores or 48 logical cores with Hyper-Threading enabled.
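    For what it's worth, the topology can be confirmed with lscpu (the
    expected values in the comment are inferred from the CPU model, not
    pasted output):

```shell
# Summarize the topology fields Open MPI cares about.
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)):'
# Expected on this host (2x E5-2650 v4): CPU(s): 48, Thread(s) per core: 2,
# Core(s) per socket: 12, Socket(s): 2 -- i.e. only 24 physical cores/slots.
```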

    The ORCA input file /dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp
    contains:

    !HF DEF2-SVP
    %PAL NPROCS 48 END
    * xyz 0 1
    O   0.0000   0.0000   0.0626
    H  -0.7920   0.0000  -0.4973
    H   0.7920   0.0000  -0.4973
    *
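
    The failing check can be reproduced by hand after lowering NPROCS to the
    physical core count; the snippet below is only an illustration (the input
    path comes from the log above, and the fallback file contents mimic the
    test input shown), not anything EasyBuild does itself:

```shell
# Work on a copy of the generated test input, lowering the requested
# MPI ranks from 48 to the 24 physical cores (illustrative only; the
# /dev/shm path is taken from the sanity-check log, with a minimal
# stand-in file created if it is not present).
inp=/dev/shm/ORCA/5.0.1/gompi-2021a/eb_test_hf_water.inp
cp "$inp" /tmp/hf_water_24.inp 2>/dev/null || \
  printf '%s\n' '!HF DEF2-SVP' '%PAL NPROCS 48 END' > /tmp/hf_water_24.inp
sed -i 's/NPROCS 48/NPROCS 24/' /tmp/hf_water_24.inp
grep 'NPROCS' /tmp/hf_water_24.inp
# Then rerun the check manually:
# $EBROOTORCA/bin/orca /tmp/hf_water_24.inp | grep -c 'FINAL SINGLE POINT ENERGY'
```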

    The user's limits would seem to be sufficient:

    $ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) 50000000
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 1030498
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) 50000000
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) unlimited
    cpu time               (seconds, -t) 240000
    max user processes              (-u) 2500
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited


    Thanks for sharing any insights.

    --
    Ole Holm Nielsen
    PhD, Senior HPC Officer
    Department of Physics, Technical University of Denmark



--
Dr. Alan O'Cais
E-CAM Software Manager
Juelich Supercomputing Centre
Forschungszentrum Juelich GmbH
52425 Juelich, Germany

Phone: +49 2461 61 5213
Fax: +49 2461 61 6656
E-mail: [email protected]
WWW: http://www.fz-juelich.de/ias/jsc/EN



