Dear Kenneth,

I do not run the program on the master node. I use SLURM to distribute it
to the slave nodes, so it is fine with me to recompile PySCF (and its
dependencies) on one of the slave nodes.

I'll report back when all the rebuilds finish. Thank you very much for all
your help.

Best,
Agustín
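
In case it helps someone else, the plan above sketched as a shell script (a
rough sketch only; the module-tree path, the EASYBUILD_INSTALLPATH variable,
and the exact easyconfig name are assumptions to adjust for your site):

```shell
# Sketch: remove the module files so EasyBuild sees everything as missing,
# then rebuild the whole stack with generic optimizations on a slave node.
# MODULES_DIR is an assumption; point it at your actual EasyBuild installpath.
MODULES_DIR="${EASYBUILD_INSTALLPATH:-$HOME/.local/easybuild}/modules/all"

# First only list what would be removed; switch -print to -delete once the
# list looks right.
if [ -d "$MODULES_DIR" ]; then
    find "$MODULES_DIR" -name '*.lua' -print
fi

# Then rebuild PySCF and all of its dependencies, submitted to a slave node
# via SLURM so any -march=native flags match where the jobs actually run:
#   sbatch --wrap="eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --robot --optarch=GENERIC"
echo "dry run complete"
```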

On Thu, Jun 3, 2021 at 16:06, Kenneth Hoste (<[email protected]>) wrote:

> On 03/06/2021 20:46, Agustín Aucar wrote:
> > Dear Kenneth,
> >
> > Thank you so much for your kind reply.
> >
> >
> > On Thu, Jun 3, 2021 at 15:18, Kenneth Hoste (<[email protected]>) wrote:
> >
> >     Dear Agustín,
> >
> >     I'm not sure if there's an easy way to determine which library is
> >     causing the "Illegal instruction" error, but it's possibly not a
> single
> >     specific library, but several...
> >
> >     I suggest you try re-installing all modules on the slave nodes (the
> >     oldest CPUs), if that's feasible.
> >
> >
> > I think the oldest CPUs are not those of the slave nodes but from the
> > master.
> >
> > Master node:
> >
> > model name : Dual-Core AMD Opteron(tm) Processor 2214
> >
> > Slaves:
> >
> > model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
> >
> > As far as I can see on this website
> > <http://cpuboss.com/cpus/Intel-Xeon-E5-2620-vs-AMD-Opteron-2214>, the
> > AMD CPU (our master node) is older than the Intel ones (slaves). Am I
> > wrong?
>
>
> No, you're right, I overlooked that.
>
> This probably means you're in trouble, in some sense...
>
> The AMD processor supports instructions that the Intel one in the slaves
> doesn't support, and vice versa.
>
> So building on the slaves with -march=native (which is what EasyBuild
> does by default) means the installations can only be used on the slaves.
> And the same goes for the master...
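
A quick way to see what the two CPU types have in common is to intersect
their flag lists from /proc/cpuinfo; a minimal sketch (the flag strings
below are abbreviated stand-ins for the real `grep flags /proc/cpuinfo`
output, which should be pasted in from each node):

```shell
# Sketch: intersect the instruction-set flags of the two CPU types to see
# what a build that runs everywhere could safely target.
# These flag strings are abbreviated stand-ins, not the full flag lists.
master_flags="fpu vme mmx sse sse2 cx16 3dnow 3dnowext"
slave_flags="fpu vme mmx sse sse2 ssse3 sse4_1 sse4_2 avx avx2 cx16"

# POSIX-shell intersection: keep each master flag that also appears
# in the slave list.
common=$(
  for f in $master_flags; do
    for g in $slave_flags; do
      [ "$f" = "$g" ] && echo "$f"
    done
  done | sort
)
echo "$common"
```

Flags that only one side reports (e.g. avx2 on the slaves, 3dnow on the
master) are exactly the instructions a -march=native build on that side may
emit and the other side will crash on with "Illegal instruction".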
>
>
> >
> >     When you use "eb --force", only the easyconfig files specified to
> >     the eb
> >     command are reinstalled.
> >     There's no command-line option to re-install everything, since it's
> >     pretty rare to actually have to do this.
> >
> >     The easiest way would be to remove the module files, and then
> reinstall
> >     PySCF with "eb --robot".
> >
> >
> > OK. Then I will remove all *.lua files (from the 36 modules, including
> > *foss* and *GCCcore*), and then reinstall all of them, this time from a
> > slave node.
> >
> >
> > I will report my results.
> >
> > Thank you for your valuable advice!
> >
> > Agustín
> >
> >     regards,
> >
> >     Kenneth
> >
> >     On 03/06/2021 19:48, Agustín Aucar wrote:
> >      > Dear EasyBuild experts,
> >      >
> >      > I tried to recompile some of the dependencies of the PySCF code
> >     by using:
> >      >
> >      > eb name-of-file.eb --optarch=GENERIC -r --force
> >      >
> >      > but the results are still the same. I recompiled 5 or 6 of the 36
> >      > "dependent" modules... Is there a way to somehow estimate which
> >     module
> >      > is causing this problem to avoid recompiling each of the 36
> modules?
> >      >
> >      > The loaded modules (module purge && module
> >      > load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6) are
> >      >
> >      > Currently Loaded Modules:
> >      >    1) compiler/GCCcore/10.2.0
> >      >    2) lib/zlib/1.2.11-GCCcore-10.2.0
> >      >    3) tools/binutils/2.35-GCCcore-10.2.0
> >      >    4) compiler/GCC/10.2.0
> >      >    5) tools/numactl/2.0.13-GCCcore-10.2.0
> >      >    6) tools/XZ/5.2.5-GCCcore-10.2.0
> >      >    7) lib/libxml2/2.9.10-GCCcore-10.2.0
> >      >    8) system/libpciaccess/0.16-GCCcore-10.2.0
> >      >    9) system/hwloc/2.2.0-GCCcore-10.2.0
> >      >   10) lib/libevent/2.1.12-GCCcore-10.2.0
> >      >   11) lib/UCX/1.9.0-GCCcore-10.2.0
> >      >   12) lib/libfabric/1.11.0-GCCcore-10.2.0
> >      >   13) lib/PMIx/3.1.5-GCCcore-10.2.0
> >      >   14) mpi/OpenMPI/4.0.5-GCC-10.2.0
> >      >   15) numlib/OpenBLAS/0.3.12-GCC-10.2.0
> >      >   16) toolchain/gompi/2020b
> >      >   17) numlib/FFTW/3.3.8-gompi-2020b
> >      >   18) numlib/ScaLAPACK/2.1.0-gompi-2020b
> >      >   19) toolchain/foss/2020b
> >      >   20) tools/bzip2/1.0.8-GCCcore-10.2.0
> >      >   21) devel/ncurses/6.2-GCCcore-10.2.0
> >      >   22) lib/libreadline/8.0-GCCcore-10.2.0
> >      >   23) lang/Tcl/8.6.10-GCCcore-10.2.0
> >      >   24) devel/SQLite/3.33.0-GCCcore-10.2.0
> >      >   25) math/GMP/6.2.0-GCCcore-10.2.0
> >      >   26) lib/libffi/3.3-GCCcore-10.2.0
> >      >   27) lang/Python/3.8.6-GCCcore-10.2.0
> >      >   28) lib/pybind11/2.6.0-GCCcore-10.2.0
> >      >   29) lang/SciPy-bundle/2020.11-foss-2020b
> >      >   30) tools/Szip/2.1.1-GCCcore-10.2.0
> >      >   31) data/HDF5/1.10.7-gompi-2020b
> >      >   32) data/h5py/3.1.0-foss-2020b
> >      >   33) chem/qcint/4.0.6-foss-2020b-Python-3.8.6
> >      >   34) chem/libxc/5.1.3-GCC-10.2.0
> >      >   35) chem/XCFun/2.1.1-GCCcore-10.2.0
> >      >   36) chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
> >      >
> >      >
> >      > Thank you in advance for any help,
> >      > Agustín
> >      >
> >      > On Thu, Jun 3, 2021 at 8:03, Agustín Aucar (<[email protected]>) wrote:
> >      >
> >      >     Dear Åke and Kenneth,
> >      >
> >      >     Thank you very much for your replies.
> >      >
> >      >     On Thu, Jun 3, 2021 at 4:00, Kenneth Hoste (<[email protected]>) wrote:
> >      >
> >      >         Dear Agustín,
> >      >
> >      >         The fundamental problem is indeed that you're building
> >      >         software on one type of CPU, and then trying to run it
> >      >         on another.
> >      >
> >      >         Can you share some more details on what type of CPU is in
> the
> >      >         master
> >      >         node and slave nodes?
> >      >
> >      >         If you can, try using the archspec tool (see
> >      >         https://github.com/archspec/archspec; install it with
> >      >         "pip3 install archspec", then run "archspec cpu").
> >      >
> >      >         Or share the output of the following commands:
> >      >
> >      >         grep 'model name' /proc/cpuinfo  | head -1
> >      >
> >      >
> >      >         grep flags /proc/cpuinfo | head -1
> >      >
> >      >
> >      >     Master node:
> >      >
> >      >     model name : Dual-Core AMD Opteron(tm) Processor 2214
> >      >
> >      >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca
> >      >     cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> >      >     fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid
> extd_apicid
> >      >     pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
> >     3dnowprefetch vmmcall
> >      >
> >      >
> >      >     Slaves:
> >      >
> >      >     model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
> >      >
> >      >     flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca
> >      >     cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> >      >     syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs
> bts
> >      >     rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni
> >     pclmulqdq
> >      >     dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16
> >     xtpr pdcm
> >      >     pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
> aes
> >      >     xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
> epb
> >      >     cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb
> stibp
> >      >     tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase
> tsc_adjust
> >      >     bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx
> >     smap
> >      >     intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total
> >     cqm_mbm_local
> >      >     dtherm ida arat pln pts md_clear flush_l1d
> >      >
> >      >         You can also try controlling the optimizations that
> >      >         EasyBuild applies by default, to prevent it from building
> >      >         for the specific CPU of the build node, using
> >      >         "eb --optarch=GENERIC"; see
> >      > https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html
> >      >
> >      >
> >      >     I tried doing
> >      >
> >      >     eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --optarch=GENERIC
> >     -r --force
> >      >
> >      >     but the problem is still the same. Maybe the problem is not
> >     in this
> >      >     particular code (PySCF) but in some of its dependencies. Is
> there
> >      >     something like a "--force" flag to force dependencies to
> >     recompile?
> >      >
> >      >         George's suggestion is better/easier though: building on
> the
> >      >         oldest node
> >      >         should help you too...
> >      >
> >      >
> >      >     I tried this a couple of days ago, but it didn't resolve the
> >      >     problem. In fact, when I do so, I cannot run the code on the
> >      >     master (as expected), but I cannot run it on the slaves either...
> >      >
> >      >         regards,
> >      >
> >      >         Kenneth
> >      >
> >      >
> >      >
> >      >     Thank you for your help!
> >      >
> >      >     Agustín
> >      >
> >      >         On 02/06/2021 22:20, Agustín Aucar wrote:
> >      >          > Dear George,
> >      >          >
> >      >          > Thanks for your response. A few days ago, I tried to
> >      >          > compile the code on a slave node, but it didn't solve
> >      >          > the problem...
> >      >          >
> >      >          > Best,
> >      >          > Agustín
> >      >          >
> >      >          > On Wed, Jun 2, 2021 at 11:41, George Tsouloupas (<[email protected]>) wrote:
> >      >          >
> >      >          >     Hi,
> >      >          >
> >      >          >     In a similar situation we ended up just building
> the
> >      >         software on the
> >      >          >     "older" cpu (i.e. the "slave" in your case)
> >      >          >
> >      >          >     G.
> >      >          >
> >      >          >
> >      >          >     George Tsouloupas, PhD
> >      >          >     HPC Facility Technical Director
> >      >          >     The Cyprus Institute
> >      >          >     tel: +357 22208688
> >      >          >
> >      >          >     On 6/2/21 4:22 PM, Agustín Aucar wrote:
> >      >          >>     Dear EasyBuild experts,
> >      >          >>
> >      >          >>     Firstly, thank you for your very nice work!
> >      >          >>
> >      >          >>     I'm trying to compile PySCF with the
> >     following *.eb file:
> >      >          >>
> >      >          >>     easyblock = 'CMakeMakeCp'
> >      >          >>
> >      >          >>     name = 'PySCF'
> >      >          >>     version = '2.0.0a'
> >      >          >>     versionsuffix = '-Python-%(pyver)s'
> >      >          >>
> >      >          >>     homepage = 'http://www.pyscf.org'
> >      >          >>     description = "PySCF is an open-source collection
> of
> >      >         electronic
> >      >          >>     structure modules powered by Python."
> >      >          >>
> >      >          >>     toolchain = {'name': 'foss', 'version': '2020b'}
> >      >          >>
> >      >          >>     source_urls = ['https://github.com/pyscf/pyscf/archive/']
> >      >          >>     sources = ['v%(version)s.tar.gz']
> >      >          >>     checksums =
> >      >          >>     ['20f4c9faf65436a97f9dfc8099d3c79b988b0a2c5374c701fbe35abc6fad4922']
> >      >          >>
> >      >          >>     builddependencies = [('CMake', '3.18.4')]
> >      >          >>
> >      >          >>     dependencies = [
> >      >          >>         ('Python', '3.8.6'),
> >      >          >>         ('SciPy-bundle', '2020.11'),  # for numpy,
> scipy
> >      >          >>         ('h5py', '3.1.0'),
> >      >          >>         ('qcint', '4.0.6', versionsuffix),
> >      >          >>         ('libxc', '5.1.3'),
> >      >          >>         ('XCFun', '2.1.1'),
> >      >          >>     ]
> >      >          >>
> >      >          >>     start_dir = 'pyscf/lib'
> >      >          >>
> >      >          >>     separate_build_dir = True
> >      >          >>
> >      >          >>     configopts = "-DBUILD_LIBCINT=OFF
> -DBUILD_LIBXC=OFF
> >      >          >>     -DBUILD_XCFUN=OFF "
> >      >          >>
> >      >          >>     prebuildopts = "export
> >      >          >>
> >       PYSCF_INC_DIR=$EBROOTQCINT/include:$EBROOTLIBXC/lib && "
> >      >          >>
> >      >          >>     files_to_copy = ['pyscf']
> >      >          >>
> >      >          >>     sanity_check_paths = {
> >      >          >>         'files': ['pyscf/__init__.py'],
> >      >          >>         'dirs': ['pyscf/data', 'pyscf/lib'],
> >      >          >>     }
> >      >          >>
> >      >          >>     sanity_check_commands = ["python -c 'import
> pyscf'"]
> >      >          >>
> >      >          >>     modextrapaths = {'PYTHONPATH': '',
> >     'PYSCF_EXT_PATH': ''}
> >      >          >>
> >      >          >>     moduleclass = 'chem'
> >      >          >>
> >      >          >>
> >      >          >>     Even though the module is created, I am having
> >      >          >>     trouble running it on a node other than the master.
> >      >          >>     When I load the module and run the code on the
> >      >          >>     master, everything goes fine:
> >      >          >>
> >      >          >>     module load
> chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
> >      >          >>     python
> >      >          >>     from pyscf import gto, scf
> >      >          >>     mol = gto.M(atom='H 0 0 0; H 0 0 1')
> >      >          >>     mf = scf.RHF(mol).run()
> >      >          >>
> >      >          >>     but when I try to run it on a node different from
> the
> >      >         master, I get:
> >      >          >>
> >      >          >>     Python 3.8.6 (default, Jun  1 2021, 16:43:49)
> >      >          >>     [GCC 10.2.0] on linux
> >      >          >>     Type "help", "copyright", "credits" or "license"
> for
> >      >         more information.
> >      >          >>     >>> from pyscf import gto, scf
> >      >          >>     >>> mol = gto.M(atom='H 0 0 0; H 0 0 1')
> >      >          >>     >>> mf = scf.RHF(mol).run()
> >      >          >>     Illegal instruction (core dumped)
> >      >          >>
> >      >          >>     As far as I have read in various places, it seems
> >      >          >>     to be related to the different architectures of our
> >      >          >>     master and slave nodes.
> >      >          >>
> >      >          >>     If I execute
> >      >          >>
> >      >          >>     grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 |
> tr
> >      >         '[:upper:]'
> >      >          >>     '[:lower:]' | { read FLAGS; OPT="-march=native";
> >     for flag in
> >      >          >>     $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" |
> >     "ssse3"
> >      >         | "fma" |
> >      >          >>     "cx16" | "popcnt" | "avx" | "avx2") OPT+="
> -m$flag";;
> >      >         esac; done;
> >      >          >>     MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
> >      >          >>
> >      >          >>     on the slaves I get: -march=native -mssse3 -mfma
> >     -mcx16
> >      >         -msse4.1
> >      >          >>     -msse4.2 -mpopcnt -mavx -mavx2
> >      >          >>
> >      >          >>     whereas on the master node we have: -march=native
> >     -mcx16
> >      >          >>
> >      >          >>     I tried to compile PySCF by adding these lines to
> my
> >      >         *.eb file:
> >      >          >>
> >      >          >>     configopts += "-DBUILD_FLAGS='-march=native
> -mssse3
> >      >         -mfma -mcx16
> >      >          >>     -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> >      >          >>     configopts += "-DCMAKE_C_FLAGS='-march=native
> -mssse3
> >      >         -mfma -mcx16
> >      >          >>     -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> >      >          >>     configopts += "-DCMAKE_CXX_FLAGS='-march=native
> >     -mssse3
> >      >         -mfma
> >      >          >>     -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
> >      >          >>     configopts +=
> "-DCMAKE_FORTRAN_FLAGS='-march=native
> >      >         -mssse3 -mfma
> >      >          >>     -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2'"
> >      >          >>
> >      >          >>     but in that case the code runs neither on the
> >      >          >>     master nor on the slaves.
> >      >          >>
> >      >          >>
> >      >          >>     I'm sorry if it is a stupid question. I am far
> from
> >      >         being a system
> >      >          >>     admin...
> >      >          >>
> >      >          >>     Thanks a lot for your help.
> >      >          >>
> >      >          >>     Dr. Agustín Aucar
> >      >          >>     Institute for Modeling and Innovative
> Technologies -
> >      >         Argentina
> >      >          >
> >      >
> >
>
