>>>>> "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:

Hi Sylvain,

I get the "NVIDIA : ..." run-time messages just by building
with "--with-cuda=/usr":

./configure --prefix=${prefix} \
    --mandir=${prefix}/share/man \
    --infodir=${prefix}/share/info \
    --sysconfdir=/etc/openmpi/${VERSION} --with-devel-headers \
    --disable-memchecker \
    --disable-vt \
    --with-tm --with-slurm --with-pmi --with-sge \
    --with-cuda=/usr \
    --with-io-romio-flags='--with-file-system=nfs+lustre' \
    --with-cma --without-valgrind \
    --enable-openib-connectx-xrc \
    --enable-orterun-prefix-by-default \
    --disable-java
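
In case it helps: one way to check what actually got built is the
following (a rough sketch; the grep is just a quick way to spot the
CUDA/NVML probes of the hwloc component in the configure log):

    # does the installed Open MPI report CUDA support?
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

    # did configure detect CUDA/NVML for the embedded hwloc?
    # (run in the build directory; config.log is the standard autoconf log)
    grep -i -E 'nvml|cuda' config.log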

Roland
    
    SJ> Hi Siegmar, I think this "NVIDIA : ..." error message comes from
    SJ> the fact that you add the CUDA includes to the C*FLAGS. If you
    SJ> just use --with-cuda, Open MPI will compile with CUDA support,
    SJ> but hwloc will not find CUDA, and that is fine. However, putting
    SJ> the CUDA includes in CFLAGS makes hwloc find CUDA and compile its
    SJ> own CUDA support (which is not needed), and NVML then prints this
    SJ> error message when the job is not run on a machine with CUDA
    SJ> devices.

    SJ> I guess gcc picks up the environment variable while cc does not,
    SJ> hence the different behavior. So again, there is no need to add
    SJ> all those CUDA includes; --with-cuda is enough.
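
    SJ> For example, keeping the rest of your command line unchanged and
    SJ> only dropping the CUDA include directories would look roughly
    SJ> like this (untested, just to illustrate the idea):
    SJ>
    SJ>   ../openmpi-2.1.0rc4/configure \
    SJ>     ... \
    SJ>     CFLAGS="-m64 -mt -I/usr/local/include" \
    SJ>     CXXFLAGS="-m64 -I/usr/local/include" \
    SJ>     CPP="cpp -I/usr/local/include" \
    SJ>     CXXCPP="cpp -I/usr/local/include" \
    SJ>     --with-cuda=/usr/local/cuda \
    SJ>     ...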

    SJ> About the opal_list_remove_item, we'll try to reproduce the
    SJ> issue and see where it comes from.

    SJ> Sylvain

    SJ> On 03/21/2017 12:38 AM, Siegmar Gross wrote:
    >> Hi,
    >>
    >> I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise
    >> Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Once again I
    >> sometimes get a warning about a missing item for one of my small
    >> programs (it doesn't matter whether I use my cc or my gcc version).
    >> My gcc version also displays the message "NVIDIA: no NVIDIA devices
    >> found" on the server without NVIDIA devices (I don't get that
    >> message with my cc version). I used the following commands to build
    >> the package (${SYSTEM_ENV} is Linux and ${MACHINE_ENV} is x86_64).
    >>
    >>
    >> mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
    >> cd openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
    >>
    >> ../openmpi-2.1.0rc4/configure \
    >>   --prefix=/usr/local/openmpi-2.1.0_64_cc \
    >>   --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
    >>   --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
    >>   --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
    >>   JAVA_HOME=/usr/local/jdk1.8.0_66 \
    >>   LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/lib64" \
    >>   CC="cc" CXX="CC" FC="f95" \
    >>   CFLAGS="-m64 -mt -I/usr/local/include -I/usr/local/cuda/include" \
    >>   CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" \
    >>   FCFLAGS="-m64" \
    >>   CPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
    >>   CXXCPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
    >>   --enable-mpi-cxx \
    >>   --enable-cxx-exceptions \
    >>   --enable-mpi-java \
    >>   --with-cuda=/usr/local/cuda \
    >>   --with-valgrind=/usr/local/valgrind \
    >>   --enable-mpi-thread-multiple \
    >>   --with-hwloc=internal \
    >>   --without-verbs \
    >>   --with-wrapper-cflags="-m64 -mt" \
    >>   --with-wrapper-cxxflags="-m64" \
    >>   --with-wrapper-fcflags="-m64" \
    >>   --with-wrapper-ldflags="-mt" \
    >>   --enable-debug \
    >>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
    >>
    >> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
    >> rm -r /usr/local/openmpi-2.1.0_64_cc.old
    >> mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
    >> make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
    >> make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
    >>
    >>
    >> Sometimes everything works as expected.
    >>
    >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
    >> Parent process 0: I create 2 slave processes
    >>
    >> Parent process 0 running on loki
    >> MPI_COMM_WORLD ntasks: 1
    >> COMM_CHILD_PROCESSES ntasks_local: 1
    >> COMM_CHILD_PROCESSES ntasks_remote: 2
    >> COMM_ALL_PROCESSES ntasks: 3
    >> mytid in COMM_ALL_PROCESSES: 0
    >>
    >> Child process 0 running on nfs1
    >> MPI_COMM_WORLD ntasks: 2
    >> COMM_ALL_PROCESSES ntasks: 3
    >> mytid in COMM_ALL_PROCESSES: 1
    >>
    >> Child process 1 running on nfs2
    >> MPI_COMM_WORLD ntasks: 2
    >> COMM_ALL_PROCESSES ntasks: 3
    >> mytid in COMM_ALL_PROCESSES: 2
    >>
    >>
    >>
    >> More often I get a warning.
    >>
    >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
    >> Parent process 0: I create 2 slave processes
    >>
    >> Parent process 0 running on loki
    >> MPI_COMM_WORLD ntasks: 1
    >> COMM_CHILD_PROCESSES ntasks_local: 1
    >> COMM_CHILD_PROCESSES ntasks_remote: 2
    >> COMM_ALL_PROCESSES ntasks: 3
    >> mytid in COMM_ALL_PROCESSES: 0
    >>
    >> Child process 0 running on nfs1
    >> MPI_COMM_WORLD ntasks: 2
    >> COMM_ALL_PROCESSES ntasks: 3
    >>
    >> Child process 1 running on nfs2
    >> MPI_COMM_WORLD ntasks: 2
    >> COMM_ALL_PROCESSES ntasks: 3
    >> mytid in COMM_ALL_PROCESSES: 2
    >> mytid in COMM_ALL_PROCESSES: 1
    >> Warning :: opal_list_remove_item - the item 0x25a76f0 is not on the list 0x7f96db515998
    >> loki spawn 144
    >>
    >>
    >>
    >> I would be grateful if somebody could fix the problem. Do you need
    >> anything else? Thank you very much in advance for any help.
    >>
    >>
    >> Kind regards
    >>
    >> Siegmar


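P.S. Regarding the intermittent opal_list_remove_item warning: a quick
way to see how often it shows up is to run the reproducer in a loop,
for example (hosts and program name taken from Siegmar's mail; the loop
itself and the choice of 20 runs are just a sketch):

    # count in how many of 20 runs the warning is printed
    n=0
    for i in $(seq 1 20); do
      mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm 2>&1 | \
        grep -q 'opal_list_remove_item' && n=$((n+1))
    done
    echo "warning seen in $n of 20 runs"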
