Hi Akshay,

Would it be possible for you to provide the source to reproduce the issue?

Yes, I've appended the file.


Kind regards

Siegmar



Thanks

On Tue, Mar 21, 2017 at 9:52 AM, Sylvain Jeaugey <sjeau...@nvidia.com> wrote:

    Hi Siegmar,

    I think this "NVIDIA: ..." error message comes from the fact that you add
    CUDA includes in the C*FLAGS. If you just use --with-cuda, Open MPI will
    compile with CUDA support, but hwloc will not find CUDA, and that will be
    fine. However, setting CUDA in CFLAGS will make hwloc find CUDA, compile
    CUDA support (which is not needed), and then NVML will show this error
    message when not run on a machine with CUDA devices.

    I guess gcc picks up the environment variable while cc does not, hence the
    different behavior. So again, there is no need to add all those CUDA
    includes; --with-cuda is enough.
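
    For illustration, here is a sketch of the same configure call with the
    CUDA entries dropped from CFLAGS, CXXFLAGS, CPP, CXXCPP and LDFLAGS (not
    a verified command line; it just reuses the paths from Siegmar's report
    below, keeping --with-cuda as the only CUDA-related option):

        ../openmpi-2.1.0rc4/configure \
          --prefix=/usr/local/openmpi-2.1.0_64_cc \
          CC="cc" CXX="CC" FC="f95" \
          CFLAGS="-m64 -mt -I/usr/local/include" \
          CXXFLAGS="-m64 -I/usr/local/include" \
          LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64" \
          --with-cuda=/usr/local/cuda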

    About the opal_list_remove_item warning, we'll try to reproduce the issue
    and see where it comes from.

    Sylvain


    On 03/21/2017 12:38 AM, Siegmar Gross wrote:

        Hi,

        I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise Server
        12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Sometimes I once again
        get a warning about a missing item for one of my small programs (it
        doesn't matter whether I use my cc or my gcc version). My gcc version
        also displays the message "NVIDIA: no NVIDIA devices found" on the
        server without NVIDIA devices (I don't get the message with my cc
        version). I used the following commands to build the package
        (${SYSTEM_ENV} is Linux and ${MACHINE_ENV} is x86_64).


        mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
        cd openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc

        ../openmpi-2.1.0rc4/configure \
          --prefix=/usr/local/openmpi-2.1.0_64_cc \
          --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
          --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
          --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
          JAVA_HOME=/usr/local/jdk1.8.0_66 \
          LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 
-L/usr/local/cuda/
        lib64" \
          CC="cc" CXX="CC" FC="f95" \
          CFLAGS="-m64 -mt -I/usr/local/include -I/usr/local/cuda/include" \
          CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" \
          FCFLAGS="-m64" \
          CPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
          CXXCPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
          --enable-mpi-cxx \
          --enable-cxx-exceptions \
          --enable-mpi-java \
          --with-cuda=/usr/local/cuda \
          --with-valgrind=/usr/local/valgrind \
          --enable-mpi-thread-multiple \
          --with-hwloc=internal \
          --without-verbs \
          --with-wrapper-cflags="-m64 -mt" \
          --with-wrapper-cxxflags="-m64" \
          --with-wrapper-fcflags="-m64" \
          --with-wrapper-ldflags="-mt" \
          --enable-debug \
          |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

        make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
        rm -r /usr/local/openmpi-2.1.0_64_cc.old
        mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
        make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
        make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc


        Sometimes everything works as expected.

        loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
        Parent process 0: I create 2 slave processes

        Parent process 0 running on loki
            MPI_COMM_WORLD ntasks:              1
            COMM_CHILD_PROCESSES ntasks_local:  1
            COMM_CHILD_PROCESSES ntasks_remote: 2
            COMM_ALL_PROCESSES ntasks:          3
            mytid in COMM_ALL_PROCESSES:        0

        Child process 0 running on nfs1
            MPI_COMM_WORLD ntasks:              2
            COMM_ALL_PROCESSES ntasks:          3
            mytid in COMM_ALL_PROCESSES:        1

        Child process 1 running on nfs2
            MPI_COMM_WORLD ntasks:              2
            COMM_ALL_PROCESSES ntasks:          3
            mytid in COMM_ALL_PROCESSES:        2



        More often I get a warning.

        loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
        Parent process 0: I create 2 slave processes

        Parent process 0 running on loki
            MPI_COMM_WORLD ntasks:              1
            COMM_CHILD_PROCESSES ntasks_local:  1
            COMM_CHILD_PROCESSES ntasks_remote: 2
            COMM_ALL_PROCESSES ntasks:          3
            mytid in COMM_ALL_PROCESSES:        0

        Child process 0 running on nfs1
            MPI_COMM_WORLD ntasks:              2
            COMM_ALL_PROCESSES ntasks:          3

        Child process 1 running on nfs2
            MPI_COMM_WORLD ntasks:              2
            COMM_ALL_PROCESSES ntasks:          3
            mytid in COMM_ALL_PROCESSES:        2
            mytid in COMM_ALL_PROCESSES:        1
         Warning :: opal_list_remove_item - the item 0x25a76f0 is not on the list 0x7f96db515998
        loki spawn 144



        I would be grateful if somebody could fix the problem. Do you need
        anything else? Thank you very much in advance for any help.


        Kind regards

        Siegmar




--
-Akshay


#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES	2		/* create NUM_SLAVES processes	*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_ALL_PROCESSES,		/* intra-communicator		*/
	   COMM_CHILD_PROCESSES,	/* inter-communicator		*/
	   COMM_PARENT_PROCESSES;	/* inter-communicator		*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   ntasks_all,			/* tasks in COMM_ALL_PROCESSES	*/
	   mytid_world,			/* my task id in MPI_COMM_WORLD	*/
	   mytid_all,			/* id in COMM_ALL_PROCESSES	*/
	   namelen;			/* length of processor name	*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid_world);
  /* First we must decide whether this program is executed by a parent
   * or a child process, because only a parent is allowed to spawn child
   * processes (otherwise the child process with rank 0 would itself
   * spawn child processes, and so on). "MPI_Comm_get_parent ()"
   * returns the parent inter-communicator for a spawned MPI rank and
   * MPI_COMM_NULL if the process wasn't spawned, i.e. it was started
   * statically via "mpiexec" on the command line.
   */
  MPI_Comm_get_parent (&COMM_PARENT_PROCESSES);
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    /* All parent processes must call "MPI_Comm_spawn ()" but only
     * the root process (in our case the process with rank 0) will
     * spawn child processes. All other processes of the
     * intra-communicator (in our case MPI_COMM_WORLD) will ignore
     * the values of all arguments before the "root" parameter.
     */
    if (mytid_world == 0)
    {
      printf ("Parent process 0: I create %d slave processes\n",
	      NUM_SLAVES);
    }
    MPI_Comm_spawn (argv[0], MPI_ARGV_NULL, NUM_SLAVES,
		    MPI_INFO_NULL, 0, MPI_COMM_WORLD,
		    &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  }
  /* Merge all processes into one intra-communicator. The "high" flag
   * determines the order of the processes in the intra-communicator.
   * If parent and child processes use the same flag, the order may
   * be arbitrary; otherwise the processes with "high == 0" will have
   * a lower rank than the processes with "high == 1".
   */
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    /* parent processes							*/
    MPI_Intercomm_merge (COMM_CHILD_PROCESSES, 0, &COMM_ALL_PROCESSES);
  }
  else
  {
    /* spawned child processes						*/
    MPI_Intercomm_merge (COMM_PARENT_PROCESSES, 1, &COMM_ALL_PROCESSES);
  }
  MPI_Comm_size	(MPI_COMM_WORLD, &ntasks_world);
  MPI_Comm_size (COMM_ALL_PROCESSES, &ntasks_all);
  MPI_Comm_rank (COMM_ALL_PROCESSES, &mytid_all);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the following printf statement every process executing this
   * code will print some lines on the display. It may happen that the
   * lines get mixed up, because the display is a critical section.
   * In general only one process (mostly the process with rank 0) will
   * print on the display and all other processes will send their
   * messages to this process. Nevertheless, for debugging purposes
   * (or to demonstrate that it is possible) it may be useful for every
   * process to print its own output.
   */
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    MPI_Comm_size	 (COMM_CHILD_PROCESSES, &ntasks_local);
    MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
    printf ("\nParent process %d running on %s\n"
	    "    MPI_COMM_WORLD ntasks:              %d\n"
	    "    COMM_CHILD_PROCESSES ntasks_local:  %d\n"
	    "    COMM_CHILD_PROCESSES ntasks_remote: %d\n"
	    "    COMM_ALL_PROCESSES ntasks:          %d\n"
	    "    mytid in COMM_ALL_PROCESSES:        %d\n",
	    mytid_world, processor_name, ntasks_world, ntasks_local,
	    ntasks_remote, ntasks_all, mytid_all);
  }
  else
  {
    printf ("\nChild process %d running on %s\n"
	    "    MPI_COMM_WORLD ntasks:              %d\n"
	    "    COMM_ALL_PROCESSES ntasks:          %d\n"
	    "    mytid in COMM_ALL_PROCESSES:        %d\n",
	    mytid_world, processor_name, ntasks_world, ntasks_all,
	    mytid_all);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}
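
A quick sketch for building and running the appended program (assuming it is
saved as spawn_intra_comm.c and the Open MPI wrapper compilers are in PATH;
the mpiexec line is the one from the report above):

  mpicc spawn_intra_comm.c -o spawn_intra_comm
  mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm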
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
