[OMPI users] Test, Am I subscribed?

2017-07-31 Thread Mahmood Naderan
Hello,

Regards,
Mahmood

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-31 Thread Gilles Gouaillardet

Dave,

Unless you are doing direct launch (for example, using 'srun' instead of
'mpirun' under SLURM), this is the way Open MPI works: mpirun will use
whatever the resource manager provides in order to spawn the remote orted
(tm with PBS, qrsh with SGE, srun with SLURM, ...), and then mpirun/orted
will fork&exec the MPI tasks.

Direct launch provides the tightest integration, but it requires that some
capabilities (a PMI(x) server) be provided by the resource manager.

Hopefully the resource manager will report memory consumption and so on not
only for the spawned process (e.g. orted) but also for its children (e.g.
the MPI tasks).

Back to SGE: if I understand correctly, memory is requested per task on the
qsub command line. I am not sure what is done then ... this requirement is
either ignored, or it is set per orted (and once again, I do not know
whether the limit applies only to the orted process or to its children too).

Bottom line, unless SGE natively provides PMI(x) capabilities, the current
"tight integration" is imho the best we can do.




Cheers,


Gilles




On 7/28/2017 12:50 AM, Dave Love wrote:

> "r...@open-mpi.org"  writes:
>
>> Oh no, that's not right. Mpirun launches daemons using qrsh and those
>> daemons spawn the app's procs. SGE has no visibility of the app at all
>
> Oh no, that's not right.
>
> The whole point of tight integration with remote startup using qrsh is
> to report resource usage and provide control over the job.  I'm somewhat
> familiar with this.


[OMPI users] test

2017-07-31 Thread Mahmood Naderan
Regards,
Mahmood

[OMPI users] question about run-time of a small program

2017-07-31 Thread Siegmar Gross

Hi,

I have two versions of a small program. In the first one the process with
rank 0 calls the function "master()" and all other ranks call the function
"slave()"; in the second one I have two programs: one for the master task
and another one for the slave task. The run-time for the second version is
much longer than that of the first version. Any ideas why the version with
two separate programs takes that long?

loki tmp 108 mpicc -o hello_1_mpi hello_1_mpi.c
loki tmp 109 mpicc -o hello_2_mpi hello_2_mpi.c
loki tmp 110 mpicc -o hello_2_slave_mpi hello_2_slave_mpi.c
loki tmp 111 /usr/bin/time -p mpiexec -np 3 hello_1_mpi
Process 0 of 3 running on loki
Process 1 of 3 running on loki
Process 2 of 3 running on loki
...
real 0.14
user 0.00
sys 0.00
loki tmp 112 /usr/bin/time -p mpiexec -np 1 hello_2_mpi : \
  -np 2 hello_2_slave_mpi
Process 0 of 3 running on loki
Process 1 of 3 running on loki
Process 2 of 3 running on loki
...
real 23.15
user 0.00
sys 0.00
loki tmp 113 ompi_info | grep "Open MPI repo revision"
  Open MPI repo revision: v3.0.0rc2
loki tmp 114


Thank you very much for any answer in advance.


Kind regards

Siegmar
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>
#include "mpi.h"

#define	BUF_SIZE	255		/* message buffer size		*/
#define	MAX_TASKS	12		/* max. number of tasks		*/
#define	SENDTAG		1		/* send message command		*/
#define	EXITTAG		2		/* termination command		*/
#define	MSGTAG		3		/* normal message token		*/

#define ENTASKS		-1		/* error: too many tasks	*/

static void master (void);
static void slave (void);

int main (int argc, char *argv[])
{
  int  mytid,      /* my task id                  */
       ntasks,     /* number of parallel tasks    */
       namelen;    /* length of processor name    */
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code will
   * print one line on the display. It may happen that the lines will
   * get mixed up because the display is a critical section. In general
   * only one process (mostly the process with rank 0) will print on
   * the display and all other processes will send their messages to
   * this process. Nevertheless for debugging purposes (or to
   * demonstrate that it is possible) it may be useful if every
   * process prints itself.
   */
  fprintf (stdout, "Process %d of %d running on %s\n",
	   mytid, ntasks, processor_name);
  fflush (stdout);
  MPI_Barrier (MPI_COMM_WORLD);		/* wait for all other processes	*/

  if (mytid == 0)
  {
    master ();
  }
  else
  {
    slave ();
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}


/* Function for the "master task". The master sends a request to all
 * slaves asking for a message. After receiving and printing the
 * messages he sends all slaves a termination command.
 *
 * input parameters:	not necessary
 * output parameters:	not available
 * return value:	nothing
 * side effects:	no side effects
 *
 */
void master (void)
{
  int		ntasks,			/* number of parallel tasks	*/
		mytid,			/* my task id			*/
		num,			/* number of entries		*/
		i;			/* loop variable		*/
  char		buf[BUF_SIZE + 1];	/* message buffer (+1 for '\0')	*/
  MPI_Status	stat;			/* message details		*/

  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  if (ntasks > MAX_TASKS)
  {
    fprintf (stderr, "Error: Too many tasks. Try again with at most "
             "%d tasks.\n", MAX_TASKS);
    /* terminate all slave tasks */
    for (i = 1; i < ntasks; ++i)
    {
      MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
    }
    MPI_Finalize ();
    exit (ENTASKS);
  }
  printf ("\n\nNow %d slave tasks are sending greetings.\n\n",
	  ntasks - 1);
  /* request messages from slave tasks	*/
  for (i = 1; i < ntasks; ++i)
  {
MPI_Send ((char *) NULL, 0, MPI_CHAR, i, SENDTAG, MPI_COMM_WORLD);
  }
  /* wait for messages and print greetings */
  for (i = 1; i < ntasks; ++i)
  {
MPI_Recv (buf, BUF_SIZE, MPI_CHAR, MPI_ANY_SOURCE,
	  MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
MPI_Get_count (&stat, MPI_CHAR, &num);
buf[num] = '\0';			/* add missing end-of-string	*/
printf ("Greetings from task %d:\n"
	"  message type:%d\n"
	"  msg length:  %d characters\n"
	"  message: %s\n\n",
	stat.MPI_SOURCE, stat.MPI_TAG, num, buf);
  }
  /* terminate all slave tasks		*/
  for (i = 1; i < ntasks; ++i)
  {
MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
  }
}


/* Function for "slave tasks". The slave task sends its hostname,
 * operating system name and release, and processor architecture
 * as a message to the master.
 *
 * input parameters:	not necessary
 * output parameters:	not available
 * return value:	nothing
 * side effects:	no side effects
 *
 */
void slave (void)
{
  struct utsname

[OMPI users] error building openmpi-v2.* with SUN C 5.15 on SuSE Linux

2017-07-31 Thread Siegmar Gross

Hi,

I've been able to install openmpi-v2.0.x-201707270322-239c439 and
openmpi-v2.x-201707271804-3b1e9fe on my "SUSE Linux Enterprise Server
12.2 (x86_64)" with Sun C 5.14 (Oracle Developer Studio 12.5).
Unfortunately "make" breaks for both versions with the same error,
if I use the latest Sun C 5.15 (Oracle Developer Studio 12.5) compiler.


loki openmpi-v2.0.x-201707270322-239c439-Linux.x86_64.64_cc_12.6 140 tail -18 
log.make.Linux.x86_64.64_cc | head -7
"/export2/src/openmpi-2.0.4/openmpi-v2.0.x-201707270322-239c439/opal/include/opal/sys/x86_64/atomic.h", 
line 270: warning: parameter in inline asm statement unused: %2
"../../../../../../openmpi-v2.0.x-201707270322-239c439/opal/mca/pmix/pmix112/pmix/src/buffer_ops/open_close.c", 
line 51: redeclaration must have the same or more restrictive linker scoping: 
opal_pmix_pmix112_pmix_bfrop
"../../../../../../openmpi-v2.0.x-201707270322-239c439/opal/mca/pmix/pmix112/pmix/src/buffer_ops/open_close.c", 
line 401: redeclaration must have the same or more restrictive linker scoping: 
opal_pmix_pmix112_pmix_value_load
cc: acomp failed for 
../../../../../../openmpi-v2.0.x-201707270322-239c439/opal/mca/pmix/pmix112/pmix/src/buffer_ops/open_close.c

Makefile:1242: recipe for target 'src/buffer_ops/open_close.lo' failed
make[4]: *** [src/buffer_ops/open_close.lo] Error 1
make[4]: Leaving directory 
'/export2/src/openmpi-2.0.4/openmpi-v2.0.x-201707270322-239c439-Linux.x86_64.64_cc_12.6/opal/mca/pmix/pmix112/pmix'

loki openmpi-v2.0.x-201707270322-239c439-Linux.x86_64.64_cc_12.6 141


Perhaps somebody can fix the problem. Thank you very much for your help in
advance.


Kind regards

Siegmar


[OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
Hi,

I am stuck at a problem which I don't remember having on previous versions.
When I run a test program with -host, it works: the processes span the
hosts I specified. However, when I specify -hostfile, it doesn't work!!

mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun
-host compute-0-0,cluster -np 2 a.out

* hwloc 1.11.2 has encountered what looks like an error from the
operating system.
*
* Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset
0xff00) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.

Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
Hello world from processor compute-0-0.local, rank 0 out of 2 processors
mahmood@cluster:mpitest$ cat hosts
cluster
compute-0-0

mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun
-hostfile hosts -np 2 a.out

* hwloc 1.11.2 has encountered what looks like an error from the
operating system.
*
* Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset
0xff00) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.

Hello world from processor cluster.hpc.org, rank 0 out of 2 processors
Hello world from processor cluster.hpc.org, rank 1 out of 2 processors


how can I resolve that?

Regards,
Mahmood

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Elken, Tom
Hi Mahmood,

With the -hostfile case, Open MPI is trying to helpfully run things faster by 
keeping both processes on one host.  Ways to avoid this…

On the mpirun command line add:

-pernode  (runs 1 process per node), or
-npernode 1 ,   but these two have been deprecated in favor of the wonderful
syntax:
--map-by ppr:1:node

Or you could change your hostfile to:

cluster slots=1

compute-0-0 slots=1
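
With that hostfile, the full command line would look roughly like this (a
sketch only, reusing the install path and executable name from your mail):

  /share/apps/computer/openmpi-2.0.1/bin/mpirun -hostfile hosts \
      --map-by ppr:1:node -np 2 a.out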


-Tom

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Mahmood 
Naderan
Sent: Monday, July 31, 2017 6:47 AM
To: Open MPI Users 
Subject: [OMPI users] -host vs -hostfile


Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
OK. The next question is how to use it with torque (PBS)? Currently we write
this directive

Nodes=1:ppn=2

which means 4 threads. Then we omit -np and -hostfile in the mpirun command.

On 31 Jul 2017 20:24, "Elken, Tom"  wrote:


Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread r...@open-mpi.org
?? Doesn't that tell PBS to allocate 1 node with 2 slots on it? I don't see
where you get 4.

Sent from my iPad

> On Jul 31, 2017, at 10:00 AM, Mahmood Naderan  wrote:
> 
> OK. The next question is how touse it with torque (PBS)? currently we write 
> this directive
> 
> Nodes=1:ppn=2
> 
> which means 4 threads. Then we omit -np and -hostfile in the mpirun command.

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
Excuse me, my fault.. I meant

nodes=2:ppn=2

is 4 threads.


Regards,
Mahmood



On Mon, Jul 31, 2017 at 8:49 PM, r...@open-mpi.org  wrote:

> ?? Doesn't that tell pbs to allocate 1 node with 2 slots on it? I don't
> see where you get 4

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Elken, Tom
“4 threads”   In MPI, we refer to this as 4 ranks or 4 processes.

So what is your question?   Are you getting errors with PBS?

-Tom

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Mahmood 
Naderan
Sent: Monday, July 31, 2017 9:27 AM
To: Open MPI Users 
Subject: Re: [OMPI users] -host vs -hostfile

Excuse me, my fault.. I meant
nodes=2:ppn=2
is 4 threads.


Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Gus Correa

Hi

With nodes=2:ppn=2 Torque will provide two cores on each of two nodes
for your job.
Open MPI will honor this, and work only on those nodes and cores.

Torque will put the list of node names (repeated twice each, since
you asked for two ppn/cores) in a "node file" that can be accessed
in your job through the environment variable $PBS_NODEFILE.
That is the default hostfile (or host list) used by Open MPI when you
start your MPI executable with mpirun.
You don't need to add any hostfile or host list to the mpirun
command line.
And in principle there is no reason why you would
want to do that either, as that would be error prone,
at least for SPMD (single executable) programs.

You can easily print the contents of that file to your job stdout
with:

cat $PBS_NODEFILE
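
For example, a nodes=2:ppn=2 request that landed on two hypothetical nodes
named compute-0-0 and compute-0-1 would give a node file roughly like:

  compute-0-0
  compute-0-0
  compute-0-1
  compute-0-1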

If you add another hostfile or host list to the mpirun command line,
and if that hostfile or host list conflicts with the
contents of $PBS_NODEFILE (say, has a different set of nodes),
mpirun will fail.

In my experience, the only situation where you would need
to modify this scheme under Torque is when launching an MPMD (multiple
executables) program, to produce an --app appfile as a modified version
of $PBS_NODEFILE.
However, that doesn't seem to be the case here, as the mpirun command 
line in the various emails has a single executable "a.out".
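
For reference, such an appfile is just a text file with one line of mpirun
arguments per application context; a hypothetical sketch (program names
invented, hosts as they would appear in $PBS_NODEFILE) could look like:

  # appfile: one application context per line
  -host compute-0-0 -np 1 ./master_prog
  -host compute-0-0,compute-0-1 -np 3 ./slave_prog

and it would be started with "mpirun --app appfile".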


I hope this helps.
Gus Correa

On 07/31/2017 12:43 PM, Elken, Tom wrote:

“4 threads”   In MPI, we refer to this as 4 ranks or 4 processes.

So what is your question?   Are you getting errors with PBS?

-Tom


Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Mahmood Naderan
Well it is confusing!! As you can see, I added four nodes to the host file
(the same nodes are used by PBS). The --map-by ppr:1:node works well.
However, the PBS directive doesn't work

mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun
-hostfile hosts --map-by ppr:1:node a.out

* hwloc 1.11.2 has encountered what looks like an error from the operating
system.
*
* Package (P#1 cpuset 0x) intersects with NUMANode (P#1 cpuset
0xff00) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
*   What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing
list,
* along with the output+tarball generated by the hwloc-gather-topology
script.

Hello world from processor cluster.hpc.org, rank 0 out of 4 processors
Hello world from processor compute-0-0.local, rank 1 out of 4 processors
Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-2.local, rank 3 out of 4 processors
mahmood@cluster:mpitest$ cat mmt.sh
#!/bin/bash
#PBS -V
#PBS -q default
#PBS -j oe
#PBS -l  nodes=4:ppn=1
#PBS -N job1
#PBS -o .
cd $PBS_O_WORKDIR
/share/apps/computer/openmpi-2.0.1/bin/mpirun a.out
mahmood@cluster:mpitest$ qsub mmt.sh
6428.cluster.hpc.org
mahmood@cluster:mpitest$ cat job1.o6428
Hello world from processor compute-0-1.local, rank 0 out of 32 processors
Hello world from processor compute-0-1.local, rank 2 out of 32 processors
Hello world from processor compute-0-1.local, rank 3 out of 32 processors
Hello world from processor compute-0-1.local, rank 4 out of 32 processors
Hello world from processor compute-0-1.local, rank 5 out of 32 processors
Hello world from processor compute-0-1.local, rank 6 out of 32 processors
Hello world from processor compute-0-1.local, rank 8 out of 32 processors
Hello world from processor compute-0-1.local, rank 9 out of 32 processors
Hello world from processor compute-0-1.local, rank 12 out of 32 processors
Hello world from processor compute-0-1.local, rank 15 out of 32 processors
Hello world from processor compute-0-1.local, rank 16 out of 32 processors
Hello world from processor compute-0-1.local, rank 18 out of 32 processors
Hello world from processor compute-0-1.local, rank 19 out of 32 processors
Hello world from processor compute-0-1.local, rank 20 out of 32 processors
Hello world from processor compute-0-1.local, rank 21 out of 32 processors
Hello world from processor compute-0-1.local, rank 22 out of 32 processors
Hello world from processor compute-0-1.local, rank 24 out of 32 processors
Hello world from processor compute-0-1.local, rank 26 out of 32 processors
Hello world from processor compute-0-1.local, rank 27 out of 32 processors
Hello world from processor compute-0-1.local, rank 28 out of 32 processors
Hello world from processor compute-0-1.local, rank 29 out of 32 processors
Hello world from processor compute-0-1.local, rank 30 out of 32 processors
Hello world from processor compute-0-1.local, rank 31 out of 32 processors
Hello world from processor compute-0-1.local, rank 7 out of 32 processors
Hello world from processor compute-0-1.local, rank 10 out of 32 processors
Hello world from processor compute-0-1.local, rank 14 out of 32 processors
Hello world from processor compute-0-1.local, rank 1 out of 32 processors
Hello world from processor compute-0-1.local, rank 11 out of 32 processors
Hello world from processor compute-0-1.local, rank 13 out of 32 processors
Hello world from processor compute-0-1.local, rank 17 out of 32 processors
Hello world from processor compute-0-1.local, rank 23 out of 32 processors
Hello world from processor compute-0-1.local, rank 25 out of 32 processors



Any idea?

Regards,
Mahmood

Re: [OMPI users] -host vs -hostfile

2017-07-31 Thread Gus Correa

Maybe something is wrong with the Torque installation?
Or perhaps with the Open MPI + Torque integration?

1) Make sure your Open MPI was configured and compiled with the
Torque "tm" library of your Torque installation.
In other words:

configure --with-tm=/path/to/your/Torque/tm_library ...

2) Check if your $TORQUE/server_priv/nodes file has all the nodes
in your cluster.  If not, edit the file and add the missing nodes.
Then restart the Torque server (service pbs_server restart).

3) Run "pbsnodes" to see if all nodes are listed.

4) Run "hostname" with mpirun in a short Torque script:

#PBS -l nodes=4:ppn=1
...
mpirun hostname

The output should show all four nodes.
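
As a quick check for item 1) on an existing build (a rough sketch; the exact
component list varies between Open MPI versions):

  ompi_info | grep -i " tm"
  # a Torque-aware build should list tm components such as "MCA ras: tm"
  # and "MCA plm: tm"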

Good luck!
Gus Correa

On 07/31/2017 02:41 PM, Mahmood Naderan wrote:
Well it is confusing!! As you can see, I added four nodes to the host 
file (the same nodes are used by PBS). The --map-by ppr:1:node works 
well. However, the PBS directive doesn't work



Re: [OMPI users] question about run-time of a small program

2017-07-31 Thread Gilles Gouaillardet

Siegmar,


A noticeable difference is that hello_1 does *not* sleep, whereas
hello_2_slave *does*.

Simply comment out the sleep(...) line, and the performance will be identical.


Cheers,

Gilles
