Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)

2015-09-04 Thread Diego Avesani
Dear all,
let me go through all your mails, because there are a lot of things I do
not understand.

I will reply as soon as possible.



Diego


On 3 September 2015 at 17:23, Bennet Fauber  wrote:

> There is also the package Lmod, which provides similar functionality
> to environment modules.  It is maintained by TACC.
>
> https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
>
> but I think the current source code is at
>
> https://github.com/TACC/Lmod
>
> -- bennet
>
>
>
> On Thu, Sep 3, 2015 at 11:13 AM, Jeff Squyres (jsquyres)
>  wrote:
> > On Sep 3, 2015, at 10:43 AM, Diego Avesani 
> wrote:
> >>
> >> Dear Jeff, Dear all,
> >> I normally use "USE MPI"
> >>
> >> This is the answer from the Intel HPC forum:
> >>
> >> If you are switching between intel and openmpi you must remember not to
> mix environment.  You might use modules to manage this.
> >
> > I think the source of the confusion here might well be an overload of
> the word "modules".
> >
> > I think the word "module" in the phrase "You might use modules to manage
> this" is referring to *environment modules*, not *Fortran modules*.  I.e.:
> http://modules.sourceforge.net/
> >
> > Where you can do stuff like this:
> >
> > -
> > # Use Open MPI
> > $ module load openmpi
> > $ mpicc my_program.c
> > $ mpirun -np 4 a.out
> >
> > # Use __some_other_MPI__
> > $ module load othermpi
> > $ mpicc my_program.c
> > $ mpirun -np 4 a.out
> > -
> >
> > Environment modules are typically used to set things like PATH,
> LD_LIBRARY_PATH, and MANPATH.
> >
> > I think the poster on the Intel HPC forum was probably referring to you
> using environment modules to switch your PATH / LD_LIBRARY_PATH / MANPATH
> between Open MPI and Intel MPI.
> >
> >> As the data types encodings differ, you must take care that all objects
> are built against the same headers.
> >
> > Here, the poster is essentially saying that if you want to use Open MPI,
> you have to compile and mpirun with Open MPI.  And if you want to use Intel
> MPI, you have to (re)compile and mpirun with Intel MPI.
> >
> > In short: Open MPI and Intel MPI are not binary compatible, and their
> mpirun's are not compatible, either.
> >
> > (note that this is an Open MPI mailing list; we can't answer questions
> about Intel MPI here)
> >
> > My point with "use mpi" was that you should try replacing "include
> 'mpif.h'" with "use mpi" in your Fortran blocks.  Open MPI's "use mpi"
> implementation will do a lot of compile-time type checking that "include
> 'mpif.h'" will not.  Hence, it helps determine whether you're passing an
> incorrect parameter to an MPI subroutine, for example.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27537.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27538.php
>
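[Editor's note: for readers unfamiliar with environment modules, loading a module essentially prepends the chosen MPI's directories to your search paths. A minimal sketch of the effect of "module load openmpi" (the install prefix below is hypothetical; real modulefiles are site-specific):]

```shell
# What "module load openmpi" effectively does under the hood
# (prefix is hypothetical; adjust to your site's layout):
OMPI_PREFIX=/opt/openmpi-1.8.7
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export MANPATH="$OMPI_PREFIX/share/man${MANPATH:+:$MANPATH}"

# mpicc and mpirun resolved from PATH are now Open MPI's:
echo "${PATH%%:*}"
# prints /opt/openmpi-1.8.7/bin
```

Switching to Intel MPI would mean undoing these changes and prepending Intel MPI's directories instead, which is exactly the bookkeeping that "module load" / "module unload" (or "module swap") automates.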

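[Editor's note: Jeff's point about compile-time checking can be illustrated with a small sketch. This is a hypothetical program, not code from the thread; building it requires an MPI Fortran compiler such as mpif90.]

```fortran
program type_check_demo
  use mpi            ! explicit interfaces: arguments checked at compile time
  implicit none
  integer :: ierr, rank

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! With "use mpi" the compiler rejects a call with a missing or
  ! wrongly-typed argument, e.g.:
  !   call MPI_Comm_rank(MPI_COMM_WORLD, ierr)   ! too few arguments: error
  ! With include 'mpif.h' the same mistake compiles silently and only
  ! fails (or corrupts memory) at run time.

  call MPI_Finalize(ierr)
end program type_check_demo
```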

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-09-04 Thread Lane, William
Our issues with OpenMPI 1.8.7 and Son-of-Gridengine turned out to be down to 
using the wrong Parallel Environment. Having a PE with control_slaves set to 
TRUE and start_proc_args and stop_proc_args set to NONE cleared up all our 
issues, at least for my MPI test code.

qsort_args was left set to NONE, which directly contradicts the FAQ for running
OpenMPI through Son-of-Gridengine, so maybe the OpenMPI FAQ with respect to
Son-of-Gridengine should be revised. qsort_args is a hook that lets you supply
your own function, in a dynamic shared object, that determines which nodes a
job should be scheduled on.
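[Editor's note: for reference, a PE definition matching the settings reported above would look roughly like this. This is a sketch, not the poster's exact configuration; the PE name and slots value are site-specific.]

```
pe_name            mpi
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
qsort_args         NONE
```

The key change from the broken configuration quoted later in this thread is control_slaves TRUE, which lets Open MPI's tight integration spawn its daemons via qrsh on the slave hosts.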

-Bill L.


From: users [users-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
[gilles.gouaillar...@gmail.com]
Sent: Wednesday, August 12, 2015 7:40 AM
To: Open MPI Users
Subject: Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 
1.8.7

basically, without --hetero-nodes, ompi assumes all nodes have the same
topology (fast startup).
With --hetero-nodes, ompi does not assume anything and requests each node's
topology (slower startup).

I am not sure whether this is still 100% true on all versions.
IIRC, at least on master, a hwloc signature is checked and ompi transparently
falls back to --hetero-nodes behavior if needed.

Bottom line: on a heterogeneous cluster it is required, or at least safer, to
use the --hetero-nodes option.


Cheers,

Gilles

On Wednesday, August 12, 2015, Dave Love <d.l...@liverpool.ac.uk> wrote:
"Lane, William" writes:

> I can successfully run my OpenMPI 1.8.7 jobs outside of Son-of-Gridengine but 
> not via qrsh. We're
> using CentOS 6.3 and a heterogeneous cluster of hyperthreaded and 
> non-hyperthreaded blades
> and x3550 chassis. OpenMPI 1.8.7 has been built w/the debug switch as well.

I think you want to explain exactly why you need this world of pain.  It
seems unlikely that MPI programs will run efficiently in it.  Our Intel
nodes mostly have hyperthreading on in BIOS -- or what passes for BIOS
on them -- but disabled at startup, and we only run MPI across identical
nodes in the heterogeneous system.

> Here's my latest errors:
> qrsh -V -now yes -pe mpi 209 mpirun -np 209 -display-devel-map --prefix 
> /hpc/apps/mpi/openmpi/1.8.7/ --mca btl ^sm --hetero-nodes --bind-to core 
> /hpc/home/lanew/mpi/openmpi/ProcessColors3

[What does --hetero-nodes do?  It's undocumented as far as I can tell.]

> error: executing task of job 211298 failed: execution daemon on host 
> "csclprd3-0-4" didn't accept task
> error: executing task of job 211298 failed: execution daemon on host 
> "csclprd3-4-1" didn't accept task

So you need to find out why that was (probably lack of slots on the exec
host, which might be explained in the execd messages).

> [...]

> NOTE: the hosts that "didn't accept task" were different in two different 
> runs but the errors were the same.
>
> Here's the definition of the mpi Parallel Environment on our 
> Son-of-Gridengine cluster:
>
> pe_namempi
> slots  
> user_lists NONE
> xuser_listsNONE
> start_proc_args/opt/sge/mpi/startmpi.sh $pe_hostfile
> stop_proc_args /opt/sge/mpi/stopmpi.sh

Why are those two not NONE?

> allocation_rule$fill_up

As I said, that doesn't seem wise (unless you use -l exclusive).

> control_slaves FALSE
> job_is_first_task  TRUE
> urgency_slots  min
> accounting_summary TRUE
> qsort_args NONE
>
> Qsort_args is set to NONE, but it's supposed to be set to TRUE right?

No; see sge_pe(5).  (I think the text I supplied for the FAQ is accurate,
but reuti might confirm if he's reading this.)

> -Bill L.
>
> If I can run my OpenMPI 1.8.7 jobs outside of Son-of-Gridengine w/no issues 
> it has to be Son-of-Gridengine that's
> the issue right?

I don't see any evidence of an SGE bug, if that's what you mean, but
clearly you have a problem if execds won't accept the jobs, and this
isn't the place to discuss it.  I asked about SGE core binding, and it's
presumably also relevant how slots are defined on the compute nodes, but
I'd just say "Don't do that" without a pressing reason.
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/08/27436.php