Re: [OMPI users] Can't connect using MPI Ports

2017-11-05 Thread Florian Lindner
Am 04.11.2017 um 00:05 schrieb r...@open-mpi.org:
> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not 
> sure it was ever fixed, but you might try the latest 3.0, the 3.1rc, and even 
> master.
> 
> The only methods that are known to work are:
> 
> * connecting processes within the same mpirun - e.g., using comm_spawn

That is not an option for our application.

> * connecting processes across different mpiruns, with the ompi-server daemon 
> as the rendezvous point
> 
> The old command line method (i.e., what you are trying to use) hasn’t been 
> much on the radar. I don’t know if someone else has picked it up or not...

What do you mean by "the old command line method"?

Isn't the ompi-server just another means of exchanging port names, i.e. the
same thing I do using files?

In my understanding, using Publish_name and Lookup_name or exchanging the 
information using files (or command line or stdin) shouldn't have any
impact on the connection (Connect / Accept) itself.
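
For what it's worth, here is a minimal sketch of the file-based exchange I mean (illustrative only, no error handling; the file name "port.txt" and the server/client switch are made up for the example):

    #include <mpi.h>
    #include <cstdio>
    #include <cstring>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        char port[MPI_MAX_PORT_NAME] = {0};
        MPI_Comm intercomm;

        if (argc > 1 && std::strcmp(argv[1], "server") == 0) {
            MPI_Open_port(MPI_INFO_NULL, port);             // get a port name from the runtime
            std::FILE *f = std::fopen("port.txt", "w");     // "publish" it through a plain file
            std::fprintf(f, "%s\n", port);
            std::fclose(f);
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
            MPI_Close_port(port);
        } else {
            std::FILE *f = std::fopen("port.txt", "r");     // read the port name back
            std::fgets(port, MPI_MAX_PORT_NAME, f);
            std::fclose(f);
            port[std::strcspn(port, "\n")] = '\0';
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        }

        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }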

Best,
Florian


> Ralph
> 
>> On Nov 3, 2017, at 11:23 AM, Florian Lindner  wrote:
>>
>>
>> Am 03.11.2017 um 16:18 schrieb r...@open-mpi.org:
>>> What version of OMPI are you using?
>>
>> 2.1.1 @ Arch Linux.
>>
>> Best,
>> Florian

Re: [OMPI users] Can't connect using MPI Ports

2017-11-05 Thread r...@open-mpi.org

> On Nov 5, 2017, at 6:48 AM, Florian Lindner  wrote:
> 
> Am 04.11.2017 um 00:05 schrieb r...@open-mpi.org :
>> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not 
>> sure it was ever fixed, but you might try the latest 3.0, the 3.1rc, and 
>> even master.
>> 
>> The only methods that are known to work are:
>> 
>> * connecting processes within the same mpirun - e.g., using comm_spawn
> 
> That is not an option for our application.
> 
>> * connecting processes across different mpiruns, with the ompi-server daemon 
>> as the rendezvous point
>> 
>> The old command line method (i.e., what you are trying to use) hasn’t been 
>> much on the radar. I don’t know if someone else has picked it up or not...
> 
> What do you mean by "the old command line method"?
> 
> Isn't the ompi-server just another means of exchanging port names, i.e. the 
> same thing I do using files?

No, it isn’t - there is a handshake that ompi-server facilitates.

> 
> In my understanding, using Publish_name and Lookup_name or exchanging the 
> information using files (or command line or stdin) shouldn't have any
> impact on the connection (Connect / Accept) itself.

Depends on the implementation underneath connect/accept.

The initial MPI standard authors had fixed in their minds that the 
connect/accept handshake would take place over a TCP socket, and so no 
intermediate rendezvous broker was involved. That isn’t how we’ve chosen to 
implement it this time around, and so you do need the intermediary. If/when 
some developer wants to add another method, they are welcome to do so - but the 
general opinion was that the broker requirement was fine.
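
As a rough sketch of what the ompi-server flow looks like from the application side (the service name "my_service", the URI file "uri.txt", and the launch lines in the comment are illustrative, not taken from this thread):

    // Launch sketch (assumed usage):
    //   ompi-server --report-uri uri.txt
    //   mpirun -np 1 --ompi-server file:uri.txt ./a.out server
    //   mpirun -np 1 --ompi-server file:uri.txt ./a.out client
    #include <mpi.h>
    #include <cstring>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        char port[MPI_MAX_PORT_NAME] = {0};
        MPI_Comm intercomm;

        if (argc > 1 && std::strcmp(argv[1], "server") == 0) {
            MPI_Open_port(MPI_INFO_NULL, port);
            MPI_Publish_name("my_service", MPI_INFO_NULL, port);   // registered via ompi-server
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
            MPI_Unpublish_name("my_service", MPI_INFO_NULL, port);
            MPI_Close_port(port);
        } else {
            MPI_Lookup_name("my_service", MPI_INFO_NULL, port);    // resolved via ompi-server
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        }

        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }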

> 
> Best,
> Florian
> 
> 
>> Ralph
>> 
>>> On Nov 3, 2017, at 11:23 AM, Florian Lindner  wrote:
>>> 
>>> 
>>> Am 03.11.2017 um 16:18 schrieb r...@open-mpi.org:
 What version of OMPI are you using?
>>> 
>>> 2.1.1 @ Arch Linux.
>>> 
>>> Best,
>>> Florian

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-05 Thread Konstantinos Konstantinidis
Hi George,

First, let me note that the cost of q^(k-1)*(q-1) communicators was fine
for the values of the parameters q,k I am working with. Also, the whole point
of speeding up the shuffling phase is trying to reduce this number even
more (compared to already known implementations), which is a major concern
of my project. But thanks for pointing that out. Btw, do you know what the
maximum number of communicators in MPI is?

Now to the main part of the question, let me clarify that I have 1 process
per machine. I don't know if this is important here, but my way of thinking
is that we have a big text file and each process will have to work on some
chunks of it (like chapters of a book). Each process resides on a machine
with a certain amount of RAM, which can handle only a specific amount of work,
so if you generate many processes per machine you must have fewer book
chapters per process than before. Thus, because of the RAM limitations I
wanted to think at the machine level rather than at the process level.

Now to the actual shuffling, here is what I am currently doing (Option 1):

Let's denote the data that slave s has to send to the slaves in group G as
D(s,G).

for each slave s in 1,2,...,K {

    for each group G that s participates in {

        if (my rank is s) {
            MPI_Bcast(send data D(s,G))
        } else if (my rank is in group G) {
            MPI_Bcast(get data D(s,G))
        } else {
            Do nothing
        }

    }

    MPI::COMM_WORLD.Barrier();

}
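
For concreteness, Option 1 written against the MPI C API would look roughly like the fragment below; `group_comm[G]`, `groups_containing(s)`, `root_of(s,G)`, `D_buffer(s,G)` and `D_count(s,G)` are hypothetical helpers standing in for my own bookkeeping:

    // Option 1 sketch (fragment, not complete code): one MPI_Bcast per pair (s, G),
    // with the groups of a given sender handled one after another.
    // group_comm[G] is assumed to be a communicator containing exactly the ranks of
    // group G, and to be MPI_COMM_NULL on every rank outside G.
    for (int s = 0; s < K; ++s) {
        for (int G : groups_containing(s)) {            // every group sender s participates in
            if (group_comm[G] != MPI_COMM_NULL) {       // only members of G take part
                // On rank s this sends D(s,G); on the other members of G it receives it.
                MPI_Bcast(D_buffer(s, G), D_count(s, G), MPI_BYTE,
                          root_of(s, G), group_comm[G]);
            }
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }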

What I suggested before to speedup things (Option 2) is:

for each set {G(1),G(2),...,G(q-1)} of q-1 disjoint groups {

    for each slave s in G(1) {
        if (my rank is s) {
            MPI_Bcast(send data D(s,G(1)))
        } else if (my rank is in group G(1)) {
            MPI_Bcast(get data D(s,G(1)))
        } else {
            Do nothing
        }
    }

    for each slave s in G(2) {
        if (my rank is s) {
            MPI_Bcast(send data D(s,G(2)))
        } else if (my rank is in group G(2)) {
            MPI_Bcast(get data D(s,G(2)))
        } else {
            Do nothing
        }
    }

    ...

    for each slave s in G(q-1) {
        if (my rank is s) {
            MPI_Bcast(send data D(s,G(q-1)))
        } else if (my rank is in group G(q-1)) {
            MPI_Bcast(get data D(s,G(q-1)))
        } else {
            Do nothing
        }
    }

    MPI::COMM_WORLD.Barrier();

}

My hope was that I could implement Option 2 (in some way without copying
and pasting the same code q-1 times every time I change q) and that this
could bring a speedup of q-1 compared to Option 1 by having these groups
communicate in parallel. Right, now I am trying to find a way to identify
these sets of groups based on my implementation, which involves some
abstract algebra but for now let's assume that I can find them in an
efficient manner.
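
To avoid the copy-and-paste problem, my current thinking is something like the fragment below: since the q-1 groups in one set are disjoint, each rank belongs to at most one of them, so every rank only runs the loop for its own group and the broadcasts of the different groups overlap in time (again, `disjoint_group_sets`, `my_group_in(set)`, `members_of(G)` and the D(...) helpers are hypothetical placeholders for my own bookkeeping):

    // Option 2 sketch (fragment): one iteration per set of q-1 disjoint groups.
    // Because the groups are disjoint, no rank ever has to execute the code of more
    // than one group per set, so the body is not duplicated q-1 times.
    for (const auto &set : disjoint_group_sets) {      // set = {G(1), ..., G(q-1)}
        int G = my_group_in(set);                      // the single group of this set I belong to, or -1
        if (G >= 0) {
            for (int s : members_of(G)) {              // same ordered loop on every member of G
                MPI_Bcast(D_buffer(s, G), D_count(s, G), MPI_BYTE,
                          root_of(s, G), group_comm[G]);  // other groups run their own bcasts concurrently
            }
        }
        MPI_Barrier(MPI_COMM_WORLD);                   // separate one set from the next
    }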

Let me emphasize that each broadcast sends different actual data. There are
no two broadcasts that send the same D(s,G).

Finally, let's go to MPI_Allgather(): I am really confused since I have
never used this call but I have this image in my mind:

[inline image illustrating the MPI_Allgather pattern omitted]
I am not sure what you meant but now I am thinking of this (let commG be
the intra-communicator of group G):

for each possible group G {

    if (my rank is in G) {
        commG.MPI_Allgather(send data D(rank,G))
    } else {
        Do nothing
    }

    MPI::COMM_WORLD.Barrier();

}

I am not sure whether this makes sense since I am confused about the
correspondence of the data transmitted with Allgather() compared to the
notation D(s,G) I am currently using.
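
If I understood the suggestion, the correspondence would be roughly the following (again only a fragment; buffers and byte counts are placeholders): one MPI_Allgatherv on commG replaces all the broadcasts with a fixed G, because every member s contributes its own D(s,G) and receives the concatenation of D(s,G) over all s in G:

    // Sketch (fragment): exchange of the D(s,G) blocks among all members of one group G.
    if (commG != MPI_COMM_NULL) {
        int gsize, grank;
        MPI_Comm_size(commG, &gsize);
        MPI_Comm_rank(commG, &grank);

        std::vector<int> counts(gsize), displs(gsize);
        // counts[i] = size in bytes of D(s_i, G); every member needs to know all of them.
        // ... fill counts, then prefix-sum them into displs ...

        MPI_Allgatherv(my_block,   counts[grank], MPI_BYTE,          // my own D(grank, G)
                       all_blocks, counts.data(), displs.data(), MPI_BYTE,
                       commG);                                       // afterwards all_blocks holds D(s,G) for every s in G
    }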

Thanks.


On Tue, Oct 31, 2017 at 11:11 PM, George Bosilca 
wrote:

> It really depends on what you are trying to achieve. If the question is
> rhetorical: "can I write a code that does in parallel broadcasts on
> independent groups of processes ?" then the answer is yes, this is
> certainly possible. If however you add a hint of practicality in your
> question "can I write an efficient parallel broadcast between independent
> groups of processes?" then I'm afraid the answer will be a negative one.
>
> Let's not look at how you can write the multiple bcast code as the answer
> in the stackoverflow is correct, but instead look at what resources these
> collective operations are using. In general you can assume that nodes are
> connected by a network, able to move data at a rate B in both directions
> (full duplex). Assuming the implementation of the bcast algorithm is not
> entirely moronic, the bcast can saturate the network with a single process
> per node. Now, if you have multiple processes per node (P) then either you
> schedule them sequentially (so that each one has the full bandwidth B) or
> you let them progress in parallel in which case each participating process
> can claim a lower bandwidth B/P (as it is shared between all processes on
> the node).
>
> So even if you are able to expose enough parallelism, physical resources
> will impose t

Re: [OMPI users] Vague error message while executing MPI-Fortran program

2017-11-05 Thread Michael Mauersberger
Hi,

thank you for your help. Unfortunately I don't have access to the source of
the calling program. Maybe there is a subtle problem with some MPI commands.
But I have solved the problem in another way.

There is a module in the basic library that uses PRIVATE variables to call
predefined procedures according to several cases of calculation. That means
the private variables are changed so that a general routine can be adapted
to special calculations.

So I deleted the private variables and passed them to the procedures as
arguments instead. Now there is no problem with the MPI calls any more.

Maybe you have an idea why it didn't work with those private variables? But
- well, even if not, the problem is gone (although I don't know why). ;)

Best regards

Michael



__
Dipl.-Ing. Michael Mauersberger
michael.mauersber...@tu-dresden.de
Tel +49 351 463-38099 | Fax +49 351 463-37263
Technische Universität Dresden
Institut für Luft- und Raumfahrttechnik / Institute of Aerospace Engineering
Professur für Luftfahrzeugtechnik / Chair of Aircraft Engineering
Prof. Dr. K. Wolf | 01062 Dresden | tu-dresden.de/ilr/lft

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Reuti
Sent: Tuesday, October 24, 2017 13:09
To: Open MPI Users 
Subject: Re: [OMPI users] Vague error message while executing MPI-Fortran
program

Hi,

> Am 24.10.2017 um 09:33 schrieb Michael Mauersberger
:
> 
>  
>  
> Dear all,
>  
> When compiling and running a Fortran program on Linux (OpenSUSE Leap 42.3)
> I get a cryptic error message stating that a "Boundary Run-Time
> Check Failure" occurred for variable "ARGBLOCK_0.0.2". But I don't know or
> use this variable in my code, and the compiler traces me back to the line
> of a "CONTAINS" statement in a module.

A `strings * | grep ARGBLOCK` in
/opt/intel/compilers_and_libraries_2017.4.196/linux/bin/intel64 reveals:

ARGBLOCK_%d
ARGBLOCK_REC_%d

So it looks like the output is generated on the fly and doesn't point to any
existing variable. But which argument of which routine it refers to is still
unclear. Does the Intel Compiler have a feature to output a cross-reference of
all used variables? Maybe it's listed there.

-- Reuti


> I am using the Intel Fortran Compiler from Intel Composer XE 2013 with the
> following options:
> ifort -fPIC -g -traceback -O2 -check all,noarg_temp_created -warn all
>  
> Furthermore, the program uses Intel MKL with the functions DGETRF, 
> DGETRS, DSYGV, DGEMM, DGGEV and the C-Library NLopt.
>  
> The complete error message looks like:
>  
> Boundary Run-Time Check Failure for variable 'ARGBLOCK_0.0.2'
>  
> forrtl: error (76): Abort trap signal
> Image              PC                Routine            Line     Source
> libc.so.6          7F2BF06CC8D7      Unknown            Unknown  Unknown
> libc.so.6          7F2BF06CDCAA      Unknown            Unknown  Unknown
> geops              006A863F          Unknown            Unknown  Unknown
> libmodell.so       7F2BF119E54D      strukturtest_mod_  223      strukturtest_mod.f90
> libmodell.so       7F2BF1184056      modell_start_      169      modell_start.f90
> geops              0045D1A3          Unknown            Unknown  Unknown
> geops              0042C2C6          Unknown            Unknown  Unknown
> geops              0040A14C          Unknown            Unknown  Unknown
> libc.so.6          7F2BF06B86E5      Unknown            Unknown  Unknown
> geops              0040A049          Unknown            Unknown  Unknown
>

> ===================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 134
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>  
>  
> The program has the following structure:
> - basic functions linked into a static library (*.a), containing only
> modules --> using MKL routines
> - main program linked into a dynamic library, containing 1 bare
> subroutine, otherwise only modules
> - calling program (executed with mpiexec), which calls the mentioned
> subroutine in the main program
>  
> Without the calling program (in Open MPI) the subroutine runs without
> problems. But when invoking it with the MPI program I get the error message
> above.
>  
> So maybe some of you have encountered a similar problem and are able to help me.
> I would be really grateful.
>  
> Thanks,
>  
> Michael
>  
> ___
>  
> Dipl.-Ing. Michael Mauersberger
> 
> Tel. +49 351 463 38099 | Fax +49 351 463 37263 Marschnerstraße 30, 
> 01307 Dresden Professur für Luftfahrzeugtechnik | Pr