[OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente
Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in
Xcode", but it only covers MPICC. I am using MPIF90, so I did the same,
changing MPICC to MPIF90 and adjusting the path, but it did not work.


Building target “fortran” of project “fortran” with configuration  
“Debug”



Checking Dependencies
Invalid value 'MPIF90' for GCC_VERSION


The file "MPIF90.cpcompspec" looks like this:

/**
   Xcode Compiler Specification for MPIF90
*/

{   Type = Compiler;
    Identifier = com.apple.compilers.mpif90;
    BasedOn = com.apple.compilers.gcc.4_0;
    Name = "MPIF90";
    Version = "Default";
    Description = "MPI GNU C/C++ Compiler 4.0";
    ExecPath = "/usr/local/bin/mpif90";  // This gets converted to the g++ variant automatically
    PrecompStyle = pch;
}

and is located in "/Developer/Library/Xcode/Plug-ins"

and when I run mpif90 -v in the terminal it works fine:

Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap

Thread model: posix
gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)


Any idea?

Thanks.

Vincent

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente
Yes, I already have the gfortran compiler in /usr/local/bin, the same path
as my mpif90 compiler. But I've seen that when I use the mpif90 in /usr/bin
and in /Developer/usr/bin, it says:


"Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional."


That should be the problem; I will have to change the path to use the
gfortran-based mpif90 I have installed.

How can I do it? (Sorry, I am a beginner)
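
(A minimal sketch of that PATH change for bash, assuming the working wrapper is /usr/local/bin/mpif90:)

  export PATH=/usr/local/bin:$PATH   # e.g. in ~/.profile, so /usr/local/bin is searched first
  hash -r                            # forget any cached command locations in the current shell
  which mpif90                       # should now print /usr/local/bin/mpif90
  mpif90 -v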

Thanks.


On 04/05/2009, at 17:38, Warner Yuen wrote:

Have you installed a Fortran compiler? Mac OS X's developer tools do  
not come with a Fortran compiler, so you'll need to install one if  
you haven't already done so. I routinely use the Intel IFORT  
compilers with success. However, I hear many good things about the  
gfortran compilers on Mac OS X, you can't beat the price of gfortran!



Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859





Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente
I can use Open MPI from the terminal, but I am having problems with gdb, so
I wanted to know whether it was possible to use Open MPI with Xcode.


However, for Mac users, what is the best way to compile and debug an
MPI program?


Thanks.

Vincent


On 04/05/2009, at 17:42, Jeff Squyres wrote:

FWIW, I don't use Xcode, but I use the precompiled gcc/gfortran from  
here with good success:


   http://hpc.sourceforge.net/



On May 4, 2009, at 11:38 AM, Warner Yuen wrote:


Have you installed a Fortran compiler? Mac OS X's developer tools do
not come with a Fortran compiler, so you'll need to install one if  
you

haven't already done so. I routinely use the Intel IFORT compilers
with success. However, I hear many good things about the gfortran
compilers on Mac OS X, you can't beat the price of gfortran!


Warner Yuen
Scientific Computing
Consulting Engineer
Apple, Inc.
email: wy...@apple.com
Tel: 408.718.2859




Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
If I cannot make it work with Xcode, which one could I use? Which one do
you use to compile and debug Open MPI?
Thanks

Vincent


2009/5/4 Jeff Squyres 

> Open MPI comes pre-installed in Leopard; as Warner noted, since Leopard
> doesn't ship with a Fortran compiler, the Open MPI that Apple ships has
> non-functional mpif77 and mpif90 wrapper compilers.
>
> So the Open MPI that you installed manually will use your Fortran
> compilers, and therefore will have functional mpif77 and mpif90 wrapper
> compilers.  Hence, you probably need to be sure to use the "right" wrapper
> compilers.  It looks like you specified the full path in ExecPath,
> so I'm not sure why Xcode wouldn't work with that (like I mentioned, I
> unfortunately don't use Xcode myself, so I don't know why that wouldn't
> work).
>
>
>
>
> On May 4, 2009, at 11:53 AM, Vicente wrote:
>
>  Yes, I already have gfortran compiler on /usr/local/bin, the same path
>> as my mpif90 compiler. But I've seen when I use the mpif90 on /usr/bin
>> and on  /Developer/usr/bin says it:
>>
>> "Unfortunately, this installation of Open MPI was not compiled with
>> Fortran 90 support.  As such, the mpif90 compiler is non-functional."
>>
>>
>> That should be the problem, I will have to change the path to use the
>> gfortran I have installed.
>> How could I do it? (Sorry, I am beginner)
>>
>> Thanks.
>>
>>
>> On 04/05/2009, at 17:38, Warner Yuen wrote:
>>
>> > Have you installed a Fortran compiler? Mac OS X's developer tools do
>> > not come with a Fortran compiler, so you'll need to install one if
>> > you haven't already done so. I routinely use the Intel IFORT
>> > compilers with success. However, I hear many good things about the
>> > gfortran compilers on Mac OS X, you can't beat the price of gfortran!
>> >
>> >
>> > Warner Yuen
>> > Scientific Computing
>> > Consulting Engineer
>> > Apple, Inc.
>> > email: wy...@apple.com
>> > Tel: 408.718.2859
>> >
>> >
>> >
>> >

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
I can run Open MPI perfectly well from the command line, but I wanted a graphical
interface for debugging because I was having problems.
Thanks anyway.

Vincent

2009/5/4 Warner Yuen 

> Admittedly, I don't use Xcode to build Open MPI either.
>
> You can just compile Open MPI from the command line and install everything
> in /usr/local/. Make sure that gfortran is set in your path and you should
> just be able to do a './configure --prefix=/usr/local'
>
> After the installation, just make sure that your path is set correctly when
> you go to use the newly installed Open MPI. If you don't set your path, it
> will always default to using the version of OpenMPI that ships with Leopard.
>
>
> Warner Yuen
> Scientific Computing
> Consulting Engineer
> Apple, Inc.
> email: wy...@apple.com
> Tel: 408.718.2859
>
>
>
>
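
A minimal sketch of the build sequence Warner describes above (version number, prefix and -j value are illustrative):

  # assumes gfortran is already on the PATH (e.g. /usr/local/bin/gfortran)
  tar xzf openmpi-1.3.2.tar.gz
  cd openmpi-1.3.2
  ./configure --prefix=/usr/local
  make -j4
  sudo make install
  # make sure the new wrappers are found before the ones shipped with Leopard
  export PATH=/usr/local/bin:$PATH
  which mpif90    # should print /usr/local/bin/mpif90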

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
Maybe I should have opened a new thread, but do you have any idea why I receive this
when I use gdb to debug an Open MPI program:
warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug
information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
debug information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no
debug information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug
information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o"
- no debug information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".


warning: Could not find object file
"/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug
information available for
"../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
...



There is no 'admin' user, so I don't know why this happens. It works fine with a C
program.

Any idea?

Thanks.


Vincent
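
(For reference, one common command-line way to get a debugger on each rank, a sketch assuming X11/xterm is available; file and program names are illustrative:)

  mpif90 -g -O0 -o floyd floyd.f90      # build with debug info and no optimization
  mpirun -np 2 xterm -e gdb ./floyd     # opens one xterm running gdb per MPI rank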





2009/5/4 Vicente Puig 

> I can run openmpi perfectly with command line, but I wanted a graphic
> interface for debugging because I was having problems.
> Thanks anyway.
>
> Vincent
>
> 2009/5/4 Warner Yuen 
>
> Admittedly, I don't use Xcode to build Open MPI either.
>>
>> You can just compile Open MPI from the command line and install everything
>> in /usr/local/. Make sure that gfortran is set in your path and you should
>> just be able to do a './configure --prefix=/usr/local'
>>
>> After the installation, just make sure that your path is set correctly
>> when you go to use the newly installed Open MPI. If you don't set your path,
>> it will always default to using the version of OpenMPI that ships with
>> Leopard.
>>
>>
>> Warner Yuen
>> Scientific Computing
>> Consulting Engineer
>> Apple, Inc.
>> email: wy...@apple.com
>> Tel: 408.718.2859
>>
>>
>>
>>

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
But it doesn't work well.
For example, I am trying to debug a program ("floyd" in this case), and when
I set a breakpoint I get:

No line 26 in file "../../../gcc-4.2-20060805/libgfortran/fmain.c".

I am getting disappointed and frustrated that I cannot work well with
Open MPI on my Mac. There should be a way to make it run in Xcode, uff...

2009/5/4 Jeff Squyres 

> I get those as well.  I believe that they are (annoying but) harmless -- an
> artifact of how the freeware gcc/gfortran that I use was built.
>
>
>
> On May 4, 2009, at 1:47 PM, Vicente Puig wrote:
>
>  Maybe I had to open a new thread, but if you have any idea why I receive
>> it when I use gdb for debugging an openmpi program:
>>
>> warning: Could not find object file
>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug
>> information available for
>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>
>>
>> warning: Could not find object file
>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
>> debug information available for
>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>
>>
>> warning: Could not find object file
>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no
>> debug information available for
>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>
>>
>> warning: Could not find object file
>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug
>> information available for
>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".
>>
>>
>> warning: Could not find object file
>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o"
>> - no debug information available for
>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".
>>
>>
>> warning: Could not find object file
>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug
>> information available for
>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
>> ...
>>
>>
>>
>> There is no 'admin' so I don't know why it happen. It works well with a C
>> program.
>>
>> Any idea??.
>>
>> Thanks.
>>
>>
>> Vincent
>>
>>
>>
>>
>>
>> 2009/5/4 Vicente Puig 
>> I can run openmpi perfectly with command line, but I wanted a graphic
>> interface for debugging because I was having problems.
>>
>> Thanks anyway.
>>
>> Vincent
>>
>> 2009/5/4 Warner Yuen 
>>
>> Admittedly, I don't use Xcode to build Open MPI either.
>>
>> You can just compile Open MPI from the command line and install everything
>> in /usr/local/. Make sure that gfortran is set in your path and you should
>> just be able to do a './configure --prefix=/usr/local'
>>
>> After the installation, just make sure that your path is set correctly
>> when you go to use the newly installed Open MPI. If you don't set your path,
>> it will always default to using the version of OpenMPI that ships with
>> Leopard.
>>
>>
>> Warner Yuen
>> Scientific Computing
>> Consulting Engineer
>> Apple, Inc.
>> email: wy...@apple.com
>> Tel: 408.718.2859
>>
>>
>>
>>

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-04 Thread Vicente Puig
I forgot to say that "../../../gcc-4.2-20060805/libgfortran/fmain.c" is
neither the path nor the program that I am trying to debug.

2009/5/5 Vicente Puig 

> But it doesn't work well.
> For example, I am trying to debug a program, "floyd" in this case, and when
> I make a breakpoint:
>
> No line 26 in file "../../../gcc-4.2-20060805/libgfortran/fmain.c".
>
> I am getting disappointed and frustrated that I can not work well with
> openmpi in my Mac. There should be a was to make it run in Xcode, uff...
>
> 2009/5/4 Jeff Squyres 
>
>> I get those as well.  I believe that they are (annoying but) harmless --
>> an artifact of how the freeware gcc/gofrtran that I use was built.
>>
>>
>>
>> On May 4, 2009, at 1:47 PM, Vicente Puig wrote:
>>
>>  Maybe I had to open a new thread, but if you have any idea why I receive
>>> it when I use gdb for debugging an openmpi program:
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o"
>>> - no debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
>>> ...
>>>
>>>
>>>
>>> There is no 'admin' so I don't know why it happen. It works well with a C
>>> program.
>>>
>>> Any idea??.
>>>
>>> Thanks.
>>>
>>>
>>> Vincent
>>>
>>>
>>>
>>>
>>>
>>> 2009/5/4 Vicente Puig 
>>> I can run openmpi perfectly with command line, but I wanted a graphic
>>> interface for debugging because I was having problems.
>>>
>>> Thanks anyway.
>>>
>>> Vincent
>>>
>>> 2009/5/4 Warner Yuen 
>>>
>>> Admittedly, I don't use Xcode to build Open MPI either.
>>>
>>> You can just compile Open MPI from the command line and install
>>> everything in /usr/local/. Make sure that gfortran is set in your path and
>>> you should just be able to do a './configure --prefix=/usr/local'
>>>
>>> After the installation, just make sure that your path is set correctly
>>> when you go to use the newly installed Open MPI. If you don't set your path,
>>> it will always default to using the version of OpenMPI that ships with
>>> Leopard.
>>>
>>>
>>> Warner Yuen
>>> Scientific Computing
>>> Consulting Engineer
>>> Apple, Inc.
>>> email: wy...@apple.com
>>> Tel: 408.718.2859
>>>
>>>
>>>
>>>

Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1

2009-05-06 Thread Vicente Puig
Sorry, I don't understand: how can I try the Fortran from MacPorts?
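
(Presumably something along these lines, assuming MacPorts is already installed; the exact port name and version may differ:)

  sudo port selfupdate
  sudo port install gcc44        # illustrative port; provides gfortran-mp-4.4
  gfortran-mp-4.4 --version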

2009/5/6 Luis Vitorio Cargnini 

> This problem is occurring because the Fortran wasn't compiled with debug
> symbols:
> warning: Could not find object file
> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
> debug information available for
> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>
> It is the same problem for anyone using LLVM in Xcode: there are no debug
> symbols to create a debug build. Try creating a release build and see if it will
> compile at all, and try the Fortran from MacPorts; it works smoothly.
>
>
> On 2009-05-05, at 17:33, Jeff Squyres wrote:
>
>
>  I agree; that is a bummer.  :-(
>>
>> Warner -- do you have any advice here, perchance?
>>
>>
>> On May 4, 2009, at 7:26 PM, Vicente Puig wrote:
>>
>>  But it doesn't work well.
>>>
>>> For example, I am trying to debug a program, "floyd" in this case, and
>>> when I make a breakpoint:
>>>
>>> No line 26 in file "../../../gcc-4.2-20060805/libgfortran/fmain.c".
>>>
>>> I am getting disappointed and frustrated that I can not work well with
>>> openmpi in my Mac. There should be a was to make it run in Xcode, uff...
>>>
>>> 2009/5/4 Jeff Squyres 
>>> I get those as well.  I believe that they are (annoying but) harmless --
>>> an artifact of how the freeware gcc/gofrtran that I use was built.
>>>
>>>
>>>
>>> On May 4, 2009, at 1:47 PM, Vicente Puig wrote:
>>>
>>> Maybe I had to open a new thread, but if you have any idea why I receive
>>> it when I use gdb for debugging an openmpi program:
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o"
>>> - no debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".
>>>
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
>>> ...
>>>
>>>
>>>
>>> There is no 'admin' so I don't know why it happen. It works well with a C
>>> program.
>>>
>>> Any idea??.
>>>
>>> Thanks.
>>>
>>>
>>> Vincent
>>>
>>>
>>>
>>>
>>>
>>> 2009/5/4 Vicente Puig 
>>> I can run openmpi perfectly with command line, but I wanted a graphic
>>> interface for debugging because I was having problems.
>>>
>>> Thanks anyway.
>>>
>>> Vincent
>>>
>>> 2009/5/4 Warner Yuen 
>>>
>>> Admittedly, I don't use Xcode to build Open MPI either.
>>>
>>> You can just compile Open MPI from the command line and install
>>> everything in /usr/local/. Make sure that gfortran is set in your path and
>>> you should just be able to do a './configure --prefix=/usr/local'
>>>
>>> After the installation, just make sure that your path is set correctly
>>> when you go to use the newly installed Open MPI. If you don't set your path,
>>> it will always default to using the version of OpenMPI that ships with
>>> Leopard.
>>>
>

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi,

"r...@open-mpi.org"  writes:
> You might want to try using the DVM (distributed virtual machine)
> mode in ORTE. You can start it on an allocation using the “orte-dvm”
> cmd, and then submit jobs to it with “mpirun --hnp <foo>”, where foo
> is either the contact info printed out by orte-dvm, or the name of
> the file you told orte-dvm to put that info in. You’ll need to take
> it from OMPI master at this point.

this question looked interesting, so I gave it a try. On a cluster with
Slurm I had no problem submitting a job which launched an "orte-dvm
-report-uri ..." and then using that file to launch jobs onto that virtual
machine via orte-submit.
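
A minimal sketch of that workflow, run inside the Slurm allocation (file name, process counts and program names are illustrative):

  orte-dvm --report-uri dvm_uri.txt &              # start the DVM and save its contact info
  mpirun --hnp $(cat dvm_uri.txt) -np 4 ./task_a   # submit jobs to the running DVM
  mpirun --hnp $(cat dvm_uri.txt) -np 4 ./task_b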

To be useful to us at this point, I should be able to start executing
jobs if there are cores available and just hold them in a queue if the
cores are already filled. At this point this is not happening, and if I
try to submit a second job while the previous one has not finished, I
get a message like:

,
| DVM ready
| --
| All nodes which are allocated for this job are already filled.
| --
`

With the DVM, is it possible to keep these jobs in some sort of queue,
so that they will be executed when the cores get free?

Thanks,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi,

"r...@open-mpi.org"  writes:
>> With the DVM, is it possible to keep these jobs in some sort of queue,
>> so that they will be executed when the cores get free?
>
> It wouldn’t be hard to do so - as long as it was just a simple FIFO 
> scheduler. I wouldn’t want it to get too complex.

a simple FIFO should probably be enough. This can be useful as a simple
way to make a multi-core machine accessible to a small group of (friendly)
users, making sure that they don't oversubscribe the machine, but
without going the full route of installing/maintaining a full resource
manager. 

Cheers,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi,

Reuti  writes:
> At first I thought you want to run a queuing system inside a queuing
> system, but this looks like you want to replace the resource manager.

yes, if this could work reasonably well, we could do without the
resource manager.

> Under which user account the DVM daemons will run? Are all users using the 
> same account?

Well, even if this only worked for one user it could still be useful,
as I could use it the way I now use GNU Parallel or a private Condor
system, where I can submit hundreds of jobs but make sure they get
executed without oversubscribing.

For a small group of users, if the DVM can run under my user and there is
no restriction on who can use it, or if I can somehow authorize others to
use it (via an authority file or similar), that should be enough.

Thanks,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

[OMPI users] "No objects of the specified type were found on at least one node"?

2017-03-09 Thread Angel de Vicente
Hi,

I'm trying to get OpenMPI running on a new machine, and I came across
an error message that I hadn't seen before.

,
| can@login1:> mpirun -np 1 ./code config.txt
| --
| No objects of the specified type were found on at least one node:
| 
|   Type: Package
|   Node: login1
| 
| The map cannot be done as specified.
| --
`

Some details: in this machine we have gcc_6.0.3, and with it I installed
OpenMPI (v. 2.0.1). The compilation of OpenMPI went without (obvious)
errors, and I managed to compile my code without problems (if instead
of "mpirun -np 1 ./code" I just run the code directly there are no
issues). 

But if I try to use mpirun on the login node of the cluster I get this
message. If I submit the job to the scheduler (the cluster uses Slurm) I
get the same message, but the Node information is obviously different,
giving the name of one of the compute nodes.

Any pointers as to what can be going on? Many thanks,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] "No objects of the specified type were found on at least one node"?

2017-03-09 Thread Angel de Vicente
Hi,

Gilles Gouaillardet  writes:
> which version of ompi are you running ?

2.0.1

> this error can occur on systems with no NUMA object (e.g. single
> socket with hwloc < 2)
> as a workaround, you can
> mpirun --map-by socket ...

with --map-by socket I get exactly the same issue (both in the login and
the compute node)

I will upgrade to 2.0.2 and see if this changes something.

Thanks,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] "No objects of the specified type were found on at least one node"

2017-03-09 Thread Angel de Vicente
Hi,

Gilles Gouaillardet  writes:
> Can you run
> lstopo
> in your machine, and post the output ?

There is no lstopo on my machine. It is part of hwloc, right?

> can you also try
> mpirun --map-by socket --bind-to socket ...
> and see if it helps ?

same issue.


Perhaps I need to compile hwloc as well?
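
(If it comes to that, a minimal sketch of building hwloc from a source tarball just to get lstopo; version and prefix are illustrative:)

  tar xzf hwloc-1.11.2.tar.gz
  cd hwloc-1.11.2
  ./configure --prefix=$HOME/local
  make && make install
  $HOME/local/bin/lstopo
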
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] "No objects of the specified type were found on at least one node"

2017-03-09 Thread Angel de Vicente
Hi again,

thanks for your help. I installed the latest OpenMPI (2.0.2). 

lstopo output:

,
| lstopo --version
| lstopo 1.11.2
| 
| lstopo
| Machine (7861MB)
|   L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0 + PU L#0
|   (P#0)
|   L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1 + PU L#1
|   (P#1)
|   L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2 + PU L#2
|   (P#2)
|   L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3 + PU L#3
|   (P#3)
|   HostBridge L#0
| PCIBridge
|   PCI 1014:028c
| Block L#0 "sda"
|   PCI 14c1:8043
| Net L#1 "myri0"
| PCIBridge
|   PCI 14e4:166b
| Net L#2 "eth0"
|   PCI 14e4:166b
| Net L#3 "eth1"
| PCIBridge
|   PCI 1002:515e
`

I started with GCC 6.3.0, compiled OpenMPI 2.0.2 with it, and then HDF5
1.10.0-patch1 with it. Our code then compiles OK with it, and it runs OK
without "mpirun":

,
| ./mancha3D
| [mancha3D ASCII-art banner]
|  ./mancha3D should be given the name of a control file as argument.
`




But it complains as before when run with mpirun

,
| mpirun --map-by socket --bind-to socket -np 1 ./mancha3D
| --
| No objects of the specified type were found on at least one node:
| 
|   Type: Package
|   Node: login1
| 
| The map cannot be done as specified.
| --
`


If I submit it directly with srun, then the code runs, but not in
parallel, and two individual copies of the code are started:

,
| srun -n 2 ./mancha3D
| [mancha3D ASCII-art banner]
|  should be given the name of a control file as argument.
| [mancha3D ASCII-art banner]
|  should be given the name of a control file as argument.
`



Any ideas are welcome. Many thanks,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

Re: [OMPI users] "No objects of the specified type were found on at least one node"

2017-03-09 Thread Angel de Vicente
Can this help? If you think any other information could be relevant, let me
know.

Cheers,
Ángel

cat /proc/cpuinfo
processor   : 0
cpu : PPC970MP, altivec supported
clock   : 2297.70MHz
revision: 1.1 (pvr 0044 0101)

[4 processors]

timebase: 14318000
machine : CHRP IBM,8844-Z0C

uname -a
Linux login1 2.6.16.60-perfctr-0.42.4-ppc64 #1 SMP Fri Aug 21 15:25:15 CEST
2009 ppc64 ppc64 ppc64 GNU/Linux

lsb_release -a
Distributor ID: SUSE LINUX
Description:SUSE Linux Enterprise Server 10 (ppc)
Release:10


On 9 March 2017 at 15:04, Brice Goglin  wrote:

> What's this machine made of? (processor, etc)
> What kernel are you running ?
>
> Getting no "socket" or "package" at all is quite rare these days.
>
> Brice

Re: [OMPI users] "No objects of the specified type were found on at least one node"

2017-03-13 Thread Angel de Vicente
Brice Goglin  writes:

> Ok, that's a very old kernel on a very old POWER processor, it's
> expected that hwloc doesn't get much topology information, and it's
> then expected that OpenMPI cannot apply most binding policies.

Just in case it can add anything, I tried with an older OpenMPI version
(1.10.6), and I cannot get it to work either, but the message is
different:

,
| --
| No objects of the specified type were found on at least one node:
|   
|   Type: Socket
|   Node: s01c1b08
| 
| The map cannot be done as specified.
| ------
`


-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/  

[OMPI users] Help diagnosing problem: not being able to run MPI code across computers

2013-05-04 Thread Angel de Vicente
Hi,

I have used OpenMPI before without any trouble, and have configured MPICH,
MPICH2 and OpenMPI on many different machines before, but recently we
upgraded the OS to Fedora 17, and now I'm having trouble running an MPI
code on two of our machines connected via a switch.

I thought perhaps the old installation was giving problems, so I
reinstalled OpenMPI (1.6.4) and I have no trouble when running a
parallel code on just one node. I also don't have any trouble ssh'ing
(without needing a password) between these machines, but when I try to
run a parallel job spanning both machines, I get a hung mpiexec
process on the submitting machine and an "orted" process on the other
machine, but nothing moves.

I guess it is an issue with libraries and/or different MPI versions (the
machines have other site-wide MPI libraries installed), but I'm not sure
how to debug the issue. I looked in the FAQ, but I didn't find anything
relevant. Issue
http://www.open-mpi.org/faq/?category=running#intel-compilers-static is
different, since I don't get any warnings or errors when running; all
processes are just stuck.

Is there any way to dump details of what OpenMPI is trying to do on each
node, so I can see if it is looking for different libraries on each
node, or something similar?

Thanks,
-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/



Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers

2013-05-06 Thread Angel de Vicente
Hi,

Ralph Castain  writes:

> On May 4, 2013, at 4:54 PM, Angel de Vicente  wrote:
>> 
>> Is there any way to dump details of what OpenMPI is trying to do in each
>> node, so I can see if it is looking for different libraries in each
>> node, or something similar?


thanks for the suggestions, but I'm still stuck:

> What I do is simply "ssh ompi_info -V" to each remote node and compare
> results - you should get the same answer everywhere.
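
(A sketch of that check, using the host names that appear later in this thread:)

  for h in c0 c1 c2; do
      echo "== $h =="
      ssh $h ompi_info -V
  done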

exactly the same information in the three connected machines

> Another option in these situations is to configure
> --enable-orterun-prefix-by-default. If you install in the same
> location on each node (e.g., on an NFS mount), then this will ensure
> you get that same library.

Re-configured and re-compiled OpenMPI, but I get the same behaviour. 

I'm starting to think that perhaps it is a firewall issue. I don't have
root access on these machines, but I'll try to investigate.

Cheers,
-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/



Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers

2013-05-07 Thread Angel de Vicente
Hi,

"Jeff Squyres (jsquyres)"  writes:
>>> I'm starting to think that perhaps is a firewall issue? I don't have
>>> root access in these machines but I'll try to investigate.

> A simple test is to try any socket-based server app between the two
> machines that opens a random listening socket.  Try to telnet to it
> from the other machine.  If it fails to connect, then you likely have
> a firewalling issue.
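
(A minimal version of that test, with an arbitrary port; the exact nc/telnet options may vary between systems:)

  machineA$ nc -l 5000             # open a listening socket with netcat
  machineB$ telnet machineA 5000   # "No route to host" here points at a firewall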

yes, that's just what I did with orted. I saw the port that it was
trying to connect to, telnetted to it, and got "No route to host", so
that's why I was going the firewall path. Hopefully the sysadmins can
disable the firewall for the internal network today, and I can see if
that solves the issue.

Thanks,
-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/



Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers

2013-05-07 Thread Angel de Vicente
Hi again,

Angel de Vicente  writes:
> yes, that's just what I did with orted. I saw the port that it was
> trying to connect and telnet to it, and I got "No route to host", so
> that's why I was going the firewall path. Hopefully the sysadmins can
> disable the firewall for the internal network today, and I can see if
> that solves the issue.

OK, removing the firewall for the private network improved things a
lot. 

A simple "Hello World" seems to work without issues, but if I run my
code, I have a problem like this:

[angelv@comer RTI2D.Parallel]$ mpiexec -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 10 ../../../mancha2D_mpi_h5fc.x mancha.trol

[...]

[comer][[58110,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
[comer][[58110,1],3][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)
connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],2][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
connect() to 161.72.206.3 failed: No route to host (113)

But MPI_HOSTS points to a file with 
$ cat /net/nas7/polar/minicluster/machinefile-openmpi
c0 slots=5
c1 slots=5
c2 slots=5

c0, c1, and c2 are the names of the machines in the internal network,
but for some reason it is using the public interfaces and complaining
(the firewall in those is still active). I thought just specifying the
names of the machines in the machinefile would make sure that we were
using the right interface... 

Any help? Thanks,
-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/



Re: [OMPI users] Help diagnosing problem: not being able to run MPI code across computers

2013-05-07 Thread Angel de Vicente
Hi,

"Jeff Squyres (jsquyres)"  writes:
> The list of names in the hostfile specifies the servers that will be used, 
> not the network interfaces.  Have a look at the TCP portion of the FAQ:
>
> http://www.open-mpi.org/faq/?category=tcp

Thanks a lot for this. 

Now it works OK if I run it like 

[angelv@comer RTI2D.Parallel]$ mpiexec -loadbalance --mca btl_tcp_if_include p1p1 -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 4 ../../../mancha2D_mpi_h5fc.x mancha.trol

But, the FAQ seems to be wrong, since it also says that I should be able
to run like:

[angelv@comer RTI2D.Parallel]$ mpiexec -loadbalance --mca btl_tcp_if_include 192.168.1.x/24 -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 4 ../../../mancha2D_mpi_h5fc.x mancha.trol

but then I get the following error:

--
WARNING: An invalid value was given for btl_tcp_if_include.  This
value will be ignored.

  Local host: catar
  Value:  192.168.1.x/24
  Message:Invalid specification (inet_pton() failed)
--

If I specify the subnet as 192.168.1.0/24 all is in order.

I'm running 1.6.4:

[angelv@comer RTI2D.Parallel]$ ompi_info
 Package: Open MPI angelv@comer Distribution
Open MPI: 1.6.4


Thanks,
-- 
Ángel de Vicente
http://angel-de-vicente.blogspot.com/



[OMPI users] init_thread + spawn error

2007-10-01 Thread Joao Vicente Lima
Hi all!
I'm getting an error when calling MPI_Init_thread and MPI_Comm_spawn.
Am I doing something wrong?
The attachments contain my ompi_info and source ...

Thanks!
Joao


  char *arg[]= {"spawn1", (char *)0};

  MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_spawn ("./spawn_slave", arg, 1,
  MPI_INFO_NULL, 0, MPI_COMM_SELF, &slave,
  MPI_ERRCODES_IGNORE);
.

and the error:

opal_mutex_lock(): Resource deadlock avoided
[c8:13335] *** Process received signal ***
[c8:13335] Signal: Aborted (6)
[c8:13335] Signal code:  (-6)
[c8:13335] [ 0] [0xb7fbf440]
[c8:13335] [ 1] /lib/libc.so.6(abort+0x101) [0xb7abd5b1]
[c8:13335] [ 2] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e2933c]
[c8:13335] [ 3] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e2923a]
[c8:13335] [ 4] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e292e3]
[c8:13335] [ 5] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e29fa7]
[c8:13335] [ 6] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e29eda]
[c8:13335] [ 7] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0 [0xb7e2adec]
[c8:13335] [ 8] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0(ompi_proc_unpack+
0x181) [0xb7e2b142]
[c8:13335] [ 9] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_connect
_accept+0x57c) [0xb7e0fb70]
[c8:13335] [10] /usr/local/openmpi/openmpi-svn/lib/libmpi.so.0(PMPI_Comm_spawn+0
x395) [0xb7e5e285]
[c8:13335] [11] ./spawn(main+0x7f) [0x80486ef]
[c8:13335] [12] /lib/libc.so.6(__libc_start_main+0xdc) [0xb7aa7ebc]
[c8:13335] [13] ./spawn [0x80485e1]
[c8:13335] *** End of error message ***
--
mpirun has exited due to process rank 0 with PID 13335 on
node c8 calling "abort". This will have caused other processes
in the application to be terminated by signals sent by mpirun
(as reported here).
--

#include "mpi.h"
#include <stdio.h>

int main (int argc, char **argv)
{
  int provided;
  MPI_Comm slave;
  char *arg[]= {"spawn1", (char *)0};

  MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_spawn ("./spawn_slave", arg, 1, 
  MPI_INFO_NULL, 0, MPI_COMM_SELF, &slave,
  MPI_ERRCODES_IGNORE);

  MPI_Finalize ();
  return 0;
}
Open MPI: 1.3a1r16236
   Open MPI SVN revision: r16236
Open RTE: 1.3a1r16236
   Open RTE SVN revision: r16236
OPAL: 1.3a1r16236
   OPAL SVN revision: r16236
  Prefix: /usr/local/openmpi/openmpi-svn
 Configured architecture: i686-pc-linux-gnu
  Configure host: corisco
   Configured by: lima
   Configured on: Wed Sep 26 11:37:04 BRT 2007
  Configure host: corisco
Built by: lima
Built on: Wed Sep 26 12:07:13 BRT 2007
  Built host: corisco
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: no
 Fortran90 bindings size: na
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: g77
  Fortran77 compiler abs: /usr/bin/g77
  Fortran90 compiler: none
  Fortran90 compiler abs: none
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: no
  C++ exceptions: no
  Thread support: posix (mpi: yes, progress: no)
   Sparse Groups: no
  Internal debug support: yes
 MPI parameter check: runtime
Memory profiling support: yes
Memory debugging support: yes
 libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
 MPI I/O support: yes
   MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
   MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
 MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
 MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.3)
MCA coll: inter (MCA v1.0, API v1.0, Component v1.3)
MCA coll: self (MCA v1.0, API v1.0, Component v1.3)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.3)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.3)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.3)
   MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
  

[OMPI users] MPI_Comm_spawn errors

2008-02-18 Thread Joao Vicente Lima
Hi all,
I'm getting errors with spawn in the following situations:

1) spawn1.c - spawning 2 processes on localhost, one by one; the error is:

spawning ...
[localhost:31390] *** Process received signal ***
[localhost:31390] Signal: Segmentation fault (11)
[localhost:31390] Signal code: Address not mapped (1)
[localhost:31390] Failing at address: 0x98
[localhost:31390] [ 0] /lib/libpthread.so.0 [0x2b1d38a17ed0]
[localhost:31390] [ 1]
/usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_dyn_finalize+0xd2)
[0x2b1d37667cb2]
[localhost:31390] [ 2]
/usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_comm_finalize+0x3b)
[0x2b1d3766358b]
[localhost:31390] [ 3]
/usr/local/mpi/openmpi-svn/lib/libmpi.so.0(ompi_mpi_finalize+0x248)
[0x2b1d37679598]
[localhost:31390] [ 4] ./spawn1(main+0xac) [0x400ac4]
[localhost:31390] [ 5] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b1d38c43b74]
[localhost:31390] [ 6] ./spawn1 [0x400989]
[localhost:31390] *** End of error message ***
--
mpirun has exited due to process rank 0 with PID 31390 on
node localhost calling "abort". This will have caused other processes
in the application to be terminated by signals sent by mpirun
(as reported here).
--

With 1 process spawned, or with 2 processes spawned in one call, there is
no output from the child.

2) spawn2.c - no response; the init call is
 MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &required)

The attachments contain the programs, ompi_info and config.log.

Any suggestions?

thanks a lot.
Joao.


spawn1.c.gz
Description: GNU Zip compressed data


spawn2.c.gz
Description: GNU Zip compressed data


ompi_info.txt.gz
Description: GNU Zip compressed data


config.log.gz
Description: GNU Zip compressed data


[OMPI users] Spawn problem

2008-03-31 Thread Joao Vicente Lima
Hi,
sorry to bring this up again ... but I hope to use spawn in ompi someday :-D

The execution of spawn in this way works fine:
MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

but if this code goes into a for loop I get a problem:
for (i= 0; i < 2; i++)
{
  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
  MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
}

and the error is:
spawning ...
child!
child!
[localhost:03892] *** Process received signal ***
[localhost:03892] Signal: Segmentation fault (11)
[localhost:03892] Signal code: Address not mapped (1)
[localhost:03892] Failing at address: 0xc8
[localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
[localhost:03892] [ 1]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
[0x2ac71ba7448c]
[localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71b9decdf]
[localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71ba04765]
[localhost:03892] [ 4]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
[0x2ac71ba365c9]
[localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
[localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) [0x2ac71ccb7b74]
[localhost:03892] [ 7] ./spawn1 [0x400989]
[localhost:03892] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 3892 on node localhost
exited on signal 11 (Segmentation fault).
--

the attachments contain the ompi_info, config.log and program.

thanks for taking a look,
Joao.


config.log.gz
Description: GNU Zip compressed data


ompi_info.txt.gz
Description: GNU Zip compressed data


spawn1.c.gz
Description: GNU Zip compressed data


Re: [OMPI users] Spawn problem

2008-03-31 Thread Joao Vicente Lima
MPI_Finalize really is crashing, and calling MPI_Comm_{free,disconnect} works!
I don't know whether the free/disconnect must appear before MPI_Finalize
in this case (spawned processes). Any suggestions?
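
For reference, this is a minimal sketch of the pattern that now seems to
work for me (the program spawns itself, as in my test, and disconnects
every intercommunicator on both sides before the finalize; I don't claim
this is the official recipe):

#include "mpi.h"
#include <stdio.h>

int main (int argc, char **argv)
{
  MPI_Comm parent, intercomm[2];
  int i;

  MPI_Init (&argc, &argv);
  MPI_Comm_get_parent (&parent);

  if (parent == MPI_COMM_NULL) {             /* parent process */
    for (i= 0; i < 2; i++)
      MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
                      MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i],
                      MPI_ERRCODES_IGNORE);
    for (i= 0; i < 2; i++)
      MPI_Comm_disconnect (&intercomm[i]);   /* close paths before finalize */
  } else {                                   /* spawned child */
    printf ("child!\n");
    MPI_Comm_disconnect (&parent);           /* same on the child side */
  }

  MPI_Finalize ();
  return 0;
}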

I use loops in spawn:
- first, for testing :)
- and second, because certain MPI applications don't know in advance
the number of children needed to complete their work.

Spawn works great ... I will do other tests.

thanks,
Joao

On Mon, Mar 31, 2008 at 3:03 AM, Matt Hughes
 wrote:
> On 30/03/2008, Joao Vicente Lima  wrote:
>  > Hi,
>  >  sorry bring this again ... but i hope use spawn in ompi someday :-D
>
>  I believe it's crashing in MPI_Finalize because you have not closed
>  all communication paths between the parent and the child processes.
>  For the parent process, try calling MPI_Comm_free or
>  MPI_Comm_disconnect on each intercomm in your intercomm array before
>  calling finalize.  On the child, call free or disconnect on the parent
>  intercomm before calling finalize.
>
>  Out of curiosity, why a loop of spawns?  Why not increase the value of
>  the maxprocs argument, or if you need to spawn different executables,
>  or use different arguments for each instance, why not
>  MPI_Comm_spawn_multiple?
>
>  mch
>
>
>
>
>
>  >
>  >  The execution of spawn in this way works fine:
>  >  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
>  >  MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
>  >
>  >  but if this code go to a for I get a problem :
>  >  for (i= 0; i < 2; i++)
>  >  {
>  >   MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
>  >   MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
>  >  }
>  >
>  >  and the error is:
>  >  spawning ...
>  >  child!
>  >  child!
>  >  [localhost:03892] *** Process received signal ***
>  >  [localhost:03892] Signal: Segmentation fault (11)
>  >  [localhost:03892] Signal code: Address not mapped (1)
>  >  [localhost:03892] Failing at address: 0xc8
>  >  [localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
>  >  [localhost:03892] [ 1]
>  >  /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
>  >  [0x2ac71ba7448c]
>  >  [localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71b9decdf]
>  >  [localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71ba04765]
>  >  [localhost:03892] [ 4]
>  >  /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
>  >  [0x2ac71ba365c9]
>  >  [localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
>  >  [localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) 
> [0x2ac71ccb7b74]
>  >  [localhost:03892] [ 7] ./spawn1 [0x400989]
>  >  [localhost:03892] *** End of error message ***
>  >  --
>  >  mpirun noticed that process rank 0 with PID 3892 on node localhost
>  >  exited on signal 11 (Segmentation fault).
>  >  --
>  >
>  >  the attachments contain the ompi_info, config.log and program.
>  >
>  >  thanks for some check,
>  >
>  > Joao.
>  >
>
>
> > ___
>  >  users mailing list
>  >  us...@open-mpi.org
>  >  http://www.open-mpi.org/mailman/listinfo.cgi/users
>  >
>  >
>  ___
>  users mailing list
>  us...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Spawn problem

2008-03-31 Thread Joao Vicente Lima
Hi again,
when I call MPI_Init_thread in the same program the error is:

spawning ...
opal_mutex_lock(): Resource deadlock avoided
[localhost:07566] *** Process received signal ***
[localhost:07566] Signal: Aborted (6)
[localhost:07566] Signal code:  (-6)
[localhost:07566] [ 0] /lib/libpthread.so.0 [0x2abe5630ded0]
[localhost:07566] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2abe5654c3c5]
[localhost:07566] [ 2] /lib/libc.so.6(abort+0x10e) [0x2abe5654d73e]
[localhost:07566] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe5528063b]
[localhost:07566] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280559]
[localhost:07566] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe552805e8]
[localhost:07566] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280fff]
[localhost:07566] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280f3d]
[localhost:07566] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55281f59]
[localhost:07566] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+
0x204) [0x2abe552823cd]
[localhost:07566] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2a
be58efb5f7]
[localhost:07566] [11] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(MPI_Comm_spawn+0x
465) [0x2abe552b55cd]
[localhost:07566] [12] ./spawn1(main+0x9d) [0x400b05]
[localhost:07566] [13] /lib/libc.so.6(__libc_start_main+0xf4) [0x2abe56539b74]
[localhost:07566] [14] ./spawn1 [0x4009d9]
[localhost:07566] *** End of error message ***
opal_mutex_lock(): Resource deadlock avoided
[localhost:07567] *** Process received signal ***
[localhost:07567] Signal: Aborted (6)
[localhost:07567] Signal code:  (-6)
[localhost:07567] [ 0] /lib/libpthread.so.0 [0x2b48610f9ed0]
[localhost:07567] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b48613383c5]
[localhost:07567] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b486133973e]
[localhost:07567] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c63b]
[localhost:07567] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c559]
[localhost:07567] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c5e8]
[localhost:07567] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cfff]
[localhost:07567] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cf3d]
[localhost:07567] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006df59]
[localhost:07567] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+
0x204) [0x2b486006e3cd]
[localhost:07567] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b
4863ce75f7]
[localhost:07567] [11] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b
4863ce9c2b]
[localhost:07567] [12] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b48600720d7]
[localhost:07567] [13] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Init_thread+
0x166) [0x2b48600ae4f2]
[localhost:07567] [14] ./spawn1(main+0x2c) [0x400a94]
[localhost:07567] [15] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b4861325b74]
[localhost:07567] [16] ./spawn1 [0x4009d9]
[localhost:07567] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 7566 on node localhost exited on sig
nal 6 (Aborted).
--

thanks for taking a look,
Joao.

On Mon, Mar 31, 2008 at 11:49 AM, Joao Vicente Lima
 wrote:
> Really MPI_Finalize is crashing and calling MPI_Comm_{free,disconnect} works!
>  I don't know if the free/disconnect must appear before a MPI_Finalize
>  for this case (spawn processes)   some suggest ?
>
>  I use loops in spawn:
>  -  first for testing :)
>  - and second because certain MPI applications don't know in advance
>  the number of childrens needed to complete his work.
>
>  The spawn works is creat ... I will made other tests.
>
>  thanks,
>  Joao
>
>
>
>  On Mon, Mar 31, 2008 at 3:03 AM, Matt Hughes
>   wrote:
>  > On 30/03/2008, Joao Vicente Lima  wrote:
>  >  > Hi,
>  >  >  sorry bring this again ... but i hope use spawn in ompi someday :-D
>  >
>  >  I believe it's crashing in MPI_Finalize because you have not closed
>  >  all communication paths between the parent and the child processes.
>  >  For the parent process, try calling MPI_Comm_free or
>  >  MPI_Comm_disconnect on each intercomm in your intercomm array before
>  >  calling finalize.  On the child, call free or disconnect on the parent
>  >  intercomm before calling finalize.
>  >
>  >  Out of curiosity, why a loop of spawns?  Why not increase the value of
>  >  the maxprocs argument, or if you need to spawn different executables,
>  >  or use different arguments for each instance, why not
>  >  MPI_Comm_spawn_multiple?
>  >
>  >  mch
>  >
>  >
>  >
>  >
>  >
>  >  >
>  >  >  The execution of spawn in this way works fine

[OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-03 Thread Angel de Vicente via users
Hi,

in one of our codes, we want to create a log of events that happen in
the MPI processes, where the number of these events and their timing is
unpredictable.

So I implemented a simple test code, where process 0 creates a thread
that is just busy-waiting for messages from any process and writes them
to stdout/stderr/a log file upon receiving them. The test code is at
https://github.com/angel-devicente/thread_io
and the same idea went into our "real" code.
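
In case it helps to picture the idea, the gist is something like the
following (only a rough sketch, not the actual thread_io code; the tags,
the fixed message size and the termination counting are made up for the
example, and it needs an MPI built with MPI_THREAD_MULTIPLE support):

,
| #include <mpi.h>
| #include <pthread.h>
| #include <stdio.h>
| 
| #define TAG_LOG  1
| #define TAG_DONE 2
| #define MSG_LEN  256
| 
| static int nprocs;
| 
| /* logger thread on rank 0: blocks on MPI_Recv for messages from any
|    rank and prints them, until every other rank has sent TAG_DONE */
| static void *logger(void *arg)
| {
|   char buf[MSG_LEN];
|   int done = 0;
|   MPI_Status st;
| 
|   while (done < nprocs - 1) {
|     MPI_Recv(buf, MSG_LEN, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
|              MPI_COMM_WORLD, &st);
|     if (st.MPI_TAG == TAG_DONE) done++;
|     else printf("[rank %d] %s\n", st.MPI_SOURCE, buf);
|   }
|   return NULL;
| }
| 
| int main(int argc, char **argv)
| {
|   int provided, rank;
|   char msg[MSG_LEN];
| 
|   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
|   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
|   MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
| 
|   if (rank == 0) {
|     pthread_t t;
|     pthread_create(&t, NULL, logger, NULL);
|     /* ... rank 0 would do its own work here ... */
|     pthread_join(t, NULL);
|   } else {
|     snprintf(msg, MSG_LEN, "an event happened");
|     MPI_Send(msg, MSG_LEN, MPI_CHAR, 0, TAG_LOG,  MPI_COMM_WORLD);
|     MPI_Send(msg, 0,       MPI_CHAR, 0, TAG_DONE, MPI_COMM_WORLD);
|   }
| 
|   MPI_Finalize();
|   return 0;
| }
`

(compiled with something like "mpicc -pthread thread_log.c")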

As far as I could see, this behaves very nicely: there are no deadlocks,
no lost messages, and the performance penalty is minimal when considering
the real application this is intended for.

But then I found that in a local cluster the performance was very bad
(~5min 50s instead of ~5s for one test) when run with the locally
installed OpenMPI, compared to my own OpenMPI installation (same gcc and
OpenMPI versions). Checking the OpenMPI configuration details, I found
that the locally installed OpenMPI was configured to use the Mellanox IB
driver, and in particular the hcoll component was somehow killing
performance:

running with

mpirun  --mca coll_hcoll_enable 0 -np 51 ./test_t

was taking ~5s, while enabling coll_hcoll was killing performance, as
stated above (when run on a single node the performance also goes down,
but only by about a factor of 2).

Has anyone seen anything like this? Perhaps a newer Mellanox driver
would solve the problem?

We were planning on making our code public, but before we do so, I want
to understand under which conditions we could have this problem with the
"Threaded I/O" approach and if possible how to get rid of it completely.

Any help/pointers appreciated.
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-04 Thread Angel de Vicente via users
Hi,

George Bosilca  writes:

> If I'm not mistaken, hcoll is playing with the opal_progress in a way
> that conflicts with the blessed usage of progress in OMPI and prevents
> other components from advancing and timely completing requests. The
> impact is minimal for sequential applications using only blocking
> calls, but is jeopardizing performance when multiple types of
> communications are simultaneously executing or when multiple threads
> are active.
>
> The solution might be very simple: hcoll is a module providing support
> for collective communications so as long as you don't use collectives,
> or the tuned module provides collective performance similar to hcoll
> on your cluster, just go ahead and disable hcoll. You can also reach
> out to Mellanox folks asking them to fix the hcoll usage of
> opal_progress.

until we find a more robust solution, I was thinking of trying to just
query the MPI implementation at run time and use the threaded version if
hcoll is not present, and go for the unthreaded version if it is.
Looking at the coll.h file I see that some functions there might be
useful, for example mca_coll_base_component_comm_query_2_0_0_fn_t, but I
have never delved into this. Would this be an appropriate approach? Any
examples of how to query for a particular component in code?
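
Just to make the idea concrete, something along these lines is what I had
in mind (only a sketch, assuming the MPI library exposes its MCA
parameters as MPI_T control variables, as Open MPI does; the substring
match on "hcoll" is my own heuristic):

,
| #include <mpi.h>
| #include <string.h>
| 
| /* returns 1 if any MPI_T control variable mentions "hcoll" */
| int hcoll_seems_present(void)
| {
|   int provided, ncvars, i, found = 0;
| 
|   if (MPI_T_init_thread(MPI_THREAD_SINGLE, &provided) != MPI_SUCCESS)
|     return 0;
| 
|   MPI_T_cvar_get_num(&ncvars);
|   for (i = 0; i < ncvars && !found; i++) {
|     char name[256], desc[256];
|     int name_len = sizeof(name), desc_len = sizeof(desc);
|     int verbosity, bind, scope;
|     MPI_Datatype dt;
|     MPI_T_enum et;
| 
|     if (MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dt, &et,
|                             desc, &desc_len, &bind, &scope) == MPI_SUCCESS
|         && strstr(name, "hcoll") != NULL)
|       found = 1;
|   }
| 
|   MPI_T_finalize();
|   return found;
| }
`

and then the code would pick the threaded or the unthreaded path
depending on the result.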

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-05 Thread Angel de Vicente via users
Hi,

Joshua Ladd  writes:

> We cannot reproduce this. On four nodes 20 PPN with and w/o hcoll it
> takes exactly the same 19 secs (80 ranks).  
>
> What version of HCOLL are you using? Command line? 

Thanks for having a look at this.

According to ompi_info, our OpenMPI (version 3.0.1, built with gcc 7.2.0)
was configured with:

,
|   Configure command line: 'CFLAGS=-I/apps/OPENMPI/SRC/PMI/include'
|   '--prefix=/storage/apps/OPENMPI/3.0.1/gnu'
|   '--with-mxm=/opt/mellanox/mxm'
|   '--with-hcoll=/opt/mellanox/hcoll'
|   '--with-knem=/opt/knem-1.1.2.90mlnx2'
|   '--with-slurm' '--with-pmi=/usr'
|   '--with-pmi-libdir=/usr/lib64'
|   
'--with-platform=../contrib/platform/mellanox/optimized'
`

Not sure if there is a better way to find out the HCOLL version, but the
file hcoll_version.h in /opt/mellanox/hcoll/include/hcoll/api/ says we
have version 3.8.1649

Code compiled as:

,
| $ mpicc -o test_t thread_io.c test.c
`

To run the tests, I just submit the job to Slurm with the following
script (changing the coll_hcoll_enable param accordingly):

,
| #!/bin/bash
| #
| #SBATCH -J test
| #SBATCH -N 5
| #SBATCH -n 51
| #SBATCH -t 00:07:00
| #SBATCH -o test-%j.out
| #SBATCH -e test-%j.err
| #SBATCH -D .
| 
| module purge
| module load openmpi/gnu/3.0.1
| 
| time mpirun --mca coll_hcoll_enable 1 -np 51 ./test_t
`

In the latest test I managed to squeeze into our queuing system, the
hcoll-disabled run took ~3.5s and the hcoll-enabled one ~43.5s (in this
one I actually commented out all the fprintf statements just in case, so
the code was pure communication).

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-06 Thread Angel de Vicente via users
Hi,

Joshua Ladd  writes:

> This is an ancient version of HCOLL. Please upgrade to the latest
> version (you can do this by installing HPC-X
> https://www.mellanox.com/products/hpc-x-toolkit) 

Just to close the circle and report that all seems OK now.

I don't have root permission on this machine, so I could not change the
Mellanox drivers (4.1.1.0.2), but I downloaded the latest (as far as I
can tell) HPC-X version compatible with that driver (v.2.1.0), which
comes with hcoll version 4.0.2127. I compiled the OpenMPI version that
comes with the HPC-X toolkit, so that I was using the same compiler
(gcc-7.2.0) used for the cluster version of OpenMPI, and then HDF5 as well.

And with that, both the thread_io test code and our real app seem to
behave very nicely, and I get basically the same timings with the
MPI_THREAD_MULTIPLE and the single-threaded MPI versions.

All sorted. Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/



[OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-17 Thread Angel de Vicente via users
Hi,

I'm trying to compile the latest OpenMPI version with Infiniband support
on our local cluster, but didn't get very far (since I'm installing this
via Spack, I also asked in their support group).

Spack is issuing the following configure step (note the options given
for --with-knem, --with-hcoll and --with-mxm):

,
| configure'
| 
'--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/openmpi-4.1.1-jsvbusyjgthr2d6oyny5klt62gm6ma2u'
| '--enable-shared' '--disable-silent-rules' '--disable-builtin-atomics'
| '--enable-static' '--without-pmi'
| 
'--with-zlib=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/zlib-1.2.11-hrstx5ffrg4f4k3xc2anyxed3mmgdcoz'
| '--enable-mpi1-compatibility' '--with-knem=/opt/knem-1.1.2.90mlnx2'
| '--with-hcoll=/opt/mellanox/hcoll' '--without-psm' '--without-ofi'
| '--without-cma' '--without-ucx' '--without-fca'
| '--with-mxm=/opt/mellanox/mxm' '--without-verbs' '--without-xpmem'
| '--without-psm2' '--without-alps' '--without-lsf' '--without-sge'
| '--without-slurm' '--without-tm' '--without-loadleveler'
| '--disable-memchecker'
| 
'--with-libevent=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/libevent-2.1.12-yd5l4tjmnigv6dqlv5afpn4zc6ekdchc'
| 
'--with-hwloc=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/hwloc-2.6.0-bfnt4g3givflydpe5d2iglyupgbzxbfn'
| '--disable-java' '--disable-mpi-java' '--without-cuda'
| '--enable-wrapper-rpath' '--disable-wrapper-runpath' '--disable-mpi-cxx'
| '--disable-cxx-exceptions'
| 
'--with-wrapper-ldflags=-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib/gcc/x86_64-pc-linux-gnu/9.3.0
| 
-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib64'
`

Later on in the configuration phase I see:

,
| --- MCA component btl:openib (m4 configuration macro)
| checking for MCA component btl:openib compile mode... static
| checking whether expanded verbs are available... yes
| checking whether IBV_EXP_ATOMIC_HCA_REPLY_BE is declared... yes
| checking whether IBV_EXP_QP_CREATE_ATOMIC_BE_REPLY is declared... yes
| checking whether ibv_exp_create_qp is declared... yes
| checking whether ibv_exp_query_device is declared... yes
| checking whether IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG is declared... yes
| checking for struct ibv_exp_device_attr.ext_atom... yes
| checking for struct ibv_exp_device_attr.exp_atomic_cap... yes
| checking if MCA component btl:openib can compile... no
`

This is the first time I have tried to compile OpenMPI this way, and I
get a bit confused about what each bit is doing, but it looks like it
goes through the motions to get btl:openib built, and then for some
reason it cannot compile it.

Any suggestions/pointers?

Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-18 Thread Angel de Vicente via users
Hello,

Gilles Gouaillardet via users  writes:

> Infiniband detection likely fails before checking expanded verbs.

thanks for this. In the end, after playing a bit with different options,
I managed to install OpenMPI 3.1.0 OK on our cluster using UCX (I wanted
4.1.1, but that would not compile cleanly with the old version of UCX
that is installed in the cluster). The configure command line (as
reported by ompi_info) was:

,
|   Configure command line: 
'--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/openmpi-3.1.0-g5a7szwxcsgmyibqvwwavfkz5b4i2ym7'
|   '--enable-shared' '--disable-silent-rules'
|   '--disable-builtin-atomics' '--with-pmi=/usr'
|   
'--with-zlib=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/zlib-1.2.11-hrstx5ffrg4f4k3xc2anyxed3mmgdcoz'
|   '--without-knem' '--with-hcoll=/opt/mellanox/hcoll'
|   '--without-psm' '--without-ofi' '--without-cma'
|   '--with-ucx=/opt/ucx' '--without-fca'
|   '--without-mxm' '--without-verbs' '--without-xpmem'
|   '--without-psm2' '--without-alps' '--without-lsf'
|   '--without-sge' '--with-slurm' '--without-tm'
|   '--without-loadleveler' '--disable-memchecker'
|   
'--with-hwloc=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/hwloc-1.11.13-kpjkidab37wn25h2oyh3eva43ycjb6c5'
|   '--disable-java' '--disable-mpi-java'
|   '--without-cuda' '--enable-wrapper-rpath'
|   '--disable-wrapper-runpath' '--disable-mpi-cxx'
|   '--disable-cxx-exceptions'
|   
'--with-wrapper-ldflags=-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib/gcc/x86_\
| 64-pc-linux-gnu/9.3.0
|   
-Wl,-rpath,/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-7.2.0/gcc-9.3.0-ghr2jekwusoa4zip36xsa3okgp3bylqm/lib64'
`


The versions that I'm using are:

gcc:   9.3.0
mxm:   3.6.3102  (though I configure OpenMPI --without-mxm)
hcoll: 3.8.1649
knem:  1.1.2.90mlnx2 (though I configure OpenMPI --without-knem)
ucx:   1.2.2947
slurm: 18.08.7


It looks like everything executes fine, but I have a couple of warnings,
and I'm not sure how much I should worry and what I could do about them:

1) Conflicting CPU frequencies detected:

[1645221586.038838] [s01r3b78:11041:0] sys.c:744  MXM  WARN  
Conflicting CPU frequencies detected, using: 3151.41
[1645221585.740595] [s01r3b79:11484:0] sys.c:744  MXM  WARN  
Conflicting CPU frequencies detected, using: 2998.76

2) Won't use knem. In a previous try I specified --with-knem, but I was
getting this warning about not being able to open /dev/knem. I guess our
cluster is not properly configured w.r.t. knem, so I built OpenMPI again
with --without-knem, but I still get this message:

[1645221587.091122] [s01r3b74:9054 :0]     shm.c:65   MXM  WARN  Could not 
open the KNEM device file at /dev/knem : No such file or directory. Won't use 
knem.
[1645221587.104807] [s01r3b76:8610 :0] shm.c:65   MXM  WARN  Could not 
open the KNEM device file at /dev/knem : No such file or directory. Won't use 
knem.


Any help/pointers appreciated. Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-02-28 Thread Angel de Vicente via users
Hello,

"Jeff Squyres (jsquyres)"  writes:

> I'd recommend against using Open MPI v3.1.0 -- it's quite old.  If you
> have to use Open MPI v3.1.x, I'd at least suggest using v3.1.6, which
> has all the rolled-up bug fixes on the v3.1.x series.
>
> That being said, Open MPI v4.1.2 is the most current.  Open MPI v4.1.2 does
> restrict which versions of UCX it uses because there are bugs in the older
> versions of UCX.  I am not intimately familiar with UCX -- you'll need to ask
> Nvidia for support there -- but I was under the impression that it's just a
> user-level library, and you could certainly install your own copy of UCX to 
> use
> with your compilation of Open MPI.  I.e., you're not restricted to whatever 
> UCX
> is installed in the cluster system-default locations.

I did follow your advice and compiled my own version of UCX (1.11.2) and
OpenMPI v4.1.1, but for some reason the latency / bandwidth numbers are
really bad compared to the previous ones, so something is wrong, but I'm
not sure how to debug it.

> I don't know why you're getting MXM-specific error messages; those don't 
> appear
> to be coming from Open MPI (especially since you configured Open MPI with
> --without-mxm).  If you can upgrade to Open MPI v4.1.2 and the latest UCX, see
> if you are still getting those MXM error messages.

In this latest attempt, yes, the MXM error messages are still there.

Cheers,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-03-01 Thread Angel de Vicente via users
Hello,

John Hearns via users  writes:

> Stupid answer from me. If latency/bandwidth numbers are bad then check
> that you are really running over the interface that you think you
> should be. You could be falling back to running over Ethernet.

I'm quite out of my depth here, so all answers are helpful, as I might have
skipped something very obvious.

In order to try and avoid the possibility of falling back to running
over Ethernet, I submitted the job with:

mpirun -n 2 --mca btl ^tcp osu_latency

which gives me the following error:

,
| At least one pair of MPI processes are unable to reach each other for
| MPI communications.  This means that no Open MPI device has indicated
| that it can be used to communicate between these processes.  This is
| an error; Open MPI requires that all MPI processes be able to reach
| each other.  This error can sometimes be the result of forgetting to
| specify the "self" BTL.
| 
|   Process 1 ([[37380,1],1]) is on host: s01r1b20
|   Process 2 ([[37380,1],0]) is on host: s01r1b19
|   BTLs attempted: self
| 
| Your MPI job is now going to abort; sorry.
`

This is certainly not happening when I use the "native" OpenMPI provided
in the cluster. I have not knowingly specified anywhere not to support
"self", so I have no clue what might be going on, as I assumed that
"self" was always built with Open MPI.

Any hints on what (and where) I should look for?

Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Trouble compiling OpenMPI with Infiniband support

2022-03-11 Thread Angel de Vicente via users
Hello,


Joshua Ladd  writes:

> These are very, very old versions of UCX and HCOLL installed in your
> environment. Also, MXM was deprecated years ago in favor of UCX. What
> version of MOFED is installed (run ofed_info -s)? What HCA generation
> is present (run ibstat).

MOFED is: MLNX_OFED_LINUX-4.1-1.0.2.0

As for the HCA generation, we don't seem to have the ibstat command
installed; is there another way to get this info? But I *think* they are
ConnectX-3.


> > Stupid answer from me. If latency/bandwidth numbers are bad then check
> > that you are really running over the interface that you think you
> > should be. You could be falling back to running over Ethernet.

apparently the problem with my first attempt was that I was installing a
very bare version of UCX. I re-did the installation with the following
configuration:

,
| 
'--prefix=/storage/projects/can30/angelv/spack/opt/spack/linux-sles12-sandybridge/gcc-9.3.0/ucx-1.11.2-67aihiwsolnad6aqt2ei6j6iaptqgecf'
| '--enable-mt' '--enable-cma' '--disable-params-check' '--with-avx'
| '--enable-optimizations' '--disable-assertions' '--disable-logging'
| '--with-pic' '--with-rc' '--with-ud' '--with-dc' '--without-mlx5-dv'
| '--with-ib-hw-tm' '--with-dm' '--with-cm' '--without-rocm'
| '--without-java' '--without-cuda' '--without-gdrcopy' '--with-knem'
| '--without-xpmem'
`


and now the numbers are very good, most of the time better than the
"native" OpenMPI provided in the cluster.


So now I wanted to try another combination, using the Intel compiler
instead of the GNU one. Apparently everything compiled OK, and when I
run the OSU Microbenchmarks I have no problems with the point-to-point
benchmarks, but I get segmentation faults:

,
| load intel/2018.2 Set Intel compilers (LICENSE NEEDED! Please, contact 
support if you have any issue with license)
| /scratch/slurm/job1182830/slurm_script: line 59: unalias: despacktivate: not 
found
| [s01r2b22:26669] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
[B/././././././.][./././././././.]
| [s01r2b23:20286] MCW rank 1 bound to socket 0[core 0[hwt 0]]: 
[B/././././././.][./././././././.]
| [s01r2b22:26681:0] Caught signal 11 (Segmentation fault)
| [s01r2b23:20292:0] Caught signal 11 (Segmentation fault)
|  backtrace 
|  2 0x001c mxm_handle_error()  
/var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
|  3 0x0010055c mxm_error_signal_handler()  
/var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
|  4 0x00034950 killpg()  ??:0
|  5 0x000a7d41 PMPI_Comm_rank()  ??:0
|  6 0x00402e56 main()  ??:0
|  7 0x000206e5 __libc_start_main()  ??:0
|  8 0x00402ca9 _start()  
/home/abuild/rpmbuild/BUILD/glibc-2.22/csu/../sysdeps/x86_64/start.S:118
| ===
|  backtrace 
|  2 0x001c mxm_handle_error()  
/var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:641
|  3 0x0010055c mxm_error_signal_handler()  
/var/tmp/OFED_topdir/BUILD/mxm-3.6.3102/src/mxm/util/debug/debug.c:616
|  4 0x00034950 killpg()  ??:0
|  5 0x000a7d41 PMPI_Comm_rank()  ??:0
|  6 0x00402e56 main()  ??:0
|  7 0x000206e5 __libc_start_main()  ??:0
|  8 0x00402ca9 _start()  
/home/abuild/rpmbuild/BUILD/glibc-2.22/csu/../sysdeps/x86_64/start.S:118
| ===
`


Any idea how I could try to debug/solve this?

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


[OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello,

I'm running out of ideas, and wonder if someone here could have some
tips on how to debug a segmentation fault I'm having with my
application [due to the nature of the problem I'm wondering if the
problem is with OpenMPI itself rather than my app, though at this point
I'm not leaning strongly either way].

The code is hybrid MPI+OpenMP and I compile it with gcc 10.3.0 and
OpenMPI 4.1.3.

Usually I was running the code with "mpirun -np X --bind-to none [...]"
so that the threads created by OpenMP don't get bound to a single core
and I actually get proper speedup out of OpenMP.

Since I introduced some changes to the code this week (though I have
read the changes carefully a number of times, and I don't see anything
suspicious), I now get a segmentation fault sometimes, but only when I
run with "--bind-to none" and only on my workstation. It is not always
with the same run configuration, but I can see some pattern: the problem
shows up only if I run the version compiled with OpenMP support, and
most of the time only when the number of ranks*threads goes above 4 or
so. If I run it with "--bind-to socket" all looks good all the time.

If I run it on another server, "--bind-to none" doesn't seem to be any
issue (I submitted the jobs many, many times and not a single
segmentation fault), but on my workstation it fails almost every time if
using MPI+OpenMP with a handful of threads and with "--bind-to none". On
this other server I'm running gcc 9.3.0 and OpenMPI 4.1.3.

For example, setting OMP_NUM_THREADS to 1, I run the code like the
following, and get the segmentation fault as below:

,
| angelv@sieladon:~/.../Fe13_NL3/t~gauss+isat+istim$ mpirun -np 4 --bind-to 
none  ../../../../../pcorona+openmp~gauss Fe13_NL3.params 
|  Reading control file: Fe13_NL3.params
|   ... Control file parameters broadcasted
| 
| [...]
|  
|  Starting calculation loop on the line of sight
|  Receiving results from:2
|  Receiving results from:1
| 
| Program received signal SIGSEGV: Segmentation fault - invalid memory 
reference.
| 
| Backtrace for this error:
|  Receiving results from:3
| #0  0x7fd747e7555f in ???
| #1  0x7fd7488778e1 in ???
| #2  0x7fd7488667a4 in ???
| #3  0x7fd7486fe84c in ???
| #4  0x7fd7489aa9ce in ???
| #5  0x414959 in __pcorona_main_MOD_main_loop._omp_fn.0
| at src/pcorona_main.f90:627
| #6  0x7fd74813ec75 in ???
| #7  0x412bb0 in pcorona
| at src/pcorona.f90:49
| #8  0x40361c in main
| at src/pcorona.f90:17
| 
| [...]
| 
| --
| mpirun noticed that process rank 3 with PID 0 on node sieladon exited on 
signal 11 (Segmentation fault).
| ---
`

I cannot see inside the MPI library (I don't really know if that would
be helpful) but line 627 in pcorona_main.f90 is:

,
|  call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
`

Any ideas/suggestions on what could be going on, or how to try and get
some more clues about the possible causes of this?

Many thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Thanks Gilles,

Gilles Gouaillardet via users  writes:

> You can first double check you
> MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...)

my code uses "mpi_thread_funneled" and OpenMPI was compiled with
MPI_THREAD_MULTIPLE support:

,
| ompi_info | grep  -i thread
|   Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, 
OMPI progress: no, ORTE progress: yes, Event lib: yes)
|FT Checkpoint support: no (checkpoint thread: no)
`

Cheers,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello Jeff,

"Jeff Squyres (jsquyres)"  writes:

> With THREAD_FUNNELED, it means that there can only be one thread in
> MPI at a time -- and it needs to be the same thread as the one that
> called MPI_INIT_THREAD.
>
> Is that the case in your app?


the master rank (i.e. 0) never creates threads, while the other ranks go
through the following to communicate with it, and I check that it is
indeed only the master thread that communicates:

,
|tid = 0  
| #ifdef _OPENMP  
|tid = omp_get_thread_num()   
| #endif  
| 
|do   
|   if (tid == 0) then
|  call mpi_send(my_rank, 1, mpi_integer, master, ask_job, &  
|   mpi_comm_world, mpierror) 
|  call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
| 
|  if (stat(mpi_tag) == stop_signal) then 
| call mpi_recv(b_,1,mpi_integer,master,stop_signal, &
|  mpi_comm_world,stat,mpierror)  
|  else   
| call mpi_recv(iyax,1,mpi_integer,master,give_job, & 
|  mpi_comm_world,stat,mpierror)  
|  end if 
|   end if
| 
|   !$omp barrier
| 
|   [... actual work...]
`


> Also, what is your app doing at src/pcorona_main.f90:627?

It is the mpi_probe call above.


In case it can clarify things, my app follows a master-worker paradigm,
where rank 0 hands over jobs, and all mpi ranks > 0 just do the following:

,
| !$OMP PARALLEL DEFAULT(NONE)
| do
|   !  (the code above) 
|   if (tid == 0) then receive job number | stop signal
|  
|   !$OMP DO schedule(dynamic)
|   loop_izax: do izax=sol_nz_min,sol_nz_max
| 
|  [big computing loop body]
| 
|   end do loop_izax  
|   !$OMP END DO  
| 
|   if (tid == 0) then 
|   call mpi_send(iyax,1,mpi_integer,master,results_tag, & 
|mpi_comm_world,mpierror)  
|   call mpi_send(stokes_buf_y,nz*8,mpi_double_precision, &
|master,results_tag,mpi_comm_world,mpierror)   
|   end if 
|  
|   !omp barrier   
|  
| end do   
| !$OMP END PARALLEL  
`



Following Gilles' suggestion, I also tried changing MPI_THREAD_FUNNELED
to MPI_THREAD_MULTIPLE just in case, but I get the same segmentation
fault at the same line (mind you, the segmentation fault doesn't happen
every time). But again, no issues if running with --bind-to socket
(and no apparent issues at all on the other computer, even with --bind-to
none).

Many thanks for any suggestions,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-22 Thread Angel de Vicente via users
Hello,

"Keller, Rainer"  writes:

> You’re using MPI_Probe() with Threads; that’s not safe.
> Please consider using MPI_Mprobe() together with MPI_Mrecv().

many thanks for the suggestion. I will try the M variants, though I was
under the impression that mpi_probe() was OK as long as one made sure
that the source and tag matched between the mpi_probe() and the
mpi_recv() calls.

As you can see below, I'm careful with that (in any case, I'm not sure
the problem lies there, since the error I get is an invalid memory
reference in the mpi_probe call itself).

,
|tid = 0  
| #ifdef _OPENMP  
|tid = omp_get_thread_num()   
| #endif  
| 
|do   
|   if (tid == 0) then
|  call mpi_send(my_rank, 1, mpi_integer, master, ask_job, &  
|   mpi_comm_world, mpierror) 
|  call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
| 
|  if (stat(mpi_tag) == stop_signal) then 
| call mpi_recv(b_,1,mpi_integer,master,stop_signal, &
|  mpi_comm_world,stat,mpierror)  
|  else   
| call mpi_recv(iyax,1,mpi_integer,master,give_job, & 
|  mpi_comm_world,stat,mpierror)  
|  end if 
|   end if
| 
|   !$omp barrier
| 
|   [... actual work...]
`
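
For reference, this is roughly how I understand the M-variant version of
that block would look (just a sketch; "message" would be a new integer
handle declared alongside "stat"):

,
|  call mpi_mprobe(master, mpi_any_tag, mpi_comm_world, message, stat, mpierror)
| 
|  if (stat(mpi_tag) == stop_signal) then
|     call mpi_mrecv(b_, 1, mpi_integer, message, stat, mpierror)
|  else
|     call mpi_mrecv(iyax, 1, mpi_integer, message, stat, mpierror)
|  end if
`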


> So getting into valgrind may be of help, possibly recompiling Open MPI
> enabling valgrind-checking together with debugging options.

I was hoping to avoid this route, but it certainly is looking like I'll
have to bite the bullet...

Thanks,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


Re: [OMPI users] Help diagnosing MPI+OpenMP application segmentation fault only when run with --bind-to none

2022-04-26 Thread Angel de Vicente via users
Hello,

thanks for your help and suggestions.

In the end it was not an issue with OpenMPI or with anything else in the
system, but rather a single line in our code. I thought I was running
the tests with the -fbounds-check option, but it turns out I was not,
arrrghh!! At some point I was writing outside one of our arrays, and you
can imagine the rest... The fact that it was happening only when running
with '--bind-to none' and only on my workstation sent me down all the
wrong debugging paths. Once I realized -fbounds-check was not being
used, figuring out the issue was a matter of seconds...

Our code is now happily running the +3000 tests without a hitch.

Cheers,
-- 
Ángel de Vicente

Tel.: +34 922 605 747
Web.: http://research.iac.es/proyecto/polmag/


[OMPI users] Location of the file pmix-mca-params.conf?

2023-06-14 Thread Angel de Vicente via users
Hello,

with our current setup of OpenMPI and Slurm on an Ubuntu 22.04 server,
when we submit MPI jobs I get the message:

PMIX ERROR: ERROR in file
../../../../../../src/mca/gds/ds12/gds_ds12_lock_pthread.c at line 169

Following https://github.com/open-mpi/ompi/issues/7516, I tried setting
PMIX_MCA_gds=hash.

Setting it as an environment variable or setting it in the file
~/.pmix/mca-params.conf does work. I now want to set it system-wide, but
I cannot find which file that should be.

I have tried: 
+ /etc/pmix-mca-params.conf
+ /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf
but no luck.

Do you know which file holds the system-wide configuration, or how to
find it?

Thanks,
-- 
Ángel de Vicente -- (GPG: 0x64D9FDAE7CD5E939)
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)



Re: [OMPI users] Location of the file pmix-mca-params.conf?

2023-06-14 Thread Angel de Vicente via users
Hi,

Angel de Vicente via users  writes:

> I have tried: 
> + /etc/pmix-mca-params.conf
> + /usr/lib/x86_64-linux-gnu/pmix2/etc/pmix-mca.params.conf
> but no luck.

Never mind, /etc/openmpi/pmix-mca-params.conf was the right one.
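
In case someone else hits this: the file just takes the parameter in the
usual "name = value" form, so something like:

,
| # /etc/openmpi/pmix-mca-params.conf
| gds = hash
`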

Cheers,
-- 
Ángel de Vicente -- (GPG: 0x64D9FDAE7CD5E939)
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)