Re: [OMPI users] error from MPI_Allgather

2012-10-30 Thread rajesh



Jeff Squyres <jsquy...@cisco.com> writes:

> 
> Two things:
> 
> 1. That looks like an MPICH error message (i.e., it's not from Open MPI --
> Open MPI and MPICH2 are entirely different software packages with different
> developers and behaviors).  You might want to contact them for more specific
> details.
> 
> 2. That being said, it looks like you used the same buffer for both the sbuf
> and rbuf.  MPI does not allow you to do that; you need to specify different
> buffers for those arguments.
> 

Hi Jeff,
Thank you for your reply.

The problem occurs with Open MPI. I understand the problem as you described it
in your reply, but how can I set different buffers for them?
Thank you

Rajesh






Re: [OMPI users] OpenMPI on Windows when MPI_F77 is used from a C application

2012-10-30 Thread Mathieu Gontier
Hi Damien,

The only message I have is:
[vs2010:09300] [[56007,0],0]-[[56007,1],0] mca_oob_tcp_msg_recv: readv
failed: Unknown error (108)
[vs2010:09300] 2 more processes have sent help message
help-odls-default.txt / odls-default:could-not-kill

Does that mean anything to you?



On Mon, Oct 29, 2012 at 9:35 PM, Damien  wrote:

>  Is there a series of error messages or anything at all that you can post
> here?
>
> Damien
>
>
> On 29/10/2012 2:30 PM, Mathieu Gontier wrote:
>
> Hi guys.
>
>  Finally, I compiled with /O: the option is deprecated and, as I did
> previously, I used /Od instead... unsuccessfully.
>
>  I also compiled my code from a script in order to call mpicc.exe /
> mpiCC.exe / mpif77.exe instead of calling cl.exe and ifort.exe directly.
> Only the link step is done without mpicc.exe, because I could not find how to
> call the linker through mpicc.exe (if that could change anything, just let me
> know). So, the purpose is to compile with the default OpenMPI options (if
> there are any). But my solver still crashes.
>
>  So, if anybody has an idea...
>
>  Thanks for your help.
>
> On Mon, Oct 29, 2012 at 7:33 PM, Mathieu Gontier <
> mathieu.gont...@gmail.com> wrote:
>
>> It crashes in the Fortran routine that calls the MPI functions. When I run
>> the debugger, the crash seems to be in libmpi_f77.lib, but I cannot go
>> further since the library is not built in debug mode.
>>
>>  Attached to this email are the files of my small test case. With
>> less aggressive options, it works.
>>
>>  I did not know the lowest optimization level is /O; I am going to try.
>>
>>
>> On Mon, Oct 29, 2012 at 5:08 PM, Damien  wrote:
>>
>>>  Mathieu,
>>>
>>> Where is the crash?  Without that info, I'd suggest turning off all the
>>> optimisations and just compile it without any flags other than what you
>>> need to compile it cleanly (so no /O flags) and see if it crashes.
>>>
>>> Damien
>>>
>>>
>>> On 26/10/2012 10:27 AM, Mathieu Gontier wrote:
>>>
>>>  Dear all,
>>>
>>>  I would like to use OpenMPI on Windows for a CFD solver instead of MPICH2.
>>> My solver is developed in Fortran 77 and driven by a C++ interface; both
>>> levels call MPI functions.
>>>
>>>  So, I installed OpenMPI-1.6.2-x64 on my system and compiled my code
>>> successfully, but at runtime it crashed.
>>> I reproduced the problem in a small C application calling a Fortran
>>> function that uses MPI_Allreduce; when I removed some aggressive
>>> optimization options from the Fortran build, it worked:
>>>
>>>    - Optimization: Disable (/Od)
>>>    - Inline Function Expansion: Any Suitable (/Ob2)
>>>    - Favor Size or Speed: Favor Fast Code (/Ot)
>>>
>>>
>>>  So, I removed the same options from the Fortran parts of my solver,
>>> but it still crashes. I tried some other options, but it continues
>>> crashing. Does anybody have an idea? Should I (de)activate some compilation
>>> options? Are there specific settings needed to build and link against
>>> libmpi_f77.lib?
>>>
>>>  Thanks for your help.
>>> Mathieu.
>>>
>>>  --
>>> Mathieu Gontier
>>> - MSN: mathieu.gont...@gmail.com
>>> - Skype: mathieu_gontier
>>>
>>>
>>>
>>
>>
>>
>>  --
>> Mathieu Gontier
>> - MSN: mathieu.gont...@gmail.com
>> - Skype: mathieu_gontier
>>
>
>
>
>  --
> Mathieu Gontier
> - MSN: mathieu.gont...@gmail.com
> - Skype: mathieu_gontier
>
>
>



-- 
Mathieu Gontier
- MSN: mathieu.gont...@gmail.com
- Skype: mathieu_gontier


Re: [OMPI users] Performance/stability impact of thread support

2012-10-30 Thread Paul Kapinos

At least, be aware that the usage of InfiniBand is silently disabled if the 'multiple'
threading level is activated:

http://www.open-mpi.org/community/lists/devel/2012/10/11584.php
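
For context, the 'multiple' level is what an application requests via
MPI_Init_thread. Below is a minimal, illustrative C sketch (not from this
thread) of requesting it and checking the level that was actually provided:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Request the 'multiple' level; Open MPI only grants it when built
       with the thread-multiple configure option discussed in this thread. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE)
        printf("MPI_THREAD_MULTIPLE not available; provided level = %d\n",
               provided);

    MPI_Finalize();
    return 0;
}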




On 10/29/12 19:14, Daniel Mitchell wrote:

Hi everyone,

I've asked my linux distribution to repackage Open MPI with thread support 
(meaning configure with --enable-thread-multiple). They are willing to do this 
if it won't have any performance/stability hit for Open MPI users who don't 
need thread support (meaning everyone but me, apparently). Does enabling thread 
support impact performance/stability?

Daniel




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI users] OpenMPI on Windows when MPI_F77 is used from a C application

2012-10-30 Thread Damien Hocking

I've never seen that, but someone else might have.

Damien

On 30/10/2012 1:43 AM, Mathieu Gontier wrote:

Hi Damien,

The only message I have is:
[vs2010:09300] [[56007,0],0]-[[56007,1],0] mca_oob_tcp_msg_recv: readv 
failed: Unknown error (108)
[vs2010:09300] 2 more processes have sent help message 
help-odls-default.txt / odls-default:could-not-kill


Does that mean anything to you?





Re: [OMPI users] Performance/stability impact of thread support

2012-10-30 Thread Jeff Squyres
Short answer: yes, enabling threading impacts performance, including silently 
disabling OpenFabrics support.

On Oct 30, 2012, at 6:03 AM, Paul Kapinos wrote:

> At least, be aware that the usage of InfiniBand is silently disabled if the 'multiple'
> threading level is activated:
> 
> http://www.open-mpi.org/community/lists/devel/2012/10/11584.php
> 
> 
> 
> 
> On 10/29/12 19:14, Daniel Mitchell wrote:
>> Hi everyone,
>> 
>> I've asked my linux distribution to repackage Open MPI with thread support 
>> (meaning configure with --enable-thread-multiple). They are willing to do 
>> this if it won't have any performance/stability hit for Open MPI users who 
>> don't need thread support (meaning everyone but me, apparently). Does 
>> enabling thread support impact performance/stability?
>> 
>> Daniel
>> 
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] OpenMPI on Windows when MPI_F77 is used from a C application

2012-10-30 Thread Jeff Squyres
What's errno=108 on your platform?
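
One quick, illustrative way to check is a strerror() one-liner; the text (and
meaning) of value 108 depends on the platform and C runtime, so the output is
only whatever your own system reports:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Print this platform's textual description of errno value 108.
       On typical Linux/glibc systems this is ESHUTDOWN ("Cannot send
       after transport endpoint shutdown"); other platforms and C
       runtimes map the number differently. */
    printf("errno 108: %s\n", strerror(108));
    return 0;
}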

On Oct 30, 2012, at 9:22 AM, Damien Hocking wrote:

> I've never seen that, but someone else might have.
> 
> Damien
> 
> On 30/10/2012 1:43 AM, Mathieu Gontier wrote:
>> Hi Damien,
>> 
>> The only message I have is:
>> [vs2010:09300] [[56007,0],0]-[[56007,1],0] mca_oob_tcp_msg_recv: readv 
>> failed: Unknown error (108)
>> [vs2010:09300] 2 more processes have sent help message help-odls-default.txt 
>> / odls-default:could-not-kill
>> 
>> Does that mean anything to you?
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature

2012-10-30 Thread Hodge, Gary C
FYI, recently, I was tracking down the source of page faults in our application 
that has real-time requirements.  I found that disabling the sm component 
(--mca btl ^sm) eliminated many page faults I was seeing.  I now have much 
better deterministic performance in that I no longer see outlier measurements 
(jobs that usually take 3 ms would sometimes take 15 ms).  I did not notice a 
performance penalty using a network stack.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Mahmood Naderan
Sent: Saturday, October 27, 2012 12:47 PM
To: Jeff Squyres
Cc: us...@open-mpi.org
Subject: EXTERNAL: Re: [OMPI users] openmpi shared memory feature


>Because communicating through shared memory when sending messages between 
>processes on the same server is far faster than going through a network stack.

I see... But is that not bad for diskless clusters? Am I right? Assume the
processes are on a node which has no disk. In that case, their communication
goes through the network (from the compute node to the server), then I/O, and
then the network again (from the server back to the compute node).
Regards,
Mahmood


From: Jeff Squyres <jsquy...@cisco.com>
To: Mahmood Naderan <nt_mahm...@yahoo.com>; Open MPI Users <us...@open-mpi.org>
Sent: Saturday, October 27, 2012 6:19 PM
Subject: Re: [OMPI users] openmpi shared memory feature

On Oct 27, 2012, at 10:49 AM, Mahmood Naderan wrote:

> Why does openmpi use a shared memory model?

Because communicating through shared memory when sending messages between 
processes on the same server is far faster than going through a network stack.

> This can be disabled, though, by setting "--mca btl ^sm".
> It seems that by default openmpi uses this feature (shared memory backing 
> files), which is strange.
>
> Regards,
> Mahmood


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature

2012-10-30 Thread Jeff Squyres
On Oct 30, 2012, at 9:51 AM, Hodge, Gary C wrote:

> FYI, recently, I was tracking down the source of page faults in our 
> application that has real-time requirements.  I found that disabling the sm 
> component (--mca btl ^sm) eliminated many page faults I was seeing.  

Good point.  This is likely true; the shared memory component will definitely 
cause more page faults.  Using huge pages may alleviate this (e.g., less TLB 
usage), but we haven't studied it much.

> I now have much better deterministic performance in that I no longer see 
> outlier measurements (jobs that usually take 3 ms would sometimes take 15 
> ms).  

I'm not sure I grok that; are you benchmarking an entire *job* (i.e., a single 
"mpirun") that varies between 3 and 15 milliseconds?  If so, I'd say that both 
are pretty darn good, because mpirun invokes a lot of overhead for launching 
and completing jobs.  Furthermore, benchmarking an entire job that lasts 
significantly less than 1 second is probably not the most stable measurement, 
regardless of page faults or not -- there's lots of other distributed and OS 
effects that can cause a jump from 3 to 15 milliseconds. 

> I did not notice a performance penalty using a network stack.

Depends on the app.  Some MPI apps are latency bound; some are not.

Latency-bound applications will definitely benefit from faster point-to-point 
performance.  Shared memory will definitely have the fastest point-to-point 
latency compared to any network stack (i.e., hundreds of nanos vs. 1+ micro).
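
To make "point-to-point latency" concrete, here is a minimal ping-pong sketch
(simplified: no warm-up, fixed iteration count) that can be run with and
without "--mca btl ^sm" to compare shared memory against a network stack:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    char byte = 0;
    const int iters = 10000;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            /* Rank 0 sends a 1-byte message and waits for the echo. */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Rank 1 echoes the message back. */
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency: %g us\n", (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}

Running it with 2 ranks on a single node, once normally and once with the sm
BTL disabled, gives a rough feel for the difference described above.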

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] error from MPI_Allgather

2012-10-30 Thread Jeff Squyres
On Oct 30, 2012, at 2:23 AM, rajesh wrote:

>> 2. That being said, it looks like you used the same buffer for both the sbuf 
>> and rbuf.  MPI does not allow you to
>> do that; you need to specify different buffers for those arguments.
> 
> The problem occurs with openmpi. I could understand the problem as you said in
> the reply. But how can I set different buffers for them?

You need to change your code so that you can pass different buffers in for sbuf 
and rbuf. 
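
For illustration, a minimal sketch (buffer names and sizes are made up for the
example) of MPI_Allgather with separate send and receive buffers, plus the
MPI_IN_PLACE variant for when each rank's data already sits in its own slot of
the receive buffer:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Separate buffers: each rank contributes one int and receives one
       int from every rank; sbuf and rbuf must not overlap. */
    int sendval = rank;
    int *recvbuf = malloc(size * sizeof(int));
    MPI_Allgather(&sendval, 1, MPI_INT,
                  recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    /* Alternative: if each rank's contribution is already stored in its
       own slot of recvbuf, pass MPI_IN_PLACE as the send buffer (the send
       count and datatype are then ignored). */
    recvbuf[rank] = rank;
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    free(recvbuf);
    MPI_Finalize();
    return 0;
}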

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature

2012-10-30 Thread Hodge, Gary C
Our measurements are not for the entire mpirun job; rather, they are for the 
time it takes to process a message through our processing pipeline, which 
consists of 11 processes distributed over 8 nodes.  Taking an extra microsecond 
here and there is better for us than jumping from 3 to 15 ms, because that is 
when we cross a hard real-time limit.
