On Jul 25, 2007, at 7:45 AM, Biagio Cosenza wrote:

Jeff, I did what you suggested

However, no noticeable changes seem to happen: same peaks and same latency times.

Ok. This suggests that Nagle may not be the issue here. Is the code tightly coupled? If so, this could be normal operating system "jitter" -- one MPI process was swapped out to run some system daemon and therefore other MPI processes saw a blocking effect until the peer returned, causing performance ripples.

Are you sure that changing optval to 0 is all that is needed to disable Nagle's algorithm?
I saw that, in btl_tcp_endpoint.c, the optval assignment is inside a
#if defined(TCP_NODELAY) block.

Where can this macro be defined?

It's usually within system header files. A trivial check can be used to figure out if your system is compiling this block: put a syntax error within the #if block and then rebuild the TCP component. If the compile fails due to the syntax error, then you know that that block is being compiled.
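
For example, with the check added the block would look something like this (a sketch of the idea only, not the exact btl_tcp_endpoint.c source; an #error directive works the same way as a stray syntax error and gives a clearer message):

    #if defined(TCP_NODELAY)
        /* deliberate build-breaker: if compilation fails here, the block is active */
    #error "TCP_NODELAY block is being compiled"
        optval = 1;
        setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, (char *) &optval, sizeof(optval));
    #endif

If the TCP component still rebuilds cleanly, TCP_NODELAY is not defined on your system and the whole block is being skipped.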

Any other ideas for managing the latency peaks?

Biagio


On 7/24/07, Jeff Squyres <jsquy...@cisco.com> wrote:

On Jul 23, 2007, at 6:43 AM, Biagio Cosenza wrote:

> I'm working on a parallel real-time renderer: an embarrassingly
> parallel problem where latency is the barrier to high performance.
>
> Two observations:
>
> 1) I did a simple "ping-pong" test (the master does a Bcast + an
> Irecv for each node + a Waitall), similar to the actual renderer
> workload. Using a cluster of 37 nodes on Gigabit Ethernet, it seems
> that the latency is usually low (about 1-5 ms), but sometimes there
> are peaks of about 200 ms. I suspect the cause is a packet
> retransmission on one of the 37 connections, which ruins the
> overall performance of the test (of course, the final Waitall is a
> synchronization point).
>
> 2) A research team argues in a paper that MPI has trouble managing
> latency dynamically. They also raise an interesting issue about
> enabling/disabling the Nagle algorithm. (I paste the relevant
> paragraph below.)
>
>
> So I have two questions:
>
> 1) Why does my test have these peaks? How can I deal with them (I am
> thinking of the btl tcp parameters)?
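
(For reference, I'm assuming the master side of your test looks roughly like the sketch below; the sizes, the tag, and the lack of error checking are placeholders, not your actual code.)

    #include <mpi.h>
    #include <stdlib.h>

    #define FRAME_SIZE  1024   /* placeholder sizes, not the real workload */
    #define RESULT_SIZE 1024
    #define TAG         1

    int main(int argc, char **argv)
    {
        int rank, nprocs, i;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        char *frame   = malloc(FRAME_SIZE);
        char *results = malloc((size_t) RESULT_SIZE * nprocs);
        MPI_Request *reqs = malloc(sizeof(MPI_Request) * (nprocs - 1));

        if (rank == 0) {
            /* master: broadcast the frame data, post one Irecv per worker,
             * then wait for everyone */
            MPI_Bcast(frame, FRAME_SIZE, MPI_BYTE, 0, MPI_COMM_WORLD);
            for (i = 1; i < nprocs; ++i) {
                MPI_Irecv(results + (size_t) i * RESULT_SIZE, RESULT_SIZE,
                          MPI_BYTE, i, TAG, MPI_COMM_WORLD, &reqs[i - 1]);
            }
            /* one late reply stalls this Waitall -- this is where the
             * occasional 200 ms spike shows up */
            MPI_Waitall(nprocs - 1, reqs, MPI_STATUSES_IGNORE);
        } else {
            /* worker: receive the broadcast, send a result back */
            MPI_Bcast(frame, FRAME_SIZE, MPI_BYTE, 0, MPI_COMM_WORLD);
            MPI_Send(results, RESULT_SIZE, MPI_BYTE, 0, TAG, MPI_COMM_WORLD);
        }

        free(frame); free(results); free(reqs);
        MPI_Finalize();
        return 0;
    }

A single slow reply delays the final Waitall, which is exactly where an occasional 200 ms spike would show up.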

They are probably beyond Open MPI's control -- OMPI mainly does
read() and write() down TCP sockets and relies on the kernel to do
all the low-level TCP protocol / wire transmission stuff.

You might want to try increasing your TCP buffer sizes, but I think
that the Linux kernel has some built-in limits.  Other experts might
want to chime in here...
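
At the socket level, "increasing your TCP buffer sizes" amounts to something like the sketch below (the helper name is made up; the kernel silently clamps the request to its configured maximums, e.g. net.core.rmem_max / net.core.wmem_max on Linux):

    #include <sys/socket.h>

    /* Illustrative helper (not Open MPI code): ask for larger per-socket
     * send/receive buffers and report what the kernel actually granted. */
    static int grow_socket_buffers(int sd, int bytes)
    {
        int granted = 0;
        socklen_t len = sizeof(granted);

        if (setsockopt(sd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
            return -1;
        if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
            return -1;

        /* the kernel may cap the value; read back what we really got */
        if (getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
            return -1;
        return granted;
    }

Open MPI sets its own values on the sockets it opens for the TCP BTL, so to experiment you would change them in the BTL source, or through its MCA parameters if your version exposes them ("ompi_info --param btl tcp" will show what is available).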

> 2) When does OpenMPI disable the Nagle algorithm? Suppose I DON'T
> need Nagle to be ON (focusing only on latency); how can I
> increase performance?

It looks like we set TCP_NODELAY right when TCP BTL connections are
made.  Surprisingly, it looks like we don't have a run-time option to
change that for power-users like you who want to really tweak around.

If you want to play with it, please edit
ompi/mca/btl/tcp/btl_tcp_endpoint.c.  You'll see the references to TCP_NODELAY in
conjunction with setsockopt().  Set the optval to 0 instead of 1.  A
simple "make install" in that directory will recompile the TCP
component and re-install it (assuming you have done a default build
with OMPI components built as standalone plugins).  Let us know what
you find.
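
For reference, the code in question looks roughly like this (simplified, not the exact source).  Note the semantics: TCP_NODELAY is the option that turns Nagle off, so optval = 1 means Nagle is disabled and optval = 0 leaves Nagle on.

    #if defined(TCP_NODELAY)
        optval = 0;   /* was 1: 1 = TCP_NODELAY on (Nagle off), 0 = Nagle left on */
        if (setsockopt(sd, IPPROTO_TCP, TCP_NODELAY,
                       (char *) &optval, sizeof(optval)) < 0) {
            /* the real code reports an error here */
        }
    #endif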

--
Jeff Squyres
Cisco Systems

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
