Hi Graham,
thanks for fixing it so fast! I have attached a 128-CPU (= 32 nodes * 4
CPUs) slog file that tests the Open MPI tuned all-to-all for a message
size of 4096 floats (16384 bytes), where the execution times vary
between 0.12 and 0.43 seconds.
Summary (25-run average, timer resolution 0.01)
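For orientation, a minimal sketch of how such a measurement can be taken
(an MPI_Alltoall of 4096 floats per destination, timed over 25 runs); the
barrier placement, uninitialized buffers and output format are assumptions,
not the actual benchmark code:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define NFLOATS 4096   /* 16384 bytes per destination, as above */
#define NRUNS   25     /* averaged over 25 runs, as above       */

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* buffer contents do not matter for timing */
    float *sendbuf = malloc((size_t)nprocs * NFLOATS * sizeof(float));
    float *recvbuf = malloc((size_t)nprocs * NFLOATS * sizeof(float));

    double tsum = 0.0, tmin = 1e30, tmax = 0.0;
    for (i = 0; i < NRUNS; i++) {
        MPI_Barrier(MPI_COMM_WORLD);   /* start all ranks together */
        double t0 = MPI_Wtime();
        MPI_Alltoall(sendbuf, NFLOATS, MPI_FLOAT,
                     recvbuf, NFLOATS, MPI_FLOAT, MPI_COMM_WORLD);
        double t = MPI_Wtime() - t0;
        tsum += t;
        if (t < tmin) tmin = t;
        if (t > tmax) tmax = t;
    }
    if (rank == 0)
        printf("all-to-all %d floats: avg %.3f s  min %.3f s  max %.3f s\n",
               NFLOATS, tsum / NRUNS, tmin, tmax);

    free(sendbuf); free(recvbuf);
    MPI_Finalize();
    return 0;
}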
Hi Carsten,
Oops, sorry! There was a memory bug caused by me misusing one of my own
collective topo functions, which I think was corrupting the MPE logging
buffers (and who knows what else). Anyway, it should be fixed in the next
nightly build/tarball.
G
On Fri, 6 Jan 2006, Carsten Kutzner wrote:
On Fri, 6 Jan 2006, Graham E Fagg wrote:
> > Looks like the problem is somewhere in the tuned collectives?
> > Unfortunately I need a logfile with exactly those :(
> >
> > Carsten
>
> I hope not. Carsten, can you send me your configure line (not the whole
> log) and any other things you set in your .mca conf file.
On Jan 6, 2006, at 8:13 AM, Carsten Kutzner wrote:
Looks like the problem is somewhere in the tuned collectives?
Unfortunately I need a logfile with exactly those :(
FWIW, we just activated these tuned collectives on the trunk (which
will eventually become the 1.1.x series; the tuned collectives
Looks like the problem is somewhere in the tuned collectives?
Unfortunately I need a logfile with exactly those :(
Carsten
I hope not. Carsten, can you send me your configure line (not the whole
log) and any other things you set in your .mca conf file. Is this with
the changed (custom) decision routine?
On Wed, 4 Jan 2006, Jeff Squyres wrote:
> On Jan 4, 2006, at 2:08 PM, Anthony Chan wrote:
>
> >> Either my program quits without writing the logfile (and without
> >> complaining) or it crashes in MPI_Finalize. I get the message
> >> "33 additional processes aborted (not shown)".
> >
> > This is not an MPE error message.
On Jan 4, 2006, at 2:08 PM, Anthony Chan wrote:
Either my program quits without writing the logfile (and without
complaining) or it crashes in MPI_Finalize. I get the message
"33 additional processes aborted (not shown)".
This is not an MPE error message. If the logging crashes in
MPI_Finalize
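For context: with MPE logging the merged clog2 logfile is only collected and
written out during MPI_Finalize, so a crash there leaves no logfile. Below is
a minimal sketch of such an instrumented run; building with MPE's logging
wrappers (e.g. mpecc -mpilog, or linking -llmpe -lmpe) is an assumption about
the setup, not something stated in this thread:

/* Sketch: any MPI program becomes an MPE-logged run when built with the
 * MPE logging wrappers. All MPI calls are then recorded, and the merged
 * clog2 file is written during MPI_Finalize -- so if a rank dies before
 * or inside MPI_Finalize, no logfile appears. */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float *in  = calloc((size_t)nprocs * 4096, sizeof(float));
    float *out = calloc((size_t)nprocs * 4096, sizeof(float));

    /* the communication pattern under investigation */
    MPI_Alltoall(in, 4096, MPI_FLOAT, out, 4096, MPI_FLOAT, MPI_COMM_WORLD);

    free(in); free(out);
    MPI_Finalize();   /* <-- the clog2 logfile is merged and written here */
    return 0;
}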
On Wed, 4 Jan 2006, Carsten Kutzner wrote:
> On Tue, 3 Jan 2006, Anthony Chan wrote:
>
> > MPE/MPE2 logging (or clog/clog2) does not impose any limitation on the
> > number of processes. Could you explain what difficulty or error
> > message you encountered when using >32 processes ?
>
> Either my program quits without writing the logfile (and without
> complaining) or it crashes in MPI_Finalize.
Thanks Carsten,
I have started updating my Jumpshot, so I will let you know as soon as I
have some ideas on what's going on.
G.
PS: I am going offline now for 2 days while travelling.
On Wed, 4 Jan 2006, Carsten Kutzner wrote:
Hi Graham,
here are the all-to-all test results with the modification
Hi Graham,
here are the all-to-all test results with the modification to the decision
routine you suggested yesterday. Now the routine behaves nicely for 128
and 256 float messages on 128 CPUs! For the other sizes one probably wants
to keep the original algorithm, since it is faster there. However
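For readers unfamiliar with the tuned component: the decision routine picks
one of several all-to-all algorithms from the communicator size and the
message size. A purely hypothetical sketch of that idea follows; the names
and thresholds are invented for illustration, not Open MPI's actual code:

#include <stddef.h>

/* Hypothetical size-based decision routine: choose an all-to-all
 * algorithm from the number of ranks and the per-message size. */
typedef enum {
    ALLTOALL_BASIC_LINEAR,
    ALLTOALL_BRUCK,
    ALLTOALL_PAIRWISE
} alltoall_alg_t;

static alltoall_alg_t choose_alltoall(int comm_size, size_t msg_bytes)
{
    if (msg_bytes <= 1024 && comm_size >= 64)
        return ALLTOALL_BRUCK;       /* small messages on many ranks      */
    if (msg_bytes >= 64 * 1024)
        return ALLTOALL_PAIRWISE;    /* large messages: ordered pairwise  */
    return ALLTOALL_BASIC_LINEAR;    /* default                           */
}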
On Tue, 3 Jan 2006, Anthony Chan wrote:
> MPE/MPE2 logging (or clog/clog2) does not impose any limitation on the
> number of processes. Could you explain what difficulty or error
> message you encountered when using >32 processes ?
Either my program quits without writing the logfile (and without
Hi Peter,
We have HP ProCurve 2848 GigE switches here (48 port). The problem is more
severe the more nodes (=ports) are involved. It starts to show up at 16
ports for a limited range of message sizes and gets really bad for 32
nodes. The switch has a 96 Gbit/s backplane and should therefore be
able to handle the aggregate traffic of all 48 ports at once (48 ports x
2 Gbit/s full duplex = 96 Gbit/s).
Hello Carsten,
Have you considered the possibility that this is the effect of a non-optimal
ethernet switch? I don't know how many nodes you need to reproduce it on or
if you even have physical access (and opportunity), but popping in another
decent 16-port switch for a test run might be interesting.
On Tue, 3 Jan 2006, Carsten Kutzner wrote:
> On Tue, 3 Jan 2006, Graham E Fagg wrote:
>
> > Do you have any tools such as Vampir (or its Intel equivalent) available
> > to get a time line graph ? (even jumpshot of one of the bad cases such as
> > the 128/32 for 256 floats below would help).
>
> Hi Graham,
On Tue, 3 Jan 2006, Graham E Fagg wrote:
> Do you have any tools such as Vampir (or its Intel equivalent) available
> to get a time line graph ? (even jumpshot of one of the bad cases such as
> the 128/32 for 256 floats below would help).
Hi Graham,
I have attached an slog file of an all-to-all
Hello Carsten
happy new year to you too.
On Tue, 3 Jan 2006, Carsten Kutzner wrote:
Hi Graham,
sorry for the long delay, I was on Christmas holidays. I wish you a Happy
New Year!
(Uh, I think the previous email did not arrive in my postbox (?)) But yes,
I am resending it after this reply
Hi Graham,
sorry for the long delay, I was on Christmas holidays. I wish you a Happy
New Year!
On Fri, 23 Dec 2005, Graham E Fagg wrote:
>
> > I have also tried the tuned alltoalls and they are really great!! Only for
> > very few message sizes (in the case of 4 CPUs per node) did one of my alltoalls
> >
Hi Carsten
I have also tried the tuned alltoalls and they are really great!! Only for
very few message sizes (in the case of 4 CPUs per node) did one of my alltoalls
perform better. Are these tuned collectives ready to be used for
production runs?
We are actively testing them on larger systems t
On Tue, 20 Dec 2005, George Bosilca wrote:
> On Dec 20, 2005, at 3:19 AM, Carsten Kutzner wrote:
>
> >> I don't see how you deduce that adding barriers increases the
> >> congestion? It increases the latency for the all-to-all, but for me
> >
> > When I do an all-to-all a lot of times, I see that the time for a
> > single all-to-all varies a lot.
On Dec 20, 2005, at 3:19 AM, Carsten Kutzner wrote:
I don't see how you deduce that adding barriers increases the
congestion? It increases the latency for the all-to-all, but for me
When I do an all-to-all a lot of times, I see that the time for a
single all-to-all varies a lot. My time measurements
On Mon, 19 Dec 2005, George Bosilca wrote:
> Carsten,
>
> In the Open MPI source code directory there is a collective component
> called tuned (ompi/mca/coll/tuned). This component is not enabled by
> default right now, but usually it gives better performance than the
> basic one. You should give
Carsten,
In the Open MPI source code directory there is a collective component
called tuned (ompi/mca/coll/tuned). This component is not enabled by
default right now, but usually it gives better performance than the
basic one. You should give it a try (go inside and remove
the .ompi_ignore file
Hello,
I am desperately trying to get better all-to-all performance on Gbit
Ethernet (flow control is enabled). I have been playing around with
several all-to-all schemes and been able to reduce congestion by
communicating in an ordered fashion.
E.g. the simplest scheme looks like
for (i=0; i
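A sketch of what such an ordered scheme can look like: in step i each rank
sends to (rank + i) and receives from (rank - i), modulo the process count,
so every sender targets a different receiver at any moment. The pairing rule
and the use of MPI_Sendrecv are assumptions for illustration, not necessarily
the exact code used in the measurements discussed here:

#include <mpi.h>

/* Ordered all-to-all: one shifted send/receive partner per step. */
static void ordered_alltoall(float *sendbuf, float *recvbuf,
                             int count, MPI_Comm comm)
{
    int rank, nprocs, i;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    for (i = 0; i < nprocs; i++) {
        int dst = (rank + i) % nprocs;            /* whom I send to     */
        int src = (rank - i + nprocs) % nprocs;   /* whom I receive from */
        MPI_Sendrecv(sendbuf + (size_t)dst * count, count, MPI_FLOAT, dst, 0,
                     recvbuf + (size_t)src * count, count, MPI_FLOAT, src, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}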